--------------------------------------------------
PyPAR - Parallel Python, no-frills MPI interface

Author:  Ole Nielsen (2001, 2002, 2003)
Email:   Ole.Nielsen@anu.edu.au
Version: See pypar.__version__
Date:    See pypar.__date__
--------------------------------------------------

CONTENTS:
  TESTING THE INSTALLATION
  GETTING STARTED
  FIRST PARALLEL PROGRAM
  PROGRAMMING FOR EFFICIENCY
  MORE PARALLEL PROGRAMMING WITH PYPAR
  PYPAR REFERENCE
  DATA TYPES
  STATUS OBJECT
  EXTENSIONS
  PERFORMANCE
  PROBLEMS
  REFERENCES


TESTING THE INSTALLATION

After having installed pypar, run

  mpirun -np 2 testpypar.py

to verify that everything works. If it doesn't, try running a C/MPI
program (for example ctiming.c) to see whether the problem lies with
the local MPI installation or with pypar.


GETTING STARTED

Assuming that the pypar C-extension compiled, you should be able to
write

  >> import pypar

from the Python prompt. (This may not work with MPICH under SunOS -
that doesn't matter; just skip this interactive bit and go to
FIRST PARALLEL PROGRAM.)

Then you can write

  >> pypar.size()

to get the number of parallel processors in play. (When using pypar
from the command line this number will be 1.) Processors are numbered
from 0 to pypar.size()-1.

Try

  >> pypar.rank()

to get the processor number of the current processor. (When using
pypar from the command line this number will be 0.)

Finally, try

  >> pypar.get_processor_name()

to return the host name of the current node.


FIRST PARALLEL PROGRAM

Take a look at the file demo.py supplied with this distribution.
To execute it in parallel on, say, four processors, run

  mpirun -np 4 demo.py

from the UNIX command line. This will start four copies of the
program, each on its own processor.

The program demo.py creates a message on processor 0 and sends it
around in a ring - each processor adding a bit to it - until it
arrives back at processor 0. (A minimal sketch of this ring pattern
is given below.)

All parallel programs must find out which processor they are running
on. This is accomplished by the call

  myid = pypar.rank()

The total number of processors is obtained from

  proc = pypar.size()

One can then have different codes for different processors by
branching, as in

  if myid == 0:
      ...

To send a general Python structure A to processor p, write

  pypar.send(A, p)

and to receive something from processor q, write

  X = pypar.receive(q)

This caters for any picklable Python structure and makes parallel
programs very simple and readable. While reasonably fast, it does not
achieve the full bandwidth obtainable by MPI.


PROGRAMMING FOR EFFICIENCY

Really low latency communication can be achieved by sticking to
Numeric arrays and specifying receive buffers whenever possible.

To send a Numeric array A to processor p, write

  pypar.send(A, p, use_buffer=True)

and to receive the array from processor q, write

  X = pypar.receive(q, buffer=X)

Note that X acts as a buffer and must be pre-allocated prior to the
receive statement, as in Fortran and C programs using MPI. These
forms have superseded the raw forms present in pypar prior to version
1.9; the raw forms have been recast in terms of the above and have
been retained for backwards compatibility.

See the script pytiming for an example of communication of Numeric
arrays; a sketch of the buffered pattern is also given below.


MORE PARALLEL PROGRAMMING WITH PYPAR

When you are ready to move on, have a look at the supplied demos and
the script testpypar.py, which all give examples of parallel
programming with pypar. You can also look at a standard MPI reference
(C or Fortran) to get some ideas. The principles are the same in
pypar, except that many calls have been simplified.
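
For concreteness, here is a minimal sketch of the ring pattern
described under FIRST PARALLEL PROGRAM. It is not the actual demo.py,
just an illustration of the same idea (the names ring.py, msg, myid
and numproc are arbitrary). Save it and run, say, mpirun -np 4 ring.py:

  import pypar

  myid = pypar.rank()      # id of this processor
  numproc = pypar.size()   # total number of processors

  if numproc < 2:
      print 'This sketch needs at least two processors'
      pypar.abort()

  if myid == 0:
      msg = 'P0'
      pypar.send(msg, 1)                    # start the ring
      msg = pypar.receive(numproc - 1)      # wait for it to come back
      print 'Processor 0 received message "%s"' % msg
  else:
      msg = pypar.receive(myid - 1)         # message from the left
      msg = msg + '->P%d' % myid            # add a bit to it
      pypar.send(msg, (myid + 1) % numproc) # pass it to the right

  pypar.finalize()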
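
Similarly, here is a sketch of the buffered forms described under
PROGRAMMING FOR EFFICIENCY. The length N and the typecode 'd' are
arbitrary choices for illustration; run it on at least two processors
(processors other than 0 and 1 do nothing):

  import pypar
  import Numeric

  myid = pypar.rank()
  N = 1000                                 # arbitrary array length

  if myid == 0:
      A = Numeric.arange(N).astype('d')    # data to send
      pypar.send(A, 1, use_buffer=True)    # recipient supplies buffer
  elif myid == 1:
      X = Numeric.zeros(N, 'd')            # pre-allocated receive buffer
      X = pypar.receive(0, buffer=X)       # fills X, returns reference
      print 'Processor 1 received %d floats' % len(X)

  pypar.finalize()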
PYPAR REFERENCE

Here is a list of the functions provided by pypar.
(See the section DATA TYPES for an explanation of 'vanilla'.)

Identification:
---------------
size()               -- Number of processors.
rank()               -- Id of current processor.
get_processor_name() -- Return host name of current node.

Basic send forms:
-----------------
send(x, destination)
  Send data x of any type to destination with default tag.
send(x, destination, tag=t)
  Send data x of any type to destination with tag t.
send(x, destination, use_buffer=True)
  Send data x of any type to destination, assuming that the recipient
  will specify a suitable buffer.
send(x, destination, bypass=True)
  Send a Numeric array of any type to the recipient, assuming that a
  suitable buffer has been specified and that the recipient also
  specifies bypass=True.

Basic receive forms:
--------------------
y = receive(source)
  Receive data y of any type from source with default tag.
y = receive(source, tag=t)
  Receive data y of any type from source with tag t.
y, status = receive(source, return_status=True)
  Receive data y and a status object from source.
y = receive(source, buffer=x)
  Receive data y from source and put it in x (which must be of
  compatible size and type). A reference to x is also returned.
  (Although it accepts all types, this form is intended mainly for
  Numeric arrays.)

Collective communication:
-------------------------
broadcast(x, root)
  Broadcast x from root to all other processors. All processors must
  issue the same broadcast call.
gather(x, root)
  Gather all elements in x at root into a buffer of size
  len(x) * numprocs created by this function. If x is
  multidimensional, the buffer will have the size of the zeroth
  dimension multiplied by numprocs. A reference to the created
  buffer is returned.
gather(x, root, buffer=y)
  Gather all elements in x at root into the specified buffer y.
  The buffer must have size len(x) * numprocs and
  shape[0] == x.shape[0] * numprocs. A reference to the buffer y is
  returned.
scatter(x, root)
  Scatter all elements in x from root to all other processors in a
  buffer created by this function. A reference to the created buffer
  is returned.
scatter(x, root, buffer=y)
  Scatter all elements in x from root to all other processors using
  the specified buffer y. A reference to the buffer y is returned.
reduce(x, op, root)
  Reduce all elements in x at root, applying operation op
  elementwise, and return the result in a buffer created by this
  function. A reference to the created buffer is returned.
reduce(x, op, root, buffer=y)
  Reduce all elements in x at root into the specified buffer y (of
  the same size as x), applying operation op elementwise. A reference
  to the buffer y is returned.

Other functions:
----------------
time()        -- MPI wall time.
barrier()     -- Synchronisation point. Makes processors wait until
                 all processors have reached this point.
abort()       -- Terminate all processes.
finalize()    -- Clean up MPI. No parallelism can take place after
                 this point.
initialized() -- True if MPI has been initialised.

See pypar.py for doc strings on individual functions.
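
The form y, status = receive(source, return_status=True) listed above
is handy together with pypar.any_source, e.g. in a simple
master/worker pattern. A sketch (the message text is arbitrary):

  import pypar

  myid = pypar.rank()
  numproc = pypar.size()

  if myid == 0:
      # Collect one result from each worker, in arrival order.
      for i in range(numproc - 1):
          result, status = pypar.receive(pypar.any_source,
                                         return_status=True)
          print 'Received "%s" from processor %d (tag %d)'\
                % (result, status.source, status.tag)
  else:
      pypar.send('result from P%d' % myid, 0)

  pypar.finalize()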
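
As an illustration of the collective forms, the sketch below
broadcasts an array from processor 0, gathers a contribution from
every processor and forms an elementwise sum at root. It assumes that
reduction operation constants such as pypar.SUM are exported by pypar
and that broadcast fills the supplied buffer in place; check pypar.py
for the exact names and semantics.

  import pypar
  import Numeric

  myid = pypar.rank()
  numproc = pypar.size()

  # Broadcast: processor 0's values end up in y on every processor.
  y = Numeric.zeros(4, 'd')
  if myid == 0:
      y = Numeric.arange(4).astype('d')
  pypar.broadcast(y, 0)

  # Each processor contributes a small array filled with its own id.
  x = Numeric.zeros(4, 'd') + myid

  # Gather: root receives len(x) * numproc elements.
  g = pypar.gather(x, 0)

  # Reduce: elementwise sum over all processors, available at root.
  s = pypar.reduce(x, pypar.SUM, 0)   # pypar.SUM assumed; see pypar.py

  if myid == 0:
      print 'broadcast:', y
      print 'gathered :', g
      print 'sum      :', s

  pypar.finalize()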
DATA TYPES

Pypar automatically handles different data types differently.
There are three protocols:

'array':   Numeric arrays of type Int ('i', 'l'), Float ('f', 'd') or
           Complex ('F', 'D') can be communicated with the underlying
           mpiext.send_array and mpiext.receive_array. This is the
           fastest mode. Note that even though the underlying C
           implementation does not support Complex as a native
           datatype, pypar handles Complex arrays efficiently and
           seamlessly by transmitting them as arrays of floats of
           twice the size.

'string':  Text strings can be communicated with mpiext.send_string
           and mpiext.receive_string.

'vanilla': All other types can be communicated using the functions
           send_vanilla and receive_vanilla, provided that the
           objects can be serialised using pickle (or cPickle). This
           mode is less efficient than the first two, but it can
           handle general structures.

Rules:
If the keyword argument vanilla == 1, the vanilla protocol is chosen
regardless of x's type. Otherwise, if x is a string, the string
protocol is chosen. If x is an array, the array protocol is chosen
provided that x has one of the admissible typecodes.

Functions that take vanilla as a keyword argument can force vanilla
mode on any datatype.


STATUS OBJECT

A status object can optionally be returned from receive and
raw_receive by specifying return_status=True in the call. The status
object can subsequently be queried for information about the
communication. The fields are:

status.source:  The origin of the received message
                (use e.g. with pypar.any_source).
status.tag:     The tag of the received message
                (use e.g. with pypar.any_tag).
status.error:   Error code generated by the underlying C function.
status.length:  Number of elements received.
status.size:    Size (in bytes) of one element.
status.bytes(): Number of payload bytes received (excluding control
                information).

The status object is essential when used together with any_source or
any_tag.


EXTENSIONS

At this stage only a subset of MPI is implemented. However, most
real-world MPI programs use only these simple functions. If you find
that you need other MPI calls, please feel free to add them to the
C-extension. Alternatively, drop me a note and I'll use that as an
excuse to update pypar.


PERFORMANCE

If you are passing simple Numeric arrays around, you can reduce the
communication time by using the 'buffer' keyword arguments (see
PYPAR REFERENCE above). These versions are closer to the underlying
MPI implementation in that one must provide receive buffers of the
right size. However, you will find that they have lower latency and
can be somewhat faster, as they bypass pypar's mechanism for
automatically transferring the needed buffer size. In addition, using
simple Numeric arrays bypasses pypar's pickling of general
structures.

This will mainly be of interest if you are using 'fine grained'
parallelism, i.e. if you have frequent communications. Try both
versions and see for yourself whether there is any noticeable
difference.


PROBLEMS

If you encounter any problems with this package, please drop me a
line. I cannot guarantee that I can help you, but I will try.

Ole Nielsen
Australian National University
Email: Ole.Nielsen@anu.edu.au


REFERENCES

To learn more about MPI in general, see the web sites

  http://www-unix.mcs.anl.gov/mpi/
  http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf

or the books

  "Using MPI, 2nd Edition", by Gropp, Lusk, and Skjellum
  "The LAM companion to 'Using MPI...'", by Zdzislaw Meglicki
  "Parallel Programming With MPI", by Peter S. Pacheco
  "RS/6000 SP: Practical MPI Programming", by Yukiya Aoyama and
    Jun Nakano

To learn more about Python, see the web site

  http://www.python.org