------------------------------------------------ PyPAR - Parallel Python, no-frills MPI interface Author: Ole Nielsen (2001, 2002, 2003) Email: Ole.Nielsen@anu.edu.au Version: See pypar.__version__ Date: See pypar.__date__ ------------------------------------------------ CONTENTS: TESTING THE INSTALLATION GETTING STARTED FIRST PARALLEL PROGRAM PROGRAMMING FOR EFFICIENCY PYPAR REFERENCE DATA TYPES STATUS OBJECT EXTENSIONS PROBLEMS REFERENCES TESTING THE INSTALLATION After having installed pypar run mpirun -np 2 testpypar.py to verify that everything works. If it doesn't try to run a C/MPI program (for example ctiming.c) to see if the problem lies with the local MPI installation or with pypar. GETTING STARTED Assuming that the pypar C-extension compiled you should be able to write >> import pypar from the Python prompt. (This may not work under MPICH under Sunos - this doesn't matter, just skip this interactive bit and go to FIRST PARALLEL PROGRAM.) Then you can write >> pypar.size() to get the number of parallel processors in play. (When using pypar from the command line this number will be 1). Processors are numbered from 0 to pypar.size()-1. Try >> pypar.rank() to get the processor number of the current processor. (When using pypar from the command line this number will be 0). Finally, try >> pypar.Get_processor_name() to return host name of current node FIRST PARALLEL PROGRAM Take a look at the file demo.py supplied with this distribution. To execute it in parallel on four processors, say, run mpirun -np 4 demo.py from the UNIX command line. This will start four copies of the program each on its own processor. The program demo.py makes a message on processor 0 and sends it around in a ring - each processor adding a bit to it - until it arrives back at processor 0. All parallel programs must find out what processor they are running on. This is accomplished by the call myid = pypar.rank() The total number of processors is obtained from proc = pypar.size() One can then have different codes for different processors by branching as in if myid == 0 ... To send a general Python structure A to processor p, write pypar.send(A, p) and to receive something from processor q, write X = pypar.receive(q) This will cater for any (picklable Python structure) and make parallel programs very simple and readable. While reasonably fast, it does not achieve the full bandwidth obtainable by MPI. PROGRAMMING FOR EFFICIENCY For really fast communication one must stick to Numeric arrays and use 'raw' versions of send and receive, e.g.: To send a Numeric array A to processor p, write pypar.raw_send(A, p) and to receive the array from processor q, write X = pypar.raw_receive(X, q) Note that X acts as a buffer and must be pre-allocated prior to the receive statement as in Fortran and C programs using MPI. See the script pytiming for an example of communication of Numeric arrays. MORE PARALLEL PROGRAMMING WITH PYPAR When you are ready to move on, have a look at the supplied demos and the script testpypar.py which all give examples of parallel programming with pypar. You can also look at a standard MPI reference (C or Fortran) to get some ideas. The principles are the same in pypar except that many calls have been simplified. PYPAR REFERENCE Here is a list of functions provided by pypar: (See section on Data types for explanation of 'vanilla'). size() -- Number of processors rank() -- Id of current processor Get_processor_name() -- Return host name of current node send(x, destination, tag=0, vanilla=0) -- Blocking send (all types) Sends data in x to destination with given tag. y=receive(source, tag=0) -- Blocking receive (all types) receives data (y) from source (possible with specified tag). y, status=receive(source, tag=0, return_status=True) -- Blocking receive (all types) receives data (y) and status object from source (possible with specified tag). raw_send(x, destination, tag=0, vanilla=0): -- Blocking send (Fastest) Sends data in x to destination with given tag. It differs from send in that the receiver MUST provide a buffer to store the received data. Although it will accept all types raw_send is thought to be used mainly for Numeric arrays. raw_receive(x, source, tag=0, vanilla=0): -- Raw blocking receive (Fastest) receives data from source (possible with specified tag) and puts it in x (which must be of compatible size and type). It also returns a reference to x. Although it will accept all types raw_send is thought to be used mainly for Numeric arrays. x, status = raw_receive(x, source, tag=0, vanilla=0, return_status=True): -- Raw blocking receive (Fastest) receives data and status object from source (possible with specified tag) and puts it in x (which must be of compatible size and type). bcast(X, rootid) -- Broadcasts X from rootid to all other processors. All processors must issue the same bcast. raw_scatter(x, nums, buffer, source, vanilla=0): Scatter the first nums elements in x to buffer (of size given by nums) from source. scatter(x, source, vanilla=0): Scatter all elements in x to a buffer created by this function and returned. raw_gather(x, buffer, source, vanilla=0): Gather all elements in x to buffer from source. Buffer must have size len(x) * numprocs and shape[0] == x.shape[0]*numprocs gather(x, source, vanilla=0): Gather all elements in x to buffer of size len(x) * numprocs created by this function and returned. If x is multidimensional buffer will have the size of zero'th dimension multiplied by numprocs raw_reduce(x, buffer, op, source, vanilla=0): Reduce all elements in x to buffer (of the same size as x) at source applying operation op elementwise. reduce(x, op, source, vanilla=0): Reduce all elements in x at source applying operation op elementwise and return result in new buffer. Buffer is created and returned. Wtime() -- MPI wall time Barrier() -- Synchronisation point. Makes processors wait until all processors have reached this point. Abort() -- Terminate all processes. Finalize() -- Cleanup MPI. No parallelism can take place after this point. See pypar.py for doc strings on individual functions. DATA TYPES Pypar automatically handles different data types differently There are three protocols: 'array': Numeric arrays of type 'i', 'l', 'f', or 'd' can be communicated with the underlying mpiext.send_array and mpiext.receive_array. This is the fastest mode. 'string': Text strings can be communicated with mpiext.send_string and mpiext.receive_string. 'vanilla': All other types can be communicated using the scripts send_vanilla and receive_vanilla provided that the objects can be serialised using pickle (or cPickle). The latter mode is less efficient than the first two but it can handle complex structures. Rules: If keyword argument vanilla == 1, vanilla is chosen regardless of x's type. Otherwise if x is a string, the string protocol is chosen If x is an array, the 'array' protocol is chosen provided that x has one of the admissible typecodes. Function that take vanilla as a keyword argument can force vanilla mode on any datatype. STATUS OBJECT A status object can be optionally returned from receive and raw_receive by specifying return_status=True in the call. The status object can subsequently be queried for information about the communication. The fields are: status.source: The origin of the received message (use e.g. with pypar.any_source) status.tag: The tag of the received message (use e.g. with pypar.any_tag) status.error: Error code generated by underlying C function status.length: Number of elements received status.size: Size (in bytes) of one element status.bytes(): Number of payload bytes received (excl control info) The status object is essential when use together with any_source or any_tag. EXTENSIONS At this stage only a subset of MPI is implemented. However, most real-world MPI programs use only these simple functions. If you find that you need other MPI calls, please feel free to add them to the C-extension. Alternatively, drop me note and I'll use that as an excuse to update pypar. PERFORMANCE If you are passing simple Numeric arrays around you can reduce the communication time by using the '_raw' versions of send and receive (see REFERENCE above). These version are closer to the underlying MPI implementation in that one must provide receive buffers of the right size. However, you will find that this can be somewhat faster as they bypass pypar's mechanism for automatically transferring the needed buffer size. Also, using simple numeric arrays will bypass pypar's pickling of complex structures. This will mainly be of interest if you are using a 'fine grained' parallelism, i.e. if you have frequent communications. Try both versions and see for yourself if there is any noticable difference. PROBLEMS If you encounter any problems with this package please drop me a line. I cannot guarantee that I can help you, but I will try. Ole Nielsen Australian National University Email: Ole.Nielsen@anu.edu.au REFERENCES To learn more about MPI in general, see WEB sites http://www-unix.mcs.anl.gov/mpi/ http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf or books "Using MPI, 2nd Edition", by Gropp, Lusk, and Skjellum, "The LAM companion to 'Using MPI...'" by Zdzislaw Meglicki "Parallel Programming With MPI", by Peter S. Pacheco "RS/6000 SP: Practical MPI Programming", by Yukiya Aoyama and Jun Nakano To learn more about Python, see the WEB site http://www.python.org