------------------------------------------------
PyPAR - Parallel Python, no-frills MPI interface

Author:  Ole Nielsen (2001, 2002, 2003)
Email:   Ole.Nielsen@anu.edu.au
Version: See pypar.__version__
Date:    See pypar.__date__
------------------------------------------------

CONTENTS:
  TESTING THE INSTALLATION
  GETTING STARTED
  FIRST PARALLEL PROGRAM
  PROGRAMMING FOR EFFICIENCY
  MORE PARALLEL PROGRAMMING WITH PYPAR
  PYPAR REFERENCE
  DATA TYPES
  STATUS OBJECT
  EXTENSIONS
  PERFORMANCE
  PROBLEMS
  REFERENCES


TESTING THE INSTALLATION

After having installed pypar, run

  mpirun -np 2 testpypar.py

to verify that everything works.
If it doesn't, try running a C/MPI program
(for example ctiming.c) to see whether the problem lies with
the local MPI installation or with pypar.


GETTING STARTED
Assuming that the pypar C extension compiled, you should be able to
write

  >> import pypar

from the Python prompt.
(This may not work with MPICH under SunOS - this doesn't matter;
just skip this interactive bit and go to FIRST PARALLEL PROGRAM.)

Then you can write

  >> pypar.size()

to get the number of parallel processors in play.
(When using pypar from the command line this number will be 1.)

Processors are numbered from 0 to pypar.size()-1.
Try

  >> pypar.rank()

to get the processor number of the current processor.
(When using pypar from the command line this number will be 0.)

Finally, try

  >> pypar.get_processor_name()

to get the host name of the current node.
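Putting these together - a minimal identification script (a sketch
using just the three functions above; run it with mpirun as shown in
the next section):

  import pypar

  print 'I am processor %d of %d on node %s' \
        % (pypar.rank(), pypar.size(), pypar.get_processor_name())

  pypar.finalize()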

FIRST PARALLEL PROGRAM
Take a look at the file demo.py supplied with this distribution.
To execute it in parallel on, say, four processors, run

  mpirun -np 4 demo.py

from the UNIX command line.
This will start four copies of the program, each on its own processor.

The program demo.py creates a message on processor 0 and
sends it around in a ring
- each processor adding a bit to it - until it arrives back at
processor 0.

All parallel programs must find out which processor they are running on.
This is accomplished by the call

  myid = pypar.rank()

The total number of processors is obtained from

  proc = pypar.size()

One can then have different code for different processors by branching,
as in

  if myid == 0:
      ...

To send a general Python structure A to processor p, write

  pypar.send(A, p)

and to receive something from processor q, write

  X = pypar.receive(q)

This caters for any picklable Python structure and makes parallel
programs very simple and readable. While reasonably fast, it does not
achieve the full bandwidth obtainable by MPI.
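For illustration, here is a minimal ring sketch in the spirit of
demo.py (not the actual demo code; it assumes at least two
processors):

  import pypar

  myid = pypar.rank()
  proc = pypar.size()

  if myid == 0:
      # Start the ring and collect the completed message
      pypar.send('P0', 1)
      msg = pypar.receive(proc - 1)
      print 'Ring completed:', msg
  else:
      # Append this processor's id and pass the message on
      msg = pypar.receive(myid - 1)
      pypar.send(msg + ' P%d' % myid, (myid + 1) % proc)

  pypar.finalize()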

PROGRAMMING FOR EFFICIENCY
Really low-latency communication can be achieved by sticking
to Numeric arrays and specifying receive buffers whenever possible.

To send a Numeric array A to processor p, write

  pypar.send(A, p, use_buffer=True)

and to receive the array from processor q, write

  X = pypar.receive(q, buffer=X)

Note that X acts as a buffer and must be pre-allocated prior to the
receive statement, as in Fortran and C programs using MPI.
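A minimal sketch of buffered communication, assuming the Numeric
package and at least two processors:

  import pypar
  from Numeric import array, zeros, Float

  myid = pypar.rank()

  if myid == 0:
      A = array([1.0, 2.0, 3.0, 4.0], Float)
      pypar.send(A, 1, use_buffer=True)
  elif myid == 1:
      X = zeros(4, Float)              # pre-allocated receive buffer
      X = pypar.receive(0, buffer=X)   # returns a reference to X
      print X

  pypar.finalize()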

These forms have superseded the raw forms present in pypar
prior to version 1.9. The raw forms have been recast in terms of the
above and retained for backwards compatibility.
See the script pytiming for an example of communication of Numeric arrays.


MORE PARALLEL PROGRAMMING WITH PYPAR

When you are ready to move on, have a look at the supplied
demos and the script testpypar.py, which all give examples
of parallel programming with pypar.
You can also look at a standard MPI reference (C or Fortran)
to get some ideas. The principles are the same in pypar except
that many calls have been simplified.


PYPAR REFERENCE
Here is a list of functions provided by pypar
(see the section on DATA TYPES for an explanation of 'vanilla'):

Identification:
---------------

size() -- Number of processors
rank() -- Id of current processor
get_processor_name() -- Return host name of current node

Basic send forms:
-----------------

send(x, destination)
  Sends data x of any type to destination with default tag.

send(x, destination, tag=t)
  Sends data x of any type to destination with tag t.

send(x, destination, use_buffer=True)
  Sends data x of any type to destination,
  assuming that the recipient will specify a suitable buffer.

send(x, destination, bypass=True)
  Sends a Numeric array of any type to the recipient, assuming
  that a suitable buffer has been specified and that the
  recipient also specifies bypass=True.

Basic receive forms:
--------------------

y = receive(source)
  Receives data y of any type from source with default tag.

y = receive(source, tag=t)
  Receives data y of any type from source with tag t.

y, status = receive(source, return_status=True)
  Receives data y and a status object from source.

y = receive(source, buffer=x)
  Receives data y from source and puts
  it in x (which must be of compatible size and type).
  It also returns a reference to x.
  (Although it will accept all types, this form is intended
  mainly for Numeric arrays.)
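The send and receive forms combine naturally. A small sketch,
assuming at least two processors:

  import pypar

  if pypar.rank() == 0:
      # Any picklable structure can be sent; tag it explicitly
      pypar.send({'step': 1, 'data': [1, 2, 3]}, 1, tag=7)
  elif pypar.rank() == 1:
      x, status = pypar.receive(0, tag=7, return_status=True)
      print x, 'from processor', status.source, 'with tag', status.tag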

Collective Communication:
-------------------------

broadcast(x, root)
  Broadcasts x from root to all other processors.
  All processors must issue the same broadcast call.

gather(x, root)
  Gathers all elements in x to a buffer of
  size len(x) * numprocs
  created by this function.
  If x is multidimensional, the buffer will have
  the size of the zeroth dimension multiplied by numprocs.
  A reference to the created buffer is returned.

gather(x, root, buffer=y)
  Gathers all elements in x to the specified buffer y
  on root.
  The buffer must have size len(x) * numprocs and
  shape[0] == x.shape[0] * numprocs.
  A reference to the buffer y is returned.

scatter(x, root)
  Scatters all elements in x from root to all other processors
  in a buffer created by this function.
  A reference to the created buffer is returned.

scatter(x, root, buffer=y)
  Scatters all elements in x from root to all other processors
  using the specified buffer y.
  A reference to the buffer y is returned.

reduce(x, op, root)
  Reduces all elements in x at root,
  applying operation op elementwise, and returns the result in a
  buffer created by this function.
  A reference to the created buffer is returned.

reduce(x, op, root, buffer=y)
  Reduces all elements in x into the specified buffer y
  (of the same size as x)
  at root, applying operation op elementwise.
  A reference to the buffer y is returned.
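A small collective sketch (the name of the sum operator, pypar.sum,
is an assumption here - check pypar.py for the exact names of the
reduction operators):

  import pypar
  from Numeric import array, Float

  myid = pypar.rank()

  # Each processor contributes two numbers of its own
  x = array([float(myid), float(myid) + 0.5], Float)

  g = pypar.gather(x, 0)             # on root: length len(x) * numprocs
  s = pypar.reduce(x, pypar.sum, 0)  # elementwise sum, result at root

  if myid == 0:
      print g, s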

Other functions:
----------------

time() -- MPI wall time
barrier() -- Synchronisation point. Makes processors wait until all
             processors have reached this point.
abort() -- Terminate all processes.
finalize() -- Cleanup MPI. No parallelism can take place after this point.
initialized() -- True if MPI has been initialised.
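For example, time() and barrier() combine into a simple wall-clock
measurement (a sketch; the workload is just a stand-in):

  import pypar

  pypar.barrier()                  # make sure everyone starts together
  t0 = pypar.time()

  work = sum([i * i for i in range(100000)])   # stand-in workload

  pypar.barrier()                  # wait for the slowest processor
  if pypar.rank() == 0:
      print 'Elapsed wall time: %f seconds' % (pypar.time() - t0)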

See pypar.py for doc strings on individual functions.


DATA TYPES
Pypar automatically handles different data types differently.
There are three protocols:

'array': Numeric arrays of type Int ('i', 'l'), Float ('f', 'd'),
  or Complex ('F', 'D') can be communicated
  with the underlying mpiext.send_array and mpiext.receive_array.
  This is the fastest mode.
  Note that even though the underlying C implementation does not
  support Complex as a native data type, pypar handles it
  efficiently and seamlessly by transmitting such arrays as
  float arrays of twice the size.
'string': Text strings can be communicated with mpiext.send_string and
  mpiext.receive_string.
'vanilla': All other types can be communicated using the functions
  send_vanilla and receive_vanilla, provided that the objects
  can be serialised using pickle (or cPickle). This mode is less
  efficient than the first two, but it can handle general structures.

Rules:
If the keyword argument vanilla == 1, vanilla mode is chosen regardless
of x's type.
Otherwise, if x is a string, the string protocol is chosen.
If x is an array, the 'array' protocol is chosen provided that x has one
of the admissible typecodes.

Functions that take vanilla as a keyword argument can force vanilla mode
on any datatype.
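For example, a Numeric array would normally travel via the fast
'array' protocol; vanilla=1 forces it through the pickle-based
protocol instead (a sketch, assuming at least two processors):

  import pypar
  from Numeric import array, Float

  A = array([1.0, 2.0, 3.0], Float)

  if pypar.rank() == 0:
      pypar.send(A, 1, vanilla=1)     # pickled, not sent as raw array
  elif pypar.rank() == 1:
      A = pypar.receive(0, vanilla=1)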

STATUS OBJECT
A status object can optionally be returned from receive and raw_receive
by specifying return_status=True in the call.
The status object can subsequently be queried for information about the
communication. The fields are:

status.source: The origin of the received message (use e.g. with pypar.any_source)
status.tag: The tag of the received message (use e.g. with pypar.any_tag)
status.error: Error code generated by the underlying C function
status.length: Number of elements received
status.size: Size (in bytes) of one element
status.bytes(): Number of payload bytes received (excluding control info)

The status object is essential when used together with any_source or any_tag.
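For example, a root processor can accept messages from whichever
processor reports first and use the status object to find out who it
was (a sketch, assuming at least two processors):

  import pypar

  if pypar.rank() == 0:
      for i in range(pypar.size() - 1):
          x, status = pypar.receive(pypar.any_source, return_status=True)
          print 'Got', x, 'from processor', status.source
  else:
      pypar.send('ready from P%d' % pypar.rank(), 0)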

EXTENSIONS
At this stage only a subset of MPI is implemented. However,
most real-world MPI programs use only these simple functions.
If you find that you need other MPI calls, please feel free to
add them to the C-extension. Alternatively, drop me a note and I'll
use that as an excuse to update pypar.


PERFORMANCE
If you are passing simple Numeric arrays around you can reduce
the communication time by using the 'buffer' keyword argument
(see PYPAR REFERENCE above). These versions are closer to the
underlying MPI implementation in that one must provide receive
buffers of the right size.
However, you will find that these versions have lower latency and
can be somewhat faster, as they bypass
pypar's mechanism for automatically transferring the needed buffer size.
Using simple Numeric arrays will also bypass pypar's pickling of general
structures.

This will mainly be of interest if you are using 'fine-grained'
parallelism, i.e. if you have frequent communications.

Try both versions and see for yourself if there is any noticeable
difference.


PROBLEMS
If you encounter any problems with this package please drop me a line.
I cannot guarantee that I can help you, but I will try.

Ole Nielsen
Australian National University
Email: Ole.Nielsen@anu.edu.au


REFERENCES
To learn more about MPI in general, see the web sites
  http://www-unix.mcs.anl.gov/mpi/
  http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf

or the books
  "Using MPI, 2nd Edition" by Gropp, Lusk, and Skjellum
  "The LAM companion to 'Using MPI...'" by Zdzislaw Meglicki
  "Parallel Programming With MPI" by Peter S. Pacheco
  "RS/6000 SP: Practical MPI Programming" by Yukiya Aoyama and Jun Nakano

To learn more about Python, see the web site
  http://www.python.org