------------------------------------------------
PyPAR - Parallel Python, no-frills MPI interface

Author:  Ole Nielsen (2001, 2002, 2003)
Email:   Ole.Nielsen@anu.edu.au
Version: See pypar.__version__
Date:    See pypar.__date__
------------------------------------------------

CONTENTS:
  TESTING THE INSTALLATION
  GETTING STARTED
  FIRST PARALLEL PROGRAM
  PROGRAMMING FOR EFFICIENCY
  MORE PARALLEL PROGRAMMING WITH PYPAR
  PYPAR REFERENCE
  DATA TYPES
  STATUS OBJECT
  EXTENSIONS
  PERFORMANCE
  PROBLEMS
  REFERENCES

TESTING THE INSTALLATION

After having installed pypar, run

  mpirun -np 2 testpypar.py

to verify that everything works.
If it doesn't, try running a C/MPI program
(for example ctiming.c) to see whether the problem lies with
the local MPI installation or with pypar.


GETTING STARTED
Assuming that the pypar C extension compiled, you should be able to
write
  >>> import pypar
at the Python prompt.
(This may not work with MPICH under SunOS - this doesn't matter;
just skip this interactive bit and go to FIRST PARALLEL PROGRAM.)

Then you can write
  >>> pypar.size()
to get the number of parallel processors in play.
(When using pypar interactively from the command line this number will be 1.)

Processors are numbered from 0 to pypar.size()-1.
Try
  >>> pypar.rank()
to get the processor number of the current processor.
(When using pypar interactively from the command line this number will be 0.)

Finally, try
  >>> pypar.get_processor_name()
to return the host name of the current node.

FIRST PARALLEL PROGRAM
Take a look at the file demo.py supplied with this distribution.
To execute it in parallel on, say, four processors, run

  mpirun -np 4 demo.py

from the UNIX command line.
This will start four copies of the program, each on its own processor.

The program demo.py creates a message on processor 0 and
sends it around in a ring - each processor adding a bit to it -
until it arrives back at processor 0.

All parallel programs must find out which processor they are running on.
This is accomplished by the call

  myid = pypar.rank()

The total number of processors is obtained from

  proc = pypar.size()

One can then have different code for different processors by branching,
as in

  if myid == 0:
      ...

To send a general Python structure A to processor p, write

  pypar.send(A, p)

and to receive something from processor q, write

  X = pypar.receive(q)

This caters for any picklable Python structure and makes parallel
programs very simple and readable. While reasonably fast, it does not
achieve the full bandwidth obtainable by MPI.
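
To make this concrete, here is a minimal ring program in the spirit of
demo.py (a sketch only, not the distributed file itself; it assumes at
least two processors):

  # ring.py - pass a growing message around a ring of processors.
  # Run with e.g.: mpirun -np 4 ring.py (needs at least 2 processors).
  import pypar

  myid = pypar.rank()   # id of this processor
  proc = pypar.size()   # total number of processors

  if myid == 0:
      msg = 'P0'
      pypar.send(msg, 1)               # start the ring
      msg = pypar.receive(proc - 1)    # wait for the message to return
      print 'Final message: %s' % msg
  else:
      msg = pypar.receive(myid - 1)            # receive from the left
      pypar.send(msg + ' P%d' % myid,          # add our bit and pass
                 (myid + 1) % proc)            # to the right

  pypar.finalize()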


PROGRAMMING FOR EFFICIENCY
Really low-latency communication can be achieved by sticking
to Numeric arrays and specifying receive buffers whenever possible.

To send a Numeric array A to processor p, write

  pypar.send(A, p, use_buffer=True)

and to receive the array from processor q, write

  X = pypar.receive(q, buffer=X)

Note that X acts as a buffer and must be pre-allocated prior to the
receive statement, as in Fortran and C programs using MPI.
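
As a concrete sketch (assuming Numeric is installed and at least two
processors are in play):

  # Sketch: buffered communication of a Numeric array between
  # processors 0 and 1.
  import pypar
  from Numeric import zeros, arange, Float

  myid = pypar.rank()

  if myid == 0:
      A = arange(1000).astype(Float)   # the data to send
      pypar.send(A, 1, use_buffer=True)
  elif myid == 1:
      X = zeros(1000, Float)           # pre-allocated receive buffer
      X = pypar.receive(0, buffer=X)   # received directly into X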

These forms have superseded the raw forms present in pypar
prior to version 1.9. The raw forms have been recast in terms of the
forms above and have been retained for backwards compatibility.
See the script pytiming for an example of communication of Numeric arrays.


MORE PARALLEL PROGRAMMING WITH PYPAR

When you are ready to move on, have a look at the supplied
demos and the script testpypar.py, which all give examples
of parallel programming with pypar.
You can also look at a standard MPI reference (C or Fortran)
to get some ideas. The principles are the same in pypar, except
that many calls have been simplified.


PYPAR REFERENCE
Here is a list of the functions provided by pypar
(see the section DATA TYPES for an explanation of 'vanilla'):

Identification:
---------------

size()                -- Number of processors
rank()                -- Id of current processor
get_processor_name()  -- Return host name of current node

Basic send forms:
-----------------

send(x, destination)
  Sends data x of any type to destination with the default tag.

send(x, destination, tag=t)
  Sends data x of any type to destination with tag t.

send(x, destination, use_buffer=True)
  Sends data x of any type to destination,
  assuming that the recipient will specify a suitable buffer.

send(x, destination, bypass=True)
  Sends a Numeric array of any type to the recipient, assuming
  that a suitable buffer has been specified and that the
  recipient also specifies bypass=True.


Basic receive forms:
--------------------

y = receive(source)
  Receives data y of any type from source with the default tag.

y = receive(source, tag=t)
  Receives data y of any type from source with tag t.

y, status = receive(source, return_status=True)
  Receives data y and a status object from source.

y = receive(source, buffer=x)
  Receives data y from source and puts
  it in x (which must be of compatible size and type).
  It also returns a reference to x.
  (Although it will accept all types, this form is intended
  mainly for Numeric arrays.)

Collective Communication:
-------------------------

broadcast(x, root):
  Broadcasts x from root to all other processors.
  All processors must issue the same broadcast call.

gather(x, root):
  Gathers all elements in x into a buffer of
  size len(x) * numprocs created by this function.
  If x is multidimensional, the buffer will have
  the size of the zeroth dimension multiplied by numprocs.
  A reference to the created buffer is returned.

gather(x, root, buffer=y):
  Gathers all elements in x into the specified buffer y at root.
  The buffer must have size len(x) * numprocs and
  shape[0] == x.shape[0] * numprocs.
  A reference to the buffer y is returned.

scatter(x, root):
  Scatters all elements in x from root to all other processors
  into a buffer created by this function.
  A reference to the created buffer is returned.

scatter(x, root, buffer=y):
  Scatters all elements in x from root to all other processors
  using the specified buffer y.
  A reference to the buffer y is returned.

reduce(x, op, root):
  Reduces all elements in x at root,
  applying operation op elementwise, and returns the result in a
  buffer created by this function.
  A reference to the created buffer is returned.

reduce(x, op, root, buffer=y):
  Reduces all elements in x into the specified buffer y
  (of the same size as x) at root,
  applying operation op elementwise.
  A reference to the buffer y is returned.


Other functions:
----------------

time()         -- MPI wall time
barrier()      -- Synchronisation point. Makes processors wait until all
                  processors have reached this point.
abort()        -- Terminates all processes.
finalize()     -- Cleans up MPI. No parallelism can take place after this point.
initialized()  -- True if MPI has been initialised.


See pypar.py for doc strings on individual functions.


DATA TYPES
Pypar automatically handles different data types differently.
There are three protocols:
'array':   Numeric arrays of type Int ('i', 'l'), Float ('f', 'd'),
           or Complex ('F', 'D') can be communicated
           with the underlying mpiext.send_array and mpiext.receive_array.
           This is the fastest mode.
           Note that even though the underlying C implementation does not
           support Complex as a native datatype, pypar handles it
           efficiently and seamlessly by transmitting such arrays as
           Float arrays of twice the size.
'string':  Text strings can be communicated with mpiext.send_string and
           mpiext.receive_string.
'vanilla': All other types can be communicated using the functions
           send_vanilla and receive_vanilla, provided that the objects
           can be serialised using pickle (or cPickle). This mode is
           less efficient than the first two, but it can handle
           general structures.

Rules:
If the keyword argument vanilla == 1, the vanilla protocol is chosen
regardless of x's type.
Otherwise, if x is a string, the string protocol is chosen.
If x is an array, the 'array' protocol is chosen provided that x has one
of the admissible typecodes.

Functions that take vanilla as a keyword argument can force vanilla mode
on any datatype.
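
For example (a sketch; the protocol is chosen from the type of the
object passed to send):

  # Sketch: one send per protocol.
  import pypar
  from Numeric import zeros, Float

  if pypar.rank() == 0:
      pypar.send({'name': 'run1', 'n': 42}, 1)  # 'vanilla' (pickled)
      pypar.send('hello', 1)                    # 'string'
      pypar.send(zeros(10, Float), 1)           # 'array' (fastest)
  elif pypar.rank() == 1:
      config   = pypar.receive(0)
      greeting = pypar.receive(0)
      data     = pypar.receive(0)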

STATUS OBJECT
A status object can optionally be returned from receive and raw_receive by
specifying return_status=True in the call.
The status object can subsequently be queried for information about the
communication. The fields are:
  status.source:  The origin of the received message (use e.g. with pypar.any_source)
  status.tag:     The tag of the received message (use e.g. with pypar.any_tag)
  status.error:   Error code generated by the underlying C function
  status.length:  Number of elements received
  status.size:    Size (in bytes) of one element
  status.bytes(): Number of payload bytes received (excluding control info)

The status object is essential when used together with any_source or any_tag.
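
A brief sketch of its use (assuming receive accepts the wildcard
constants in place of a specific source and tag, as the field
descriptions above suggest):

  # Sketch: processor 0 accepts messages from any processor and
  # inspects where each one came from.
  import pypar

  if pypar.rank() != 0:
      pypar.send('ready', 0, tag=7)
  else:
      for i in range(pypar.size() - 1):
          msg, status = pypar.receive(pypar.any_source,
                                      tag=pypar.any_tag,
                                      return_status=True)
          print 'Got %s from processor %d (tag %d)' \
                % (msg, status.source, status.tag)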


EXTENSIONS
At this stage only a subset of MPI is implemented. However,
most real-world MPI programs use only these simple functions.
If you find that you need other MPI calls, please feel free to
add them to the C extension. Alternatively, drop me a note and I'll
use that as an excuse to update pypar.


PERFORMANCE
If you are passing simple Numeric arrays around, you can reduce
the communication time by using the 'buffer' keyword arguments
(see PYPAR REFERENCE above). These versions are closer to the underlying
MPI implementation in that one must provide receive buffers of the right
size. However, you will find that these versions have lower latency and
can be somewhat faster, as they bypass
pypar's mechanism for automatically transmitting the required buffer size.
Using simple Numeric arrays also bypasses pypar's pickling of general
structures.

This will mainly be of interest if you are using 'fine-grained'
parallelism, i.e. if you have frequent communications.

Try both versions and see for yourself whether there is any noticeable
difference.
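
One way to compare them is a sketch like the following, using time()
and barrier() from the reference above (assumes Numeric and at least
two processors; timings will vary with your MPI installation):

  # Sketch: time a buffered round trip between processors 0 and 1.
  import pypar
  from Numeric import zeros, Float

  myid = pypar.rank()
  A = zeros(100000, Float)

  pypar.barrier()          # start all processors together
  t0 = pypar.time()
  if myid == 0:
      pypar.send(A, 1, use_buffer=True)
      A = pypar.receive(1, buffer=A)
      print 'Buffered round trip: %f seconds' % (pypar.time() - t0)
  elif myid == 1:
      A = pypar.receive(0, buffer=A)
      pypar.send(A, 0, use_buffer=True)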


PROBLEMS
If you encounter any problems with this package, please drop me a line.
I cannot guarantee that I can help you, but I will try.

Ole Nielsen
Australian National University
Email: Ole.Nielsen@anu.edu.au


REFERENCES
To learn more about MPI in general, see the web sites
  http://www-unix.mcs.anl.gov/mpi/
  http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf

or the books
  "Using MPI, 2nd Edition", by Gropp, Lusk, and Skjellum
  "The LAM companion to 'Using MPI...'", by Zdzislaw Meglicki
  "Parallel Programming With MPI", by Peter S. Pacheco
  "RS/6000 SP: Practical MPI Programming", by Yukiya Aoyama and Jun Nakano

To learn more about Python, see the web site
  http://www.python.org