------------------------------------------------
PyPAR - Parallel Python, no-frills MPI interface

Author:  Ole Nielsen (2001, 2002, 2003)
Email:   Ole.Nielsen@anu.edu.au
Version: See pypar.__version__
Date:    See pypar.__date__
------------------------------------------------

CONTENTS:
  TESTING THE INSTALLATION
  GETTING STARTED
  FIRST PARALLEL PROGRAM
  PROGRAMMING FOR EFFICIENCY
  MORE PARALLEL PROGRAMMING WITH PYPAR
  PYPAR REFERENCE
  DATA TYPES
  STATUS OBJECT
  EXTENSIONS
  PERFORMANCE
  PROBLEMS
  REFERENCES

TESTING THE INSTALLATION

After having installed pypar, run

  mpirun -np 2 testpypar.py

to verify that everything works.
If it doesn't, try running a C/MPI program (for example ctiming.c)
to see whether the problem lies with the local MPI installation
or with pypar.


GETTING STARTED
Assuming that the pypar C extension compiled, you should be able to write

  >>> import pypar

at the Python prompt.
(This may not work with MPICH under SunOS - this doesn't matter;
just skip the interactive bit and go to FIRST PARALLEL PROGRAM.)

Then you can write

  >>> pypar.size()

to get the number of parallel processors in play.
(When using pypar interactively from the command line this number will be 1.)

Processors are numbered from 0 to pypar.size()-1.
Try

  >>> pypar.rank()

to get the processor number (rank) of the current processor.
(When using pypar interactively from the command line this number will be 0.)

Finally, try

  >>> pypar.Get_processor_name()

to get the host name of the current node.
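
For example, a short interactive session might look like this (the host
name on the last line is just an illustration):

  >>> import pypar
  >>> pypar.size()               # number of processors in play
  1
  >>> pypar.rank()               # rank of this processor
  0
  >>> pypar.Get_processor_name() # host name of the current node
  'node0'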

FIRST PARALLEL PROGRAM
Take a look at the file demo.py supplied with this distribution.
To execute it in parallel on four processors, say, run

  mpirun -np 4 demo.py

from the UNIX command line.
This will start four copies of the program, each on its own processor.

The program demo.py creates a message on processor 0 and
sends it around in a ring
- each processor adding a bit to it - until it arrives back at
processor 0.

All parallel programs must find out which processor they are running on.
This is accomplished by the call

  myid = pypar.rank()

The total number of processors is obtained from

  proc = pypar.size()

One can then have different code for different processors by branching,
as in

  if myid == 0:
      ...

To send a general Python structure A to processor p, write

  pypar.send(A, p)

and to receive something from processor q, write

  X = pypar.receive(q)

This caters for any picklable Python structure and makes parallel
programs very simple and readable. While reasonably fast, it does not
achieve the full bandwidth obtainable with MPI.
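
Putting these pieces together, here is a minimal sketch in the spirit of
demo.py (not the actual demo; it assumes at least two processors):

  import pypar

  myid = pypar.rank()                      # id of this processor
  proc = pypar.size()                      # total number of processors

  if myid == 0:
      msg = 'P0'
      pypar.send(msg, 1)                   # start the ring
      msg = pypar.receive(proc - 1)        # message returns from the last processor
      print 'Processor 0 got back: "%s"' % msg
  else:
      msg = pypar.receive(myid - 1)        # receive from the left neighbour
      msg = msg + ' P%d' % myid            # add a bit to the message
      pypar.send(msg, (myid + 1) % proc)   # pass it on to the right neighbour

  pypar.Finalize()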


PROGRAMMING FOR EFFICIENCY
For really fast communication one must stick to Numeric arrays and use
the 'raw' versions of send and receive.
To send a Numeric array A to processor p, write

  pypar.raw_send(A, p)

and to receive the array from processor q, write

  X = pypar.raw_receive(X, q)

Note that X acts as a buffer and must be pre-allocated prior to the
receive statement, as in Fortran and C programs using MPI.

See the script pytiming for an example of communicating Numeric arrays.
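
For instance, a minimal sketch of the raw calls (assumes the Numeric module
and at least two processors):

  import Numeric
  import pypar

  myid = pypar.rank()
  N = 1000

  if myid == 0:
      A = Numeric.arange(N).astype('d')    # data to send
      pypar.raw_send(A, 1)                 # send to processor 1
  elif myid == 1:
      X = Numeric.zeros(N, 'd')            # pre-allocated receive buffer
      X = pypar.raw_receive(X, 0)          # filled with the data from processor 0

  pypar.Finalize()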


MORE PARALLEL PROGRAMMING WITH PYPAR

When you are ready to move on, have a look at the supplied
demos and the script testpypar.py, which all give examples
of parallel programming with pypar.
You can also look at a standard MPI reference (C or Fortran)
to get some ideas. The principles are the same in pypar, except
that many calls have been simplified.


PYPAR REFERENCE
Here is a list of the functions provided by pypar
(see the DATA TYPES section for an explanation of 'vanilla';
a short sketch of the collective calls follows this list):

size() -- Number of processors.
rank() -- Id of current processor.
Get_processor_name() -- Return host name of current node.

send(x, destination, tag=0, vanilla=0) -- Blocking send (all types).
  Sends data in x to destination with given tag.

y = receive(source, tag=0) -- Blocking receive (all types).
  Receives data (y) from source (possibly with a specified tag).

y, status = receive(source, tag=0, return_status=True) -- Blocking receive (all types).
  Receives data (y) and a status object from source (possibly with a specified tag).

raw_send(x, destination, tag=0, vanilla=0) -- Raw blocking send (fastest).
  Sends data in x to destination with given tag.
  It differs from send in that the receiver MUST provide a buffer
  to store the received data.
  Although it will accept all types, raw_send is intended mainly
  for Numeric arrays.

raw_receive(x, source, tag=0, vanilla=0) -- Raw blocking receive (fastest).
  Receives data from source (possibly with a specified tag) and puts
  it in x (which must be of compatible size and type).
  It also returns a reference to x.
  Although it will accept all types, raw_receive is intended mainly
  for Numeric arrays.

x, status = raw_receive(x, source, tag=0, vanilla=0, return_status=True) -- Raw blocking receive (fastest).
  Receives data and a status object from source (possibly with a specified tag)
  and puts the data in x (which must be of compatible size and type).

bcast(X, rootid) -- Broadcasts X from rootid to all other processors.
  All processors must issue the same bcast.

raw_scatter(x, nums, buffer, source, vanilla=0):
  Scatter the first nums elements in x from source to buffer
  (of size given by nums).

scatter(x, source, vanilla=0):
  Scatter all elements in x from source to a buffer
  created by this function and returned.

raw_gather(x, buffer, source, vanilla=0):
  Gather all elements in x into buffer at source.
  Buffer must have size len(x) * numprocs and
  shape[0] == x.shape[0] * numprocs.

gather(x, source, vanilla=0):
  Gather all elements in x at source into a buffer of
  size len(x) * numprocs, created by this function and returned.
  If x is multidimensional, the buffer will have
  the size of the zeroth dimension multiplied by numprocs.

raw_reduce(x, buffer, op, source, vanilla=0):
  Reduce all elements in x into buffer (of the same size as x)
  at source, applying operation op elementwise.

reduce(x, op, source, vanilla=0):
  Reduce all elements in x at source,
  applying operation op elementwise.
  The result buffer is created by this function and returned.

Wtime()    -- MPI wall-clock time.
Barrier()  -- Synchronisation point. Makes processors wait until all
              processors have reached this point.
Abort()    -- Terminate all processes.
Finalize() -- Clean up MPI. No parallelism can take place after this point.

See pypar.py for doc strings on individual functions.
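
To illustrate the collective calls, here is a hedged sketch using scatter
and gather, whose return behaviour is spelled out in the list above (bcast
and reduce follow the same pattern; check pypar.py for the exact return
behaviour of bcast and for the names of the operation constants expected by
reduce - none are assumed below):

  import Numeric
  import pypar

  myid = pypar.rank()
  numprocs = pypar.size()

  # Each processor builds the full array; only the copy on the source
  # (processor 0) matters for the scatter.
  x = Numeric.arange(4 * numprocs).astype('d')

  # Scatter: each processor receives a newly created buffer with its share
  # of the elements (4 each here).
  my_slice = pypar.scatter(x, 0)

  # Do some local work on the slice.
  my_slice = my_slice * (myid + 1)

  # Gather: collect the slices again; a buffer of size
  # len(my_slice) * numprocs is created and returned (meaningful at source).
  y = pypar.gather(my_slice, 0)

  if myid == 0:
      print 'Gathered array of length %d' % len(y)

  pypar.Barrier()    # make all processors wait here before finishing
  pypar.Finalize()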


DATA TYPES
Pypar automatically handles different data types differently.
There are three protocols:

'array':   Numeric arrays of typecode 'i', 'l', 'f', or 'd' can be
           communicated with the underlying mpiext.send_array and
           mpiext.receive_array. This is the fastest mode.
'string':  Text strings can be communicated with mpiext.send_string and
           mpiext.receive_string.
'vanilla': All other types can be communicated using send_vanilla and
           receive_vanilla, provided that the objects can be serialised
           using pickle (or cPickle). This mode is less efficient than the
           first two, but it can handle complex structures.

Rules:
If the keyword argument vanilla == 1, vanilla mode is chosen regardless of
x's type.
Otherwise, if x is a string, the string protocol is chosen.
If x is a Numeric array, the 'array' protocol is chosen provided that x has
one of the admissible typecodes.

Functions that take vanilla as a keyword argument can force vanilla mode
for any data type, as in the sketch below.
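
For example, a hedged sketch that forces vanilla mode on a Numeric array
(normally the faster 'array' protocol would be chosen automatically; it is
assumed here, per the signatures above, that the receiver needs no extra
arguments because the protocol is worked out on the receiving side):

  import Numeric
  import pypar

  myid = pypar.rank()

  if myid == 0:
      A = Numeric.arange(10).astype('d')
      pypar.send(A, 1, vanilla=1)   # force the pickle-based vanilla protocol
  elif myid == 1:
      X = pypar.receive(0)          # protocol is detected on the receiving side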

STATUS OBJECT
A status object can optionally be returned from receive and raw_receive by
specifying return_status=True in the call.
The status object can subsequently be queried for information about the
communication. The fields are:

  status.source:  The origin of the received message (use e.g. with pypar.any_source)
  status.tag:     The tag of the received message (use e.g. with pypar.any_tag)
  status.error:   Error code generated by the underlying C function
  status.length:  Number of elements received
  status.size:    Size (in bytes) of one element
  status.bytes(): Number of payload bytes received (excluding control information)

The status object is essential when used together with any_source or any_tag,
as in the sketch below.
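
For example, a hedged sketch using any_source and any_tag (assumes at least
two processors):

  import pypar

  myid = pypar.rank()
  numprocs = pypar.size()

  if myid == 0:
      # Accept one message from each of the other processors, in whatever
      # order they arrive, and inspect the status object.
      for i in range(numprocs - 1):
          y, status = pypar.receive(pypar.any_source, tag=pypar.any_tag,
                                    return_status=True)
          print 'Got "%s" from processor %d (tag %d, %d bytes)' % (
              y, status.source, status.tag, status.bytes())
  else:
      pypar.send('Hello from processor %d' % myid, 0, tag=myid)

  pypar.Finalize()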


EXTENSIONS
At this stage only a subset of MPI is implemented. However,
most real-world MPI programs use only these simple functions.
If you find that you need other MPI calls, please feel free to
add them to the C extension. Alternatively, drop me a note and I'll
use that as an excuse to update pypar.


PERFORMANCE
If you are passing simple Numeric arrays around, you can reduce
the communication time by using the 'raw' versions of send and
receive (see PYPAR REFERENCE above). These versions are closer to the
underlying MPI implementation in that you must provide receive buffers
of the right size. However, they can be somewhat faster because they
bypass pypar's mechanism for automatically transferring the required
buffer size. Also, using simple Numeric arrays bypasses pypar's pickling
of complex structures.

This will mainly be of interest if you are using 'fine grained'
parallelism, i.e. if you communicate frequently.

Try both versions and see for yourself whether there is any noticeable
difference.


PROBLEMS
If you encounter any problems with this package please drop me a line.
I cannot guarantee that I can help you, but I will try.

Ole Nielsen
Australian National University
Email: Ole.Nielsen@anu.edu.au


REFERENCES
To learn more about MPI in general, see the web sites

  http://www-unix.mcs.anl.gov/mpi/
  http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245380.pdf

or the books

  "Using MPI, 2nd Edition", by Gropp, Lusk, and Skjellum
  "The LAM companion to 'Using MPI...'", by Zdzislaw Meglicki
  "Parallel Programming With MPI", by Peter S. Pacheco
  "RS/6000 SP: Practical MPI Programming", by Yukiya Aoyama and Jun Nakano

To learn more about Python, see the web site

  http://www.python.org