1\documentclass[12pt]{article}
2
3\usepackage{url}
4\usepackage{times}
5\usepackage[dvips]{graphicx}
6
7%\textheight=25cm
8\textwidth=14cm
9%\topmargin=-4cm
10%\oddsidemargin=0cm
11
12
13
14\begin{document}
15
16\title{Pypar Tutorial\\
17Building a Parallel Program using Python}
18
19
20\author{Ole Nielsen \\
21Australian National University, Canberra \\
22Ole.Nielsen@anu.edu.au}
23
24\maketitle
25\section*{Introduction}
26
27This is a tutorial demonstrating essential
28features of parallel programming and building
29a simple Python MPI program using the MPI binding \texttt{pypar} available at
30\url{http://datamining.anu.edu.au/software/pypar}.
31
32It is assumed that \texttt{pypar} has been installed on your system.
33If not, see the installation document (\texttt{pypar\_installation.pdf})
34that comes with pypar
35(or get it from the web site at
36\url{http://datamining.anu.edu.au/~ole/pypar/pypar_installation.pdf}).
37
38For a reference to all of Pypar's functionality, see the
39reference manual (\texttt{pypar\_reference.pdf})
40that comes with pypar (or get it from the web site at
41\url{http://datamining.anu.edu.au/~ole/pypar/pypar_reference.pdf}).
42
43
44\section{The Message Passing Interface --- MPI}
MPI is the de facto standard for parallel programming of both
distributed computer clusters and shared memory architectures.
MPI programming takes the form of special commands that are
imported from the MPI library. They provide information to the program
about how many processes are available and which rank each process has been
allocated, as well as functions to communicate with other processes.
51See \url{http://www.netlib.org/utk/papers/mpi-book/mpi-book.html} for
52a comprehensive reference to MPI (based on
53the C or Fortran programming languages).
54
55It is said that MPI is both large and small. What is meant is that the
56MPI standard has hundreds of functions in it. However, many of the
57advanced routines represent functionality that can be ignored unless
58one pursues added flexibility (data types), overlap of computation and
59communication (nonblocking send/receive), modularity (groups,
60communicators), or convenience (collective operations, topologies).
61MPI is said to be small because the majority of useful and efficient
62parallel programs are written using only a small essential subset of MPI.
63In fact, it can be argued that parallel performance is best achieved by
64keeping the parallelism simple.
65
66Pypar is a Python binding to such a subset.
67
68
69\section{A simple Pypar MPI program}
Let us start with a simple example: a Python program
that imports the pypar module and
reports the number of processors, the local id (rank) and the host name.
This is the Pypar equivalent of the infamous ``hello world''
program, if you like.
Make a file called \texttt{ni.py} using your favourite
editor\footnote{The author believes that your favourite editor should
be GNU/Emacs, but it is up to you, of course.}
and type in the following contents:
79{\footnotesize
80\begin{verbatim}
81import pypar
82
83p = pypar.rank()
84P = pypar.size()
85node = pypar.get_processor_name()
86
87print "I am process %d of %d on node %s" %(p, P, node)
88
89pypar.finalize()
90\end{verbatim}}
91\noindent and run it as follows:
92\begin{verbatim}
93mpirun -np 4 python ni.py
94\end{verbatim}
95You should get a greeting from each of the four nodes looking
96something like
97\begin{verbatim}
98Pypar (version 1.9) initialised MPI OK with 4 processors
99I am process 0 of 4 on node ninja-1
100I am process 3 of 4 on node ninja-4
101I am process 1 of 4 on node ninja-2
102I am process 2 of 4 on node ninja-3
103\end{verbatim}
The order of output may be arbitrary!  When four copies of the
program are run, \texttt{print} is called four times. The order in which each
process prints its message is undetermined; it depends on when each
process reaches that point in its execution of the program, and how the
messages travel on the network.
\emph{Note that all the print statements, though they come from different
  processes, will send their output intact to your shell window; this
  is generally true of output commands.
  Input commands, like \texttt{raw\_input},
  will only work on the process with rank zero.}
114
115 
116\subsection{The MPI execution model}
117
It is important to understand that in MPI (and therefore also in
Pypar), multiple identical copies of this program
will start simultaneously in separate processes.
These processes may run on one machine or on multiple machines.
This is a fundamental difference from ordinary programs,
where, if someone says ``run the program'', it is
assumed that only one instance of the program is running.
125
126
In a nutshell, this program sets up a communication group of 4
processes, where each process gets its rank (\texttt{p}), prints it,
and exits. The following walks through the equivalent C MPI program
(\texttt{hello.c}) in detail; the same concepts apply to the Python
program above.
133
134
135
136
The first line,
\texttt{\#include <stdio.h>},
should be familiar to all C programmers. It includes the standard
input/output routines like \texttt{printf}. The second line,
\texttt{\#include <mpi.h>},
includes the MPI functions. The file \texttt{mpi.h}
contains prototypes for all the
MPI routines in this program; this file is located in
\texttt{/usr/include/lam/mpi.h} in case you actually want to look at it.
The program starts with the \texttt{main} function, which takes the two arguments
\texttt{argc} and \texttt{argv} (as is normal in C): the argument \texttt{argc} is
the number of command line arguments given to the program (including itself)
and \texttt{argv} is an array of strings containing them. For example,
\texttt{argv[0]} will contain the program name itself. We will not be using them
directly, but MPI needs access to them through \texttt{MPI\_Init}.
152Then the program declares one integer variable, \texttt{myid}. The
153first step of the program,
154\begin{verbatim} 
155      MPI_Init(&argc,&argv);
156\end{verbatim}       
157calls MPI\_Init to set up everything.
158This should be the first command executed in all MPI programs. This
159routine takes pointers to \texttt{argc} and \texttt{argv},
160looks at them, pulls out the purely
161MPI-relevant things, and generally fixes them so you can use command line
162arguments as normal.
163
164\medskip
165
166Next, the program runs \texttt{MPI\_Comm\_rank},
167passing it the default communicator MPI\_COMM\_WORLD
168(which is the group of \emph{all} processes started)
and a \emph{pointer} to \texttt{myid}. Passing a pointer is the way
C can change the value of the argument \texttt{myid}.
\texttt{MPI\_Comm\_rank} will set \texttt{myid} to the rank of the calling
process within that communicator.
173Remember that in reality, several instances of this program
174start up in several different processes when this program is run. Each
175will receive a unique number from \texttt{MPI\_Comm\_rank}.
176Because multiple copies are running, each will execute all lines
177in the program including the \texttt{printf} statement which prints a
178message and the rank.
179After doing everything else, the program calls \texttt{MPI\_Finalize},
180which generally terminates everything and shuts down MPI. This should be the
181last command executed in all MPI programs.
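
For comparison, these three steps correspond directly to what Pypar did
behind the scenes in \texttt{ni.py} above: importing \texttt{pypar}
initialises MPI, \texttt{pypar.rank()} plays the role of
\texttt{MPI\_Comm\_rank}, and \texttt{pypar.finalize()} corresponds to
\texttt{MPI\_Finalize}. A minimal sketch (the greeting text is
illustrative only):
{\footnotesize
\begin{verbatim}
import pypar              # Importing pypar initialises MPI (the MPI_Init step)

myid = pypar.rank()       # Corresponds to MPI_Comm_rank on MPI_COMM_WORLD
print "Hello from process %d" % myid

pypar.finalize()          # Corresponds to MPI_Finalize
\end{verbatim}}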
182
183\subsection*{Compile the MPI program}
184
185Normally, C programs are compiled with a command such as
186\texttt{gcc hello.c -o hello.x}. However, MPI programs need access to
187more libraries which are provided through the wrapper program
188\texttt{mpicc}. This can be used as usual. Try to run
189\begin{verbatim} 
190  mpicc hello.c -o hello.x
191\end{verbatim}
If your program is syntactically correct, a \emph{compiled}
program will appear with the name you specified after the \texttt{-o} option.
If not, you will see some error messages and you will have to correct
those before continuing.
196
197\subsection*{Run the MPI program}
198In order to run an MPI compiled program, you must type:
199\begin{verbatim}
200mpirun -np <number of processes> [options] <program name and arguments>
201\end{verbatim} 
202where you specify the number of processes on which you want to run
203your parallel program, optional mpirun options,
204and your program name and its expected arguments.
205In this case try:
206\begin{verbatim}
207    mpirun -np 1 hello.x
208\end{verbatim}
This will run one copy of your program and the output should look like this:
210\begin{verbatim} 
211Sawatdii khrap thuk thuk khon (Process 0)
212\end{verbatim} 
213
214Now try to start four processes by running \texttt{mpirun -np 4 hello.x}.
215You should see the following output:
216\begin{verbatim}
217Sawatdii khrap thuk thuk khon (Process 0)
218Sawatdii khrap thuk thuk khon (Process 3)
219Sawatdii khrap thuk thuk khon (Process 1)
220Sawatdii khrap thuk thuk khon (Process 2)
221\end{verbatim}
Note that the order of output may be arbitrary!  When four copies of the
program are run, \texttt{printf} is called four times.  The order in which each
process prints its message is undetermined; it depends on when each
process reaches that point in its execution of the program, and how the
messages travel on the network. Your guess is as good as mine.
\emph{Note that all the printf's, though they come from different
  processes, will send their output intact to your shell window; this
  is generally true of output commands. Input commands, like \texttt{scanf},
  will only work on the process with rank zero.}
231
232\subsubsection*{Exercise}
The MPI command \texttt{MPI\_Comm\_size(MPI\_COMM\_WORLD, \&number\_of\_processes);} will
store the total number of processes started in the integer variable
\texttt{number\_of\_processes}. Modify the program
to give the following output:
237\begin{verbatim}
238Sawatdii khrap thuk thuk khon (Process 0 of 4)
239Sawatdii khrap thuk thuk khon (Process 3 of 4)
240Sawatdii khrap thuk thuk khon (Process 1 of 4)
241Sawatdii khrap thuk thuk khon (Process 2 of 4)
242\end{verbatim}
243when started with four processes.
244 
245
246\subsection*{Running on multiple computers} 
247
248So far we have only \emph{simulated} the parallelism by running
249multiple processes on one machine.
250We will now add host specific information to the program and then run it
251in parallel.
252
253Add the following declarations to your program:
254\begin{verbatim}
255int  namelen;
256char processor_name[MPI_MAX_PROCESSOR_NAME];
257\end{verbatim}
258
259Add the command
260\begin{verbatim}
261MPI_Get_processor_name(processor_name, &namelen); 
262\end{verbatim}
263This will store the hostname of the processor executing the given process
264along with the length of the hostname. Then modify the print statement to   
265\begin{verbatim}
266  printf("Sawatdii khrap thuk thuk khon (Process %d of %d running on %s)\n",
267         myid, number_of_processes, processor_name);
268\end{verbatim}
After compiling and running the program you should see the following output:
270\begin{verbatim}
271Sawatdii khrap thuk thuk khon (Process 0 of 4 running on ninja-n)
272Sawatdii khrap thuk thuk khon (Process 1 of 4 running on ninja-n)
273Sawatdii khrap thuk thuk khon (Process 2 of 4 running on ninja-n)
274Sawatdii khrap thuk thuk khon (Process 3 of 4 running on ninja-n)
275\end{verbatim}
We are still running all our MPI processes on one computer, but we are
now ready to run this program in parallel.
278
Edit the file \texttt{\~{}/.lamhosts} to contain all machines in our network
and run \texttt{lamboot -v \~{}/.lamhosts} again. You should see the following
diagnostic message:
282\begin{verbatim} 
283LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
284
285Executing hboot on n0 (ninja-1 - 1 CPU)...
286Executing hboot on n1 (ninja-2 - 1 CPU)...
287Executing hboot on n2 (ninja-3 - 1 CPU)...
288Executing hboot on n3 (ninja-4 - 1 CPU)...
289Executing hboot on n4 (ninja-5 - 1 CPU)...
290Executing hboot on n5 (ninja-6 - 1 CPU)...
291Executing hboot on n6 (ninja-7 - 1 CPU)...
292Executing hboot on n7 (ninja-8 - 1 CPU)...
293topology done
294\end{verbatim}
295\noindent Finally try to run your program again and verify that it runs
296on different nodes.
297What happens if you run more processes than there are physical machines?
298
299       
300\section*{Optional: synchronise the clocks}
301
\textbf{\emph{Note: This is an optional exercise for those who can't get enough!}}
303
The clocks in our cluster are not synchronised: they show different times.
Try for example to run
\begin{verbatim}
ssh -x ninja-2 date; ssh -x ninja-5 date; ssh -x ninja-3 date; date
\end{verbatim}
You will see that the times listed differ significantly.
310
311Try and search for things like
312\texttt{synchronize time network}
313and see if you can figure out how to do it.
314
315
316\section*{MPI resources}
317
\begin{itemize}
  \item \url{http://www.cs.appstate.edu/~can/classes/5530} (excellent introduction
  and many useful links)
  \item \url{http://www-unix.mcs.anl.gov/mpi/mpich} (The MPICH implementation)
  \item \url{http://www.lam-mpi.org} (The LAM-MPI implementation)
  \item \url{http://www.netlib.org/utk/papers/mpi-book/mpi-book.html}\\
    (The complete MPI reference)
\end{itemize}
326
327
328
329\section*{Exercise 6}
330
331In this exercise we will continue the introduction to MPI
332by using the local id to differentiate work and
333by familiarising ourselves with the basic send and receive
334primitives.
335We will then proceed to measure the fundamental characteristics of
336the network: namely latency and bandwidth.
337
338
339
340\subsection*{Using local id to differentiate work}       
341In Exercise 5 we had every processor write to the screen. This
342is convenient for debugging purposes
343and I suggest you start every parallel program
344with a message from each processor about its rank, the total number
345of processors and the hostname it is running on (see below).
346
However, it is often also desirable to have only one
processor write to the screen,
for example to report the final result of a calculation.
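In Pypar this is simply a conditional on the rank. A minimal sketch
(\texttt{result} is a placeholder for whatever your program computes):
{\footnotesize
\begin{verbatim}
import pypar

p = pypar.rank()

result = 42                     # Placeholder for a real computation

if p == 0:                      # Only the process with rank 0 reports
    print "P%d: Final result is %d" % (p, result)

pypar.finalize()
\end{verbatim}}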
350
351\subsubsection*{Exercise} 
352Write a parallel program that
353produces something like the following output when run on four processors:
354\begin{verbatim}
355  P2/4: Initialised OK on host ninja-6
356  P0/4: Initialised OK on host ninja-4
357  P0: Program terminating 
358  P1/4: Initialised OK on host ninja-5
359  P3/4: Initialised OK on host ninja-7     
360\end{verbatim}
361Arrange the program such that
362all processors report on their initialisation but
363only one, processor 0, reports on its termination just prior
364to \texttt{MPI\_Finalize()}.
365Note, as in Exercise 5, that the order of output is
366arbitrary.
367%\emph{Hint: Use the value of myid to decide whether to print or not}.
368
369\subsection*{A simple send and receive pair}
370In this part we will use the fundamental MPI send and receive commands
371to communicate among the processors. The commands are:
372\begin{itemize}
373  \item \texttt{MPI\_Send}:
374  This routine performs a basic send; this routine may block until
375  the message
376  is received, depending on the specific implementation of MPI.
377  \begin{verbatim} 
378  int MPI_Send(void* buf, int count, MPI_Datatype datatype,
379                     int dest, int tag, MPI_Comm comm)
380  Input:
381      buf  - initial address of send buffer
382      count - number of elements in send buffer (nonnegative integer)
383      datatype - datatype of each send buffer element
384      dest - rank of destination (integer)
385      tag  - message tag (integer)
386      comm - communicator
387  \end{verbatim} 
388  \textbf{Example}: To send one integer, A say, to processor 3 with tag 13:
389  \begin{verbatim}
390    MPI_Send(&A, 1, MPI_INT, 3, 13, MPI_COMM_WORLD); 
391  \end{verbatim}
392  \item \texttt{MPI\_Recv}:
393    This routine performs a basic receive.
394  \begin{verbatim}
395  int MPI_Recv(void* buf, int count, MPI_Datatype datatype,
396                     int source, int tag, MPI_Comm comm,
397                     MPI_Status *status)
398
399  Input:
400      count - maximum number of elements in receive buffer
401             (integer)
402      datatype - datatype of each receive buffer element
403      source - rank of source (integer)
404      tag  - message tag (integer)
405      comm - communicator
406                     
407  Output:
408      buf  - initial address of receive buffer
409      status - status object, provides information about
410               message received;
411      status is a structure of type MPI_Status, the element
412      status.MPI_SOURCE is the source of the message received,
413      and the element status.MPI_TAG is the tag value.
414
415  \end{verbatim}
416  \textbf{Example}: To receive one integer, B say, from processor 7
417  with tag 13:
418  \begin{verbatim}
419    MPI_Recv(&B, 1, MPI_INT, 7, 13, MPI_COMM_WORLD, &status); 
420  \end{verbatim}
421\end{itemize} 
422
\noindent These calls are described in more detail in Ian Foster's online book,
in the chapter \textbf{MPI Basics}:
\url{http://www-unix.mcs.anl.gov/dbpp/text/node96.html}.
426The chapter about \textbf{Asynchronous communication},
427\url{http://www-unix.mcs.anl.gov/dbpp/text/node98.html}, describes how
428to query the Status Object.
429
430MPI defines the following constants for use with \texttt{MPI\_Recv}:
431\begin{itemize} 
432  \item \texttt{MPI\_ANY\_SOURCE}: Can be used to specify that a message
433  can be received from anywhere instead of a specific source.
434  \item \texttt{MPI\_ANY\_TAG}: Can be used to specify that a message
435  can have any tag instead of a specific tag.
436\end{itemize}   
437
438\subsubsection*{Exercise}
The following program segment implements a very simple
communication pattern: processor 0 sends a number to processor 1 and
both output a diagnostic message.
442
443Add to your previous program the declarations:
444\begin{verbatim}
445  int A, B, source, destination, tag=13;
446  MPI_Status status;
447\end{verbatim}
448and the following code segment
449\begin{verbatim} 
450  if (myid == 0) {
451    A = 42;
452    destination = 1;
453    printf("P%d: Sending value %d to MPI process %d\n",
454           myid, A, destination);   
455    MPI_Send(&A, 1, MPI_INT, 1, 13, MPI_COMM_WORLD);
456  } else if (myid == 1) {
457    source = 0;
458    MPI_Recv(&B, 1, MPI_INT, source, 13, MPI_COMM_WORLD, &status);   
459    printf("P%d: Received value %d from MPI process %d\n", myid, B, source);
460  }
461\end{verbatim}
Make sure it compiles and runs on 2 processors.
Verify that your output looks something like:
464\begin{verbatim}
465P0/2: Initialised OK on host ninja-1
466P0: Sending value 42 to MPI process 1
467P0: Terminating
468P1/2: Initialised OK on host ninja-2
469P1: Received value 42 from MPI process 0
470\end{verbatim}
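
For reference, a rough Pypar equivalent of this send/receive pair looks as
follows (a sketch only; see the reference manual for the full
\texttt{send}/\texttt{receive} signatures):
{\footnotesize
\begin{verbatim}
import pypar

myid = pypar.rank()

if myid == 0:
    A = 42
    destination = 1
    print "P%d: Sending value %d to MPI process %d" % (myid, A, destination)
    pypar.send(A, destination)        # Send the Python object A to process 1
elif myid == 1:
    source = 0
    B = pypar.receive(source)         # Receive from process 0
    print "P%d: Received value %d from MPI process %d" % (myid, B, source)

pypar.finalize()
\end{verbatim}}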
471
472
473
474\subsection*{Send messages around in a ring - Exercise}
475
Write an MPI program which passes data
around in a ring structure: from process 0 to process 1, then to process 2, and so on.
When the message reaches the last process it should be passed back to
process 0 --- this forms a `communication ring'.
Every process should add one to the value before passing it on
and write out a diagnostic message.
482
483With a starting value of 42 and running on four processors your program
484should produce the following output:
485\begin{verbatim}
486P0/4: Initialised OK on host ninja-1
487P0: Sending value 42 to MPI process 1
488P1/4: Initialised OK on host ninja-2
489P3/4: Initialised OK on host ninja-4
490P1: Received value 42 from MPI process 0
491P1: Sending value 43 to MPI process 2
492P2/4: Initialised OK on host ninja-3
493P2: Received value 43 from MPI process 1
494P2: Sending value 44 to MPI process 3
495P3: Received value 44 from MPI process 2
496P3: Sending value 45 to MPI process 0
497P0: Received value 45 from MPI process 3
498\end{verbatim}
499\emph{Hint}: You can use the \texttt{modulus} operator to calculate
500the destination such that the next processor after the last becomes 0:
501\begin{verbatim}
502  destination = (myid + 1) % number_of_processors;
503\end{verbatim}
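
For comparison, a minimal Pypar sketch of the ring (diagnostic messages
follow the format used above):
{\footnotesize
\begin{verbatim}
import pypar

myid = pypar.rank()
numproc = pypar.size()
destination = (myid + 1) % numproc     # Next process in the ring
source = (myid - 1) % numproc          # Previous process in the ring

if myid == 0:
    value = 42
    print "P%d: Sending value %d to MPI process %d" % (myid, value, destination)
    pypar.send(value, destination)
    value = pypar.receive(source)      # The ring closes here
    print "P%d: Received value %d from MPI process %d" % (myid, value, source)
else:
    value = pypar.receive(source)
    print "P%d: Received value %d from MPI process %d" % (myid, value, source)
    value += 1
    print "P%d: Sending value %d to MPI process %d" % (myid, value, destination)
    pypar.send(value, destination)

pypar.finalize()
\end{verbatim}}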
504
505
506\subsection*{Timing}
507A main reason for doing parallel computing is that of faster execution.
508Therefore \emph{timing} is an integral part of this game.
MPI provides a function \texttt{MPI\_Wtime} which returns the wall-clock time in seconds.
To time a code segment with \texttt{MPI\_Wtime} do something like this:
511\begin{verbatim}
512  double t0;
513 
514  t0=MPI_Wtime();
515  // (Computations)
516  printf("Time = %.6f sec\n", MPI_Wtime() - t0);
517\end{verbatim}
518
519\subsubsection*{Exercise}
Time the previous `ring' program in such a way that only process 0
measures the time and writes it to the screen.
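
A sketch of how this could look in Pypar, assuming \texttt{pypar.time()}
wraps \texttt{MPI\_Wtime} (Python's \texttt{time.time()} can be substituted
if not):
{\footnotesize
\begin{verbatim}
import pypar

myid = pypar.rank()

t0 = pypar.time()     # Assumed wrapper around MPI_Wtime; time.time() also works
# ... (the ring communication from the previous exercise goes here) ...
if myid == 0:
    print "Time = %.6f sec" % (pypar.time() - t0)

pypar.finalize()
\end{verbatim}}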
522
523
524\subsection*{Measure network latency and bandwidth}
525
526The time it takes to transmit a message can be approximated by the model
527\[
528  t = t_l + \alpha t_b
529\]
530where
531\begin{itemize} 
532  \item $t_l$ is the \emph{latency}. The latency is the
533  time it takes to start the transmission before anything
534  gets communicated. This involves opening the network connection,
535  buffering the message and so on.
536  \item $t_b$ is the time it takes to communicate one byte once
537  the connection is open. This is related to the \emph{bandwidth} $B$,
538  defined as the number of bytes transmitted per second, as
539  \[
540    B = \frac{1}{t_b}
541  \]
542  \item $\alpha$ is the number of bytes communicated.
543\end{itemize}   
544
545\subsubsection*{Exercise}
546\textbf{Can you work out what the network latency is?}
\emph{Hint:} Send a very small message (e.g.\ one integer)
from one processor to another and back again a number of times
and measure the time.
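
A Pypar sketch of such a ping-pong measurement between ranks 0 and 1
(again assuming \texttt{pypar.time()}; the repeat count is arbitrary):
{\footnotesize
\begin{verbatim}
import pypar

myid = pypar.rank()
repeats = 1000

t0 = pypar.time()                     # Assumed wrapper around MPI_Wtime
for i in range(repeats):
    if myid == 0:
        pypar.send(i, 1)              # Ping: one small message to rank 1
        x = pypar.receive(1)          # Pong: wait for the reply
    elif myid == 1:
        x = pypar.receive(0)
        pypar.send(x, 0)

if myid == 0:
    # Each repeat involves two messages, so divide by 2*repeats
    print "Estimated latency: %.6f sec" % ((pypar.time() - t0) / (2.0 * repeats))

pypar.finalize()
\end{verbatim}}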
550
551\subsubsection*{Exercise}
552\textbf{Can you work out how many megabytes the network can transmit per
553second?} 
554Create arrays of varying size, send them from one processor
555to another and back and measure the time.
556\emph{Hints:}
557\begin{itemize} 
558  \item To create an array of double precision floating point numbers
559    do the following.
560    \begin{verbatim}
561      #define MAX_LEN  500000    /* Largest block */
562      double A[MAX_LEN];
563    \end{verbatim}
564  \item To populate the array with pseudo random numbers: 
565  \begin{verbatim} 
566   for (j=0; j<MAX_LEN; j++) A[j]=rand();
567   \end{verbatim}
568   \item To send only part of the array use the second argument in
569   MPI\_Send and MPI\_Recv to control how much information is transferred.
570   \item MPI\_Send and MPI\_Recv require a \emph{pointer} to the first
571   element of the array. Since arrays are already defined as pointers
572   simply put \texttt{A} rather than \texttt{\&A} which is what
573   we used earlier for the
574   simple integer variable.
575   \item One double precision number takes up 8 bytes - use that in your
576   estimate.
577\end{itemize} 
578
579Your measurements may fluctuate based on the network traffic and the
580load on the machines.
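
The same measurement sketched in Pypar with a numeric array (this assumes
the \texttt{numpy} module; older pypar versions used \texttt{Numeric}):
{\footnotesize
\begin{verbatim}
import pypar
import numpy                           # Older pypar versions used Numeric

myid = pypar.rank()
N = 500000                             # Number of doubles per message
A = numpy.arange(N, dtype='d')         # 8 bytes per double precision number

t0 = pypar.time()                      # Assumed wrapper around MPI_Wtime
if myid == 0:
    pypar.send(A, 1)
    A = pypar.receive(1)
    t = pypar.time() - t0
    megabytes = 2 * 8.0 * N / 1.0e6    # Data sent there and back
    print "Approximate bandwidth: %.2f MB/s" % (megabytes / t)
elif myid == 1:
    A = pypar.receive(0)
    pypar.send(A, 0)

pypar.finalize()
\end{verbatim}}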
581
582
583\section*{Good MPI resources}
584\begin{itemize} 
585  \item MPI Data types:\\ 
586    \url{http://www.ats.ucla.edu/at/hpc/parallel_computing/mpi-intro.htm}
587  \item Online book by Ian Foster:\\ 
588    \url{http://www-unix.mcs.anl.gov/dbpp/text/node94.html}
589\end{itemize}       
590
591
592
593
594
595%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
596\section{Identifying the processors}
597
598 
599%%%%%%%%%%%
600
601
602It is almost impossible to demonstrate parallelism with an interactive
603session at the command prompt as we did with the Python and Maple tutorials.
604Therefore we must write a \emph{program} that can be executed in parallel.
605Start an editor of your choice with a new file
606(called \texttt{paraprime.py}, say). For example:
607\begin{verbatim}
608  emacs paraprime.py &
609\end{verbatim}
610
611\noindent Write the following in the file and save it
612{\small \begin{verbatim}
613import pypar
614
615numproc = pypar.size()
616myid =    pypar.rank()
617
618print "I am proc %d of %d" %(myid, numproc)
619\end{verbatim}}
620
621Then type the following at the Unix prompt to execute the
622program normally on one processor
623{\small \begin{verbatim}
624  python paraprime.py
625\end{verbatim}}
626\noindent You should see the following output
627{\small \begin{verbatim}
628> python paraprime.py
629I am proc 0 of 1
630\end{verbatim}}
631
632
633\noindent Now let us try and run the same program on four processors:
634{\small \begin{verbatim}
635> prun -n 4 python paraprime.py
636I am proc 0 of 4
637I am proc 1 of 4
638I am proc 2 of 4
639I am proc 3 of 4
640\end{verbatim}}
641If you get a message from each of the four processors stating their
642id as above, we know that everything works. We have a parallel program!
643
644\section{Strategies for efficient parallel programming}
645A final note on efficiency.
646
647Parallel programming should lead to faster execution and/or
648ability to deal with larger problems.
649
The ultimate goal is to be \emph{P} times faster with \emph{P} processors.
However, the \emph{speedup} is usually less than \emph{P}. One must address
three critical issues to achieve good speedup:
653\begin{itemize}
654  \item \textbf{Interprocessor communication:} The amount and the frequency
655  of messages should be kept as low as possible. Ensure that processors
656  have plenty of work to do between communications.
657  \item \textbf{Data distribution and load balancing:} If some processors
658  finish much sooner than others, the total execution time of the parallel
659  program is bounded by that of the slowest processor and we say that the
660  program is poorly load balanced.
  Ensure that each processor gets its fair share of the workload.
  \item \textbf{Sequential parts of a program:} If half of a program, say, is
  inherently sequential, the speedup can never exceed 2 no matter how well
  the remaining half is parallelised. This is known as Amdahl's law
  (see the formula below).
  Ensure that all cost-intensive parts get parallelised.
666\end{itemize}
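
Amdahl's law quantifies the last point: if a fraction $s$ of the total work
is inherently sequential, the speedup on $P$ processors is bounded by
\[
  S(P) = \frac{1}{s + (1-s)/P} \le \frac{1}{s} .
\]
With $s = 0.5$, for example, the speedup can never exceed 2, however many
processors are used.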
667
668
669
670
671
672\section*{Exercise 8}
673
In Exercises 6--7 you sent messages around in rings and back and forth between
two processors, and you also timed the network speed when communicating between two
processors.
The purpose of this exercise is to try a collective communication pattern
where one processor distributes data to, and gathers data from, all the others using the
basic \texttt{MPI\_Send} and \texttt{MPI\_Recv}.
680
681\subsection*{Distribute and gather integers}
682
683Create a program where process 0 distributes an integer to all other processes,
684then receives an integer back from each of them.
685The other processes should receive the integer from process 0, add their
686rank to it and pass it back to process 0.
687
688With some suitable diagnostic output your program should produce
689something like this when run on four processors:
690
691\begin{verbatim} 
692P0/4: Initialised OK on host ninja-1
693P0: Sending value 42 to MPI process 1
694P0: Sending value 42 to MPI process 2
695P0: Sending value 42 to MPI process 3
696P3/4: Initialised OK on host ninja-4
697P1/4: Initialised OK on host ninja-2
698P2/4: Initialised OK on host ninja-3
699P3: Received value 42 from MPI process 0
700P3: Sending value 45 to MPI process 0
701P1: Received value 42 from MPI process 0
702P1: Sending value 43 to MPI process 0
703P2: Received value 42 from MPI process 0
704P2: Sending value 44 to MPI process 0
705P0: Received value 43 from MPI process 1
706P0: Received value 44 from MPI process 2
707P0: Received value 45 from MPI process 3
708\end{verbatim}
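
A Pypar sketch of this master/worker pattern (diagnostic output abbreviated):
{\footnotesize
\begin{verbatim}
import pypar

myid = pypar.rank()
numproc = pypar.size()

if myid == 0:
    value = 42
    for p in range(1, numproc):        # Distribute the integer to all workers
        print "P%d: Sending value %d to MPI process %d" % (myid, value, p)
        pypar.send(value, p)
    for p in range(1, numproc):        # Gather one integer back from each
        result = pypar.receive(p)
        print "P%d: Received value %d from MPI process %d" % (myid, result, p)
else:
    value = pypar.receive(0)           # Workers receive, add their rank,
    pypar.send(value + myid, 0)        # and return the result to process 0

pypar.finalize()
\end{verbatim}}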
709
710
711\subsection*{Distribute and gather arrays}
712
713In this part we will do exactly the same as above except we will use
714a double precision array as the data.
715
716\begin{enumerate}
717  \item Create a double precision array (A) of length $N=1024$ to use as buffer
718  \item On process 0 populate it with numbers $0$ to $N-1$ 
719  (using A[i] = (double) i;)
720  \item Pass A on to all other processors (the workers).
  \item All workers should compute the $p$-norm of the array,
  where $p$ is the rank of the worker:
  \[
    \left(
      \sum_{i=0}^{N-1} A[i]^p
    \right)^{1/p}
  \]
728  \item Then return the result as one double precision number per worker and
729  print them all out on processor 0.
730\end{enumerate} 
731
732\emph{Hints:}
733\begin{itemize}
734  \item To compute $x^y$ use the C-function pow(x, y);
735  \item Include the math headers in your program: \verb+#include <math.h>+
  \item You'll also need to link in the math library as follows:
     \texttt{mpicc progname.c -lm}
738  %\item You might want to add many print statements reporting the communication pattern in   
739\end{itemize}   
740
If your program works it should produce something like this on four processors:
742\begin{verbatim}
743P0/4: Initialised OK on host ninja-1
744P3/4: Initialised OK on host ninja-4
745P2/4: Initialised OK on host ninja-3
746P1/4: Initialised OK on host ninja-2
747P0: Received value 523776.00 from MPI process 1
748P0: Received value 18904.76 from MPI process 2
749P0: Received value 6497.76 from MPI process 3
750\end{verbatim}
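
For comparison, a Pypar sketch of this exercise using \texttt{numpy} for the
array and the norm (array creation and formatting details are illustrative;
older pypar versions used \texttt{Numeric}):
{\footnotesize
\begin{verbatim}
import pypar
import numpy

myid = pypar.rank()
numproc = pypar.size()
N = 1024

if myid == 0:
    A = numpy.arange(N, dtype='d')     # A[i] = i as double precision
    for p in range(1, numproc):
        pypar.send(A, p)               # Distribute the array to all workers
    for p in range(1, numproc):
        result = pypar.receive(p)      # Gather one number per worker
        print "P%d: Received value %.2f from MPI process %d" % (myid, result, p)
else:
    A = pypar.receive(0)
    p_norm = numpy.sum(A**myid)**(1.0/myid)   # p-norm with p = rank of worker
    pypar.send(p_norm, 0)

pypar.finalize()
\end{verbatim}}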
751
752
753
754
755\end{document}