\chapter{Running the Code in Parallel}

This chapter describes how to run the code in parallel. The first section gives an overview of the main steps required to divide the mesh over the processors. The second section describes some example code, and the final section explains how to run the code on a few specific architectures.


\section{Partitioning the Mesh}\label{sec:part}
There are four main steps required to run the code in parallel. They are:
\begin{enumerate}
\item partition the mesh into a set of non-overlapping submeshes
(\code{pmesh_divide_metis} from {\tt pmesh_divide.py}),
\item build a \lq ghost\rq\ or communication layer of triangles
around each submesh and define the communication pattern (\code{build_submesh} from {\tt build_submesh.py}),
\item distribute the submeshes over the processors (\code{send_submesh}
and \code{rec_submesh} from {\tt build_commun.py}),
\item and update the numbering scheme for each submesh assigned to a
processor (\code{build_local_mesh} from {\tt build_local.py}).
\end{enumerate}
See Figure \ref{fig:subpart}.
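
The outline below sketches how these four functions are chained together on the coordinating processor. It is only a sketch: the full calling sequence used in the examples is reproduced in Figure \ref{fig:code}, and the variable names (\code{mesh_full}, \code{numprocs} and so on) are borrowed from that example.
\begin{verbatim}
# Step 1: subdivide the global mesh (on Processor 0)
nodes, triangles, boundary, triangles_per_proc, quantities = \
    pmesh_divide_metis(mesh_full, numprocs)

# Step 2: add the ghost layer and define the communication pattern
submesh = build_submesh(nodes, triangles, boundary, quantities,
                        triangles_per_proc)

# Step 3: hand Submesh p to Processor p
for p in range(1, numprocs):
    send_submesh(submesh, triangles_per_proc, p)

# Step 4: on Processor p, receive the submesh and renumber it locally
# (rec_submesh calls build_local_mesh internally)
\end{verbatim}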

\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.75]{figures/domain.eps}}
\caption{The main steps used to divide the mesh over the processors.}
\label{fig:subpart}
\end{figure}

\subsection{Subdividing the Global Mesh}\label{sec:part1}

The first step in parallelising the code is to subdivide the mesh
into roughly equally sized partitions. On a rectangular mesh this may be
done by a simple coordinate-based dissection, but on a complicated
domain such as the Merimbula mesh shown in Figure \ref{fig:mergrid}
a more sophisticated approach must be used. We use Pymetis, a
Python wrapper around the Metis
(\url{http://glaros.dtc.umn.edu/gkhome/metis/metis/overview})
partitioning library. The \code{pmesh_divide_metis} function defined
in {\tt pmesh_divide.py} uses Metis to divide the mesh for
parallel computation. Metis was chosen as the partitioner based on
the results reported in \cite{gk:metis}.
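
The call below shows how this step is invoked; the variable names \code{mesh_full} and \code{numprocs} follow the example in Figure \ref{fig:code}. The short balance check appended to it is only an illustrative sketch and assumes that \code{triangles_per_proc} is simply a list giving the number of triangles assigned to each processor; with it, the element counts reported in Tables \ref{tbl:mer4} and \ref{tbl:mer8} can be reproduced.
\begin{verbatim}
nodes, triangles, boundary, triangles_per_proc, quantities = \
    pmesh_divide_metis(mesh_full, numprocs)

# Illustrative check of the element distribution over the processors
total = sum(triangles_per_proc)
for p, n in enumerate(triangles_per_proc):
    print('CPU %d: %d elements (%.2f%%)' % (p, n, 100.0*n/total))
\end{verbatim}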

\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.75]{figures/mermesh.eps}}
\caption{The Merimbula mesh.}
\label{fig:mergrid}
\end{figure}

Figure \ref{fig:mergrid4} shows the Merimbula grid partitioned over four processors. Note that one submesh may comprise several unconnected mesh partitions. Table \ref{tbl:mer4} gives the element distribution over the four processors, while Table \ref{tbl:mer8} shows the distribution over eight processors. These results show that Pymetis gives a reasonably well balanced partition of the mesh.

\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.75]{figures/mermesh4c.eps}
\includegraphics[scale = 0.75]{figures/mermesh4a.eps}}


\centerline{ \includegraphics[scale = 0.75]{figures/mermesh4d.eps}
\includegraphics[scale = 0.75]{figures/mermesh4b.eps}}
\caption{The Merimbula grid partitioned over 4 processors using Metis.}
\label{fig:mergrid4}
\end{figure}

\begin{table}
\caption{Running a 4-way test of
{\tt run_parallel_sw_merimbula_metis.py}}\label{tbl:mer4}
\begin{center}
\begin{tabular}{c|c c c c}
CPU & 0 & 1 & 2 & 3\\
Elements & 2757 & 2713 & 2761 & 2554\\
\% & 25.56\% & 25.16\% & 25.60\% & 23.68\%
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Running an 8-way test of
{\tt run_parallel_sw_merimbula_metis.py}} \label{tbl:mer8}
\begin{center}
\begin{tabular}{c|c c c c c c c c}
CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
Elements & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\
\% & 11.40\% & 11.99\% & 12.54\% & 12.43\% & 12.51\% & 12.99\% & 13.10\% & 13.05\%
\end{tabular}
\end{center}
\end{table}

The number of submeshes found by Pymetis is equal to the number of processors: Submesh $p$ will be assigned to Processor $p$.

\subsection{Building the Ghost Layer}\label{sec:part2}
The function \code{build_submesh} in {\tt build_submesh.py} is the work-horse; it is responsible for
setting up the communication pattern as well as assigning the local numbering scheme for the submeshes.

Consider the example subpartitioning given in Figure \ref{fig:subdomain}. During the \code{evolve} calculations Triangle 2 in Submesh 0 will need to access its neighbour Triangle 3, which is stored in Submesh 1. The standard approach to this problem is to add an extra layer of triangles, which we call ghost triangles. The ghost triangles
are read-only; they should not be updated during the calculations, and are only there to hold any extra information that a processor may need to complete its calculations. The ghost triangle values are updated through communication calls. Figure \ref{fig:subdomaing} shows the submeshes with the extra layer of ghost triangles.

\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.6]{figures/subdomain.eps}}
\caption{An example subpartitioning of a mesh.}
\label{fig:subdomain}
\end{figure}


\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.6]{figures/subdomainghost.eps}}
\caption{An example subpartitioning with ghost triangles. The numbers in brackets show the local numbering scheme that is calculated and stored with the mesh, but not implemented until the local mesh is built. See Section \ref{sec:part4}.}
\label{fig:subdomaing}
\end{figure}

When partitioning the mesh we introduce new, dummy, boundary edges. For example, Triangle 2 in Submesh 1 from Figure \ref{fig:subdomaing} originally shared an edge with Triangle 1, but after partitioning that edge becomes a boundary edge. These new boundary edges are tagged as \code{ghost} and should, in general, be assigned a type of \code{None}. The following piece of code, taken from {\tt run_parallel_advection.py}, shows an example.
{\small \begin{verbatim}
T = Transmissive_boundary(domain)
domain.default_order = 2
domain.set_boundary( {'left': T, 'right': T, 'bottom': T, 'top': T,
                      'ghost': None} )
\end{verbatim}}

Looking at Figure \ref{fig:subdomaing} we see that after each \code{evolve} step Processor 0 will have to send the updated values for Triangle 2 and Triangle 4 to Processor 1, and similarly Processor 1 will have to send the updated values for Triangle 3 and Triangle 5 (recall that Submesh $p$ will be assigned to Processor $p$). The \code{build_submesh} function builds a dictionary that defines the communication pattern.
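
The sketch below indicates the kind of information these dictionaries carry, using the \code{full_send_dict} and \code{ghost_recv_dict} names that appear in the example code of Figure \ref{fig:code}. The exact layout of the entries is an internal detail of \code{build_submesh} and is assumed here to be keyed by the neighbouring processor number; the triangle numbers are simply read off Figure \ref{fig:subdomaing}.
\begin{verbatim}
# Illustrative sketch only -- the precise layout is an internal
# detail of build_submesh.  For the submeshes shown above, on
# Processor 0 one would expect something of the form:
full_send_dict  = {1: [2, 4]}   # full triangles to send to Processor 1
ghost_recv_dict = {1: [3, 5]}   # ghost triangles received from Processor 1
\end{verbatim}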

Finally, the ANUGA code assumes that the triangles (and nodes etc.) are numbered consecutively starting from 0. Consequently, if Submesh 1 in Figure \ref{fig:subdomaing} were passed into the \code{evolve} calculations it would crash. The \code{build_submesh} function determines a local numbering scheme for each submesh, but it does not actually update the numbering; that is left to \code{build_local_mesh}.


\subsection{Sending the Submeshes}\label{sec:part3}

All of the functions described so far must be run in serial on Processor 0. The next step is to start the parallel computation by spreading the submeshes over the processors. The communication is carried out by
\code{send_submesh} and \code{rec_submesh}, defined in {\tt build_commun.py}.
The \code{send_submesh} function should be called on Processor 0 and sends Submesh $p$ to Processor $p$, while \code{rec_submesh} should be called by Processor $p$ to receive Submesh $p$ from Processor 0.

As an aside, the order of communication is very important: if the \code{send_submesh} routine is modified, the corresponding change must be made to the \code{rec_submesh} routine.

While it is possible to get Processor 0 to communicate its submesh to itself, this would be an expensive and unnecessary communication call. The {\tt build_commun.py} file therefore also includes a function called \code{extract_hostmesh}, which should be called on Processor 0 to extract Submesh 0.


\subsection{Building the Local Mesh}\label{sec:part4}
After using \code{send_submesh} and \code{rec_submesh}, Processor $p$ should have its own local copy of Submesh $p$; however, as stated previously, the triangle numbering will be incorrect on all processors except Processor 0. The \code{build_local_mesh} function from {\tt build_local.py} primarily focuses on renumbering the information stored with the submesh, including the nodes, vertices and quantities. Figure \ref{fig:subdomainf} shows what the mesh on each processor may look like.


\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.6]{figures/subdomainfinal.eps}}
\caption{An example subpartitioning after the submeshes have been renumbered.}
\label{fig:subdomainf}
\end{figure}

Note that ghost triangles are not stored in the domain any differently from the other triangles; the only way to differentiate them is to look at the communication pattern. This means that the {\tt inundation} code is essentially the same whether it is being run in serial or in parallel: the only differences are a communication call at the end of each \code{evolve} step and a check to ensure that the ghost triangles are not used in the time stepping calculations.
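
From the user's point of view the time stepping loop therefore looks the same in serial and in parallel. The fragment below is only a sketch: the \code{yieldstep} and \code{finaltime} values are placeholders, and it assumes that the ghost updates described above are issued as part of each \code{evolve} step rather than being something the script must call explicitly.
\begin{verbatim}
# The same loop is used in serial and in parallel; the ghost triangle
# communication is assumed to happen inside each evolve step.
for t in domain.evolve(yieldstep = 0.1, finaltime = 3.0):
    if myid == 0:
        # only the coordinating processor reports progress
        print('time = %s' % str(t))
\end{verbatim}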

\section{Some Example Code}

Chapter \ref{chap:code} gives full listings of some example codes.

The first example in Section \ref{sec:codeRPA} solves the advection equation on a
rectangular mesh. A rectangular mesh is highly structured, so a coordinate-based decomposition can be used and the partitioning is simply done by calling the
routine \code{parallel_rectangle} as shown below.
\begin{verbatim}
#######################
# Partition the mesh
#######################

# Build a unit mesh, subdivide it over numprocs processors with each
# submesh containing M*N nodes

points, vertices, boundary, full_send_dict, ghost_recv_dict = \
    parallel_rectangle(N, M, len1_g=1.0)
\end{verbatim}

Most simulations will not be done on a rectangular mesh, and the approach to subpartitioning a general mesh is different from the one shown above; however, this example may be of interest to those who want to measure the parallel efficiency of the code on their machine. A rectangular mesh should give a good load balance and is therefore an important first test problem.


A more \lq real life\rq\ mesh is the Merimbula mesh used in the code shown in Section \ref{sec:codeRPMM}. This example also solves the advection equation. In this case the techniques described in Section \ref{sec:part} must be used to partition the mesh. Figure \ref{fig:code} shows the part of the code that is responsible for spreading the mesh over the processors. We now look at the code in detail.

\begin{figure}[htbp]
\begin{verbatim}
if myid == 0:

    # Read in the test files

    filename = 'merimbula_10785.tsh'

    mesh_full = pmesh_to_domain_instance(filename, Advection_Domain)
    mesh_full.set_quantity('stage', Set_Stage(756000.0,756500.0,4.0))

    # Subdivide the mesh

    nodes, triangles, boundary, triangles_per_proc, quantities =\
        pmesh_divide_metis(mesh_full, numprocs)

    # Build the mesh that should be assigned to each processor.
    # This includes ghost nodes and the communication pattern

    submesh = build_submesh(nodes, triangles, boundary, quantities, \
                            triangles_per_proc)

    # Send the mesh partition to the appropriate processor

    for p in range(1, numprocs):
        send_submesh(submesh, triangles_per_proc, p)

    # Build the local mesh for processor 0

    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict =\
        extract_hostmesh(submesh, triangles_per_proc)

else:
    # Read in the mesh partition that belongs to this
    # processor (note that the information is in the
    # correct form for the ANUGA data structure)

    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict = \
        rec_submesh(0)
\end{verbatim}
\caption{A section of code taken from {\tt run_parallel_merimbula_metis.py} (Section \protect \ref{sec:codeRPMM}) showing how to subdivide the mesh.}
\label{fig:code}
\end{figure}
\newpage

\begin{itemize}
\item
These first few lines of code read in and define the (global) mesh. The \code{Set_Stage} function sets the initial conditions. See the code in Section \ref{sec:codeRPMM} for the definition of \code{Set_Stage}.
\begin{verbatim}
filename = 'merimbula_10785.tsh'
mesh_full = pmesh_to_domain_instance(filename, Advection_Domain)
mesh_full.set_quantity('stage', Set_Stage(756000.0,756500.0,4.0))
\end{verbatim}

\item \code{pmesh_divide_metis} divides the mesh into a set of non-overlapping subdomains as described in Section \ref{sec:part1}.
\begin{verbatim}
nodes, triangles, boundary, triangles_per_proc, quantities =\
    pmesh_divide_metis(mesh_full, numprocs)
\end{verbatim}

\item The next step is to build a boundary layer of ghost triangles and define the communication pattern. This step is implemented by \code{build_submesh}, as discussed in Section \ref{sec:part2}. The \code{submesh} variable contains a copy of the submesh for each processor.
\begin{verbatim}
submesh = build_submesh(nodes, triangles, boundary, quantities, \
                        triangles_per_proc)
\end{verbatim}

\item The actual parallel communication starts when the submesh partitions are sent to the processors by calling \code{send_submesh}.
\begin{verbatim}
for p in range(1, numprocs):
    send_submesh(submesh, triangles_per_proc, p)
\end{verbatim}

Each processor receives its subpartition by calling \code{rec_submesh}. The \code{rec_submesh} routine also calls \code{build_local_mesh}. The \code{build_local_mesh} routine, described in Section \ref{sec:part4}, ensures that the information is stored in a way that is compatible with the Domain data structure. This means, for example, that the triangles and nodes must be numbered consecutively, starting from 0.

\begin{verbatim}
points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict =\
    rec_submesh(0)
\end{verbatim}

Note that the submesh is not received by, or sent to, Processor 0. Rather, \code{extract_hostmesh} simply extracts the mesh that has been assigned to Processor 0. Recall that \code{submesh} contains the list of submeshes to be assigned to each processor. This is described further in Section \ref{sec:part3}. The \code{build_local_mesh} function renumbers the nodes.
\begin{verbatim}
points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict =\
    extract_hostmesh(submesh, triangles_per_proc)
\end{verbatim}

\end{itemize}

\section{Running the Code}

\subsection{Compilation}
Building the Pymetis wrapper is done using Make, but there is some
variation depending on the host system type. Under most types of Linux,
simply running \verb|make| will work. Under x86\_64 versions of Linux,
the command is \verb|make COPTIONS="-fPIC"|. Finally, under Windows,
the command is \verb|make for_win32|. As a sanity check, a simple
PyUnit test is provided, called {\tt test_metis.py}.
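
Once the wrapper has been built, the parallel examples are launched through the local MPI environment. The exact command depends on the installation, but a typical four-process run would look something like \verb|mpirun -np 4 python run_parallel_sw_merimbula_metis.py|; the launcher name and its options here are indicative only.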
---|