Changeset 2767
- Timestamp: Apr 27, 2006, 8:40:44 AM (17 years ago)
- Location: inundation/parallel/documentation
- Files: 10 added, 2 edited
inundation/parallel/documentation/parallel.tex
r2723 r2767

\chapter{Running the Code in Parallel}

This chapter looks at how to run the code in parallel. The first
section gives an overview of the main steps required to divide the
mesh over the processors. The second section gives some example code
and the final section talks about how to run the code on a few
specific architectures.


\section{Partitioning the Mesh}
There are four main steps required to run the code in parallel. They are:
\begin{enumerate}
\item partition the mesh into a set of non-overlapping submeshes
(\code{pmesh_divide_metis} from {\tt pmesh_divide.py}),
\item build a \lq ghost\rq\ or communication layer of triangles
around each submesh and define the communication pattern
(\code{build_submesh} from {\tt build_submesh.py}),
\item distribute the submeshes over the processors (\code{send_submesh}
and \code{rec_submesh} from {\tt build_commun.py}),
\item and update the numbering scheme for each submesh assigned to a
processor (\code{build_local_mesh} from {\tt build_local.py}).
\end{enumerate}

…

\begin{figure}[hbtp]
\centerline{\includegraphics[scale = 0.75]{domain.eps}}
\caption{The main steps used to divide the mesh over the processors.}
\label{fig:subpart}
\end{figure}

\subsection{Subdividing the Global Mesh}

The first step in parallelising the code is to subdivide the mesh
into roughly equally sized partitions. On a rectangular domain this
may be done by a simple co-ordinate based dissection, but on a
complicated domain such as the Merimbula grid shown in
Figure \ref{fig:mergrid} a more sophisticated approach must be used.
We use pymetis, a python wrapper around the Metis
(\url{http://glaros.dtc.umn.edu/gkhome/metis/metis/overview})
partitioning library. The \code{pmesh_divide_metis} function defined
in {\tt pmesh_divide.py} uses Metis to divide the mesh for parallel
computation. Metis was chosen as the partitioner based on the results
in the paper \cite{gk:metis}.
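As an illustration of the simple co-ordinate based dissection
mentioned above, the fragment below sorts the triangles by the $x$
coordinate of their centroids and hands out contiguous, equally sized
slices, one per processor. It is a minimal sketch of the idea only;
the \code{centroids} and \code{numprocs} arguments are assumed inputs
and this is not the code used by \code{pmesh_divide_metis}.

{\small \begin{verbatim}
def coordinate_dissection(centroids, numprocs):
    # centroids is a list of (x, y) triangle centroids (assumed input).
    # Order the triangles by x and assign contiguous slices to processors.
    order = sorted(range(len(centroids)), key=lambda i: centroids[i][0])
    size = len(order) // numprocs
    partition = {}
    for p in range(numprocs):
        start = p * size
        if p == numprocs - 1:
            stop = len(order)       # the last processor takes any remainder
        else:
            stop = start + size
        partition[p] = order[start:stop]   # triangle ids owned by processor p
    return partition
\end{verbatim}}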
…

Figure \ref{fig:mergrid4} shows the Merimbula grid partitioned over
four processors. Table \ref{tbl:mer4} gives the node distribution over
the four processors while Table \ref{tbl:mer8} shows the distribution
over eight processors. These results imply that Pymetis gives a
reasonably well balanced partition of the mesh.

\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.75]{mermesh4c.eps}
             \includegraphics[scale = 0.75]{mermesh4a.eps}}
\centerline{ \includegraphics[scale = 0.75]{mermesh4d.eps}
             \includegraphics[scale = 0.75]{mermesh4b.eps}}
\caption{The Merimbula grid partitioned over 4 processors using Metis.}
\label{fig:mergrid4}
\end{figure}

\begin{table}
\caption{Running a 4-way test of
{\tt run_parallel_sw_merimbula_metis.py}}\label{tbl:mer4}
\begin{center}
\begin{tabular}{c|c c c c}
CPU & 0 & 1 & 2 & 3\\
Elements & 2757 & 2713 & 2761 & 2554\\
\% & 25.56\% & 25.16\% & 25.60\% & 23.68\%\\
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Running an 8-way test of
{\tt run_parallel_sw_merimbula_metis.py}}\label{tbl:mer8}
\begin{center}
\begin{tabular}{c|c c c c c c c c}
CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
Elements & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\
\% & 11.40\% & 11.99\% & 12.54\% & 12.43\% & 12.51\% & 12.99\% & 13.10\% & 13.05\%\\
\end{tabular}
\end{center}
\end{table}
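The percentages in the tables are simply each processor's share of the
total number of elements. As a quick check of the 4-way figures in
Table \ref{tbl:mer4}:

{\small \begin{verbatim}
# Element counts taken from the 4-way table above.
elements = [2757, 2713, 2761, 2554]
total = sum(elements)              # 10785 triangles in the Merimbula mesh
for p in range(len(elements)):
    share = 100.0 * elements[p] / total
    print('CPU %d: %d elements (%.2f%%)' % (p, elements[p], share))
\end{verbatim}}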
The number of submeshes found by Pymetis is equal to the number of
processors; Submesh $p$ will be assigned to Processor $p$.

\subsection{Building the Ghost Layer}
The function \code{build_submesh} (defined in {\tt build_submesh.py})
is the work-horse and is responsible for setting up the communication
pattern as well as assigning the local numbering scheme for the
submeshes.

Consider the example subpartitioning given in
Figure \ref{fig:subdomain}. During the \code{evolve} calculations
Triangle 3 in Submesh 0 will need to access its neighbour Triangle 4
stored in Submesh 1. The standard approach to this problem is to add
an extra layer of triangles, which we call ghost triangles. The ghost
triangles are read-only; they should not be updated during the
calculations, they are only there to hold any extra information that a
processor may need to complete its calculations. The ghost triangle
values are updated through communication calls. Figure
\ref{fig:subdomaing} shows the submeshes with the extra layer of ghost
triangles.

\begin{figure}[hbtp]
…
\end{figure}

When partitioning the mesh we introduce new, dummy, boundary edges.
For example, Triangle 3 in Submesh 1, from Figure
\ref{fig:subdomaing}, originally shared an edge with Triangle 2, but
after partitioning that edge becomes a boundary edge. These new
boundary edges are tagged as \code{ghost} and should, in general, be
assigned a type of \code{None}. The following piece of code taken from
{\tt run_parallel_advection.py} shows an example.

{\small \begin{verbatim}
T = Transmissive_boundary(domain)
domain.default_order = 2
domain.set_boundary({'left': T, 'right': T, 'bottom': T, 'top': T,
                     'ghost': None})
\end{verbatim}}

Looking at Figure \ref{fig:subdomaing} we see that after each
\code{evolve} step Processor 0 will have to send the updated values
for Triangle 3 and Triangle 5 to Processor 1, and similarly Processor
1 will have to send the updated values for Triangles 4, 7 and 6
(recall that Submesh $p$ will be assigned to Processor $p$). The
\code{build_submesh} function builds a dictionary that defines the
communication pattern.
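For the subpartitioning in Figure \ref{fig:subdomaing} this pattern
amounts to Processor 0 holding Triangles 4, 6 and 7 as ghosts and
Processor 1 holding Triangles 3 and 5 as ghosts. As a rough
illustration only (the exact layout of the dictionaries built by
\code{build_submesh}, and later returned by \code{build_local_mesh} as
\code{full_send_dict} and \code{ghost_recv_dict}, may differ), the
pattern seen by Processor 0 could be pictured as:

{\small \begin{verbatim}
# Illustrative sketch only, keyed by the neighbouring processor.
full_send_dict  = {1: [3, 5]}      # owned triangles whose values go to Processor 1
ghost_recv_dict = {1: [4, 6, 7]}   # ghost triangles updated by Processor 1

# After each evolve step Processor 0 would send the quantities stored on
# the triangles in full_send_dict[1] and receive new values for the
# ghost triangles listed in ghost_recv_dict[1].
\end{verbatim}}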
Finally, the ANUGA code assumes that the triangles (and nodes etc.)
are numbered consecutively starting from 1. Consequently, if Submesh 1
in Figure \ref{fig:subdomaing} was passed into the \code{evolve}
calculations it would crash. The \code{build_submesh} function
determines a local numbering scheme for each submesh, but it does not
actually update the numbering; that is left to \code{build_local}.

\subsection{Sending the Submeshes}

All of the functions described so far must be run in serial on
Processor 0; the next step is to start the parallel computation by
spreading the submeshes over the processors. The communication is
carried out by \code{send_submesh} and \code{rec_submesh} defined in
{\tt build_commun.py}. The \code{send_submesh} function should be
called on Processor 0 and sends Submesh $p$ to Processor $p$, while
\code{rec_submesh} should be called by Processor $p$ to receive
Submesh $p$ from Processor 0. Note that the order of communication is
very important: if any changes are made to the \code{send_submesh}
function the corresponding change must be made to the
\code{rec_submesh} function.

While it is possible to get Processor 0 to communicate its submesh to
itself, it is an expensive and unnecessary communication call. The
{\tt build_commun.py} file also includes a function called
\code{extract_hostmesh} that should be called on Processor 0 to
extract Submesh 0.


\subsection{Building the Local Mesh}
After using \code{send_submesh} and \code{rec_submesh}, Processor $p$
should have its own local copy of Submesh $p$; however, as stated
previously, the triangle numbering may be incorrect. The
\code{build_local_mesh} function from {\tt build_local.py} primarily
focuses on renumbering the information stored with the submesh,
including the nodes, vertices and quantities. Figure
\ref{fig:subdomainf} shows what the mesh in each processor may look
like.
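The renumbering itself is conceptually simple: the global ids held by
a processor are mapped onto consecutive local ids and the neighbour
references are relabelled accordingly. The following is a minimal
sketch of the idea (the \code{global_ids} and \code{neighbours}
arguments are assumed inputs; this is not the actual
\code{build_local_mesh} implementation).

{\small \begin{verbatim}
def renumber(global_ids, neighbours):
    # Map each global triangle id owned by this processor to a
    # consecutive local id.
    local_id = {}
    for l in range(len(global_ids)):
        local_id[global_ids[l]] = l
    # Relabel the neighbour lists in terms of the local ids.  A
    # neighbour outside this submesh keeps the sentinel value -1.
    local_neighbours = []
    for nbrs in neighbours:
        local_neighbours.append([local_id.get(g, -1) for g in nbrs])
    return local_id, local_neighbours
\end{verbatim}}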
\begin{figure}[hbtp]
\centerline{ \includegraphics[scale = 0.6]{subdomainfinal.eps}}
\caption{An example subpartitioning after the submeshes have been renumbered.}
\label{fig:subdomainf}
\end{figure}

Note that ghost triangles are not stored in the domain any differently
to the other triangles; the only way to differentiate them is to look
at the communication pattern. This means that the {\tt inundation}
code is essentially the same whether it is being run in serial or
parallel; the only difference is a communication call at the end of
each \code{evolve} step and a check to ensure that the ghost triangles
are not used in the time stepping calculations.

\section{Some Example Code}

\begin{figure}
\begin{verbatim}
if myid == 0:

    # Read in the test files

    filename = 'merimbula_10785_1.tsh'

    # Build the whole domain

    domain_full = pmesh_to_domain_instance(filename, Domain)

    # Define the domain boundaries for visualisation

    rect = array(domain_full.xy_extent, Float)

    # Initialise the wave

    domain_full.set_quantity('stage', Set_Stage(756000.0, 756500.0, 2.0))

    # Subdivide the mesh

    nodes, triangles, boundary, triangles_per_proc, quantities = \
        pmesh_divide_metis(domain_full, numprocs)

    # Build the mesh that should be assigned to each processor;
    # this includes ghost nodes and the communication pattern

    submesh = build_submesh(nodes, triangles, boundary, \
                            quantities, triangles_per_proc)

    # Send the mesh partition to the appropriate processor

    for p in range(1, numprocs):
        send_submesh(submesh, triangles_per_proc, p)

    # Build the local mesh for processor 0

    hostmesh = extract_hostmesh(submesh)
    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict = \
        build_local_mesh(hostmesh, 0, triangles_per_proc[0], numprocs)

else:

    # Read in the mesh partition that belongs to this
    # processor (note that the information is in the
    # correct form for the GA data structure)

    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict = \
        rec_submesh(0)
\end{verbatim}
\end{figure}

\section{Running the Code}
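How a parallel run is actually started depends on the local MPI
installation; as a sketch only (the launcher name and its options vary
between systems), a 4-way run of a script built around the example
above might be launched with something like:

{\small \begin{verbatim}
mpirun -np 4 python run_parallel_sw_merimbula_metis.py
\end{verbatim}}

Inside the script, the \code{myid} and \code{numprocs} variables used
in the example are assumed to hold the processor rank and the total
number of processors, as supplied by whichever Python MPI bindings are
in use.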
inundation/parallel/documentation/report.tex
r2723 r2767

\include{results}
\include{visualisation}
\include{code}

\begin{thebibliography}{10}