Changeset 2767


Timestamp: Apr 27, 2006, 8:40:44 AM
Author: linda
Message: Added some example code to the parallel report

Location: inundation/parallel/documentation
Files: 10 added, 2 edited

  • inundation/parallel/documentation/parallel.tex

    r2723 r2767  
    11\chapter{Running the Code in Parallel}
    22
    3 This chapter focuses on how to run the code in parallel. The first section gives an overview of the main steps used to divide the domain over the processors. The second sections gives some example code and the final section talks about how to run the code on some example architectures.
    4 
    5 
    6 \section {Partitioning the Domain}
     3This chapter looks at how to run the code in parallel. The first section gives an overview of the main steps required to divide the mesh over the processors. The second section gives some example code, and the final section describes how to run the code on a few specific architectures.
     4
     5
     6\section {Partitioning the Mesh}
    77There are four main steps required to run the code in parallel. They are:
    88\begin{enumerate}
    9 \item subdivide the domain into a set of non-overlapping subdomains
     9\item partition the mesh into a set of non-overlapping submeshes
    1010(\code{pmesh_divide_metis} from {\tt pmesh_divide.py}),
    11 \item build a \lq ghost\rq\ or communication layer of boundary triangles
    12 around each subdomain and define the communication pattern (\code{build_submesh} from {\tt build_submesh.py}),
    13 \item distribute the subdomains over the processors (\code{send_submesh}
     11\item build a \lq ghost\rq\ or communication layer of triangles
     12around each submesh and define the communication pattern (\code{build_submesh} from {\tt build_submesh.py}),
     13\item distribute the submeshes over the processors (\code{send_submesh}
    1414and \code{rec_submesh} from {\tt build_commun.py}),
    15 \item and update the numbering scheme for the local domain assigned to a
     15\item and update the numbering scheme for each submesh assigned to a
    1616processor (\code{build_local_mesh} from {\tt build_local.py}).
    1717\end{enumerate}
     
    2020\begin{figure}[hbtp]
    2121  \centerline{ \includegraphics[scale = 0.75]{domain.eps}}
    22   \caption{The main steps in dividing the domain over the processors.}
     22  \caption{The main steps used to divide the mesh over the processors.}
    2323  \label{fig:subpart}
    2424\end{figure}
    2525
    26 \subsection {Subdividing the Global Domain}
    27 
    28 The first step in parallelising the code is to subdivide the domain
    29 into equally sized partitions. On a rectangular domain this may be
     26\subsection {Subdividing the Global Mesh}
     27
     28The first step in parallelising the code is to subdivide the mesh
     29into roughly equally sized partitions. On a rectangular domain this may be
    3030done by a simple co-ordinate based dissection, but on a complicated
    31 domain such as the Merimbula grid shown in Figure \ref{fig:subpart}
     31domain such as the Merimbula grid shown in Figure \ref{fig:mergrid}
    3232a more sophisticated approach must be used.  We use pymetis, a
    3333python wrapper around the Metis
    3434(\url{http://glaros.dtc.umn.edu/gkhome/metis/metis/overview})
    3535partitioning library. The \code{pmesh_divide_metis} function defined
    36 in {\tt pmesh_divide.py} uses Metis to divide the domain for
     36in {\tt pmesh_divide.py} uses Metis to divide the mesh for
    3737parallel computation. Metis was chosen as the partitioner based on
    3838the results in the paper \cite{gk:metis}.
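As a minimal sketch of this step (assuming that \code{domain_full} is the sequential \code{Domain} built on Processor 0 and that \code{numprocs} holds the number of processors, as in the example code given later in this chapter), the partitioning call takes the form:

{\small \begin{verbatim}
# Partition the mesh into one submesh per processor (sketch only);
# domain_full and numprocs are assumed to be defined as in the
# example code later in this chapter.
nodes, triangles, boundary, triangles_per_proc, quantities = \
    pmesh_divide_metis(domain_full, numprocs)
\end{verbatim}}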
     
    4444\end{figure}
    4545
    46 Figure \ref{fig:mermesh4} shows the Merimbula grid partitioned over four processor. Table \ref{tbl:mermesh4} gives the node distribution over the four processors while Table \ref{tbl:mermesh8} shows the distribution over eight processors. These results imply that Pymetis gives a reasonably well balanced partition of the domain.
    47 
    48 \begin{figure}[hbtp]
    49   \centerline{ \includegraphics[scale = 0.75]{mermesh4.eps}}
     46Figure \ref{fig:mergrid4} shows the Merimbula grid partitioned over four processors. Table \ref{tbl:mer4} gives the element distribution over the four processors, while Table \ref{tbl:mer8} shows the distribution over eight processors. These results imply that pymetis gives a reasonably well-balanced partition of the mesh.
     47
     48\begin{figure}[hbtp]
     49  \centerline{ \includegraphics[scale = 0.75]{mermesh4c.eps}
     50  \includegraphics[scale = 0.75]{mermesh4a.eps}}
     51
     52
     53  \centerline{ \includegraphics[scale = 0.75]{mermesh4d.eps}
     54  \includegraphics[scale = 0.75]{mermesh4b.eps}}
    5055  \caption{The Merimbula grid partitioned over 4 processors using Metis.}
    5156 \label{fig:mergrid4}
    5257\end{figure}
    5358
    54 \begin{table} \label{tbl:mermesh8}
     59\begin{table}
     60\caption{Running a 4-way test of
     61{\tt run_parallel_sw_merimbula_metis.py}}\label{tbl:mer4}
     62\begin{center}
     63\begin{tabular}{c|c c c c}
     64CPU & 0 & 1 & 2 & 3\\
     65Elements & 2757 & 2713 & 2761 & 2554\\
     66\% & 25.56\% & 25.16\% & 25.60\% & 23.68\%\\
     67\end{tabular}
     68\end{center}
     69\end{table}
     70
     71\begin{table}
    5572\caption{Running an 8-way test of
    56 {\tt run_parallel_sw_merimbula_metis.py}}
     73{\tt run_parallel_sw_merimbula_metis.py}} \label{tbl:mer8}
     74\begin{center}
    5775\begin{tabular}{c|c c c c c c c c}
    5876CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
    59 Elements & 1292 & 1647 & 1505 & 1623 & 1381 & 1514 & 1605 & 1388\\
    60 \% & 10.81\% & 13.78\% & 12.59\% & 13.58\% & 11.55\% & 12.66\% & 13.43\% & 11.61\%\\
     77Elements & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\
     78\% & 11.40\% & 11.99\% & 12.54\% & 12.43\% & 12.51\% & 12.99\% & 13.10\% & 13.05\%\\
    6179\end{tabular}
     80\end{center}
    6281\end{table}
    6382
    64 The number of subdomains found by Pymetis is equal to the number of processors; Subdomain $p$ will be assigned to Processor $p$.
     83The number of submeshes produced by pymetis is equal to the number of processors: Submesh $p$ will be assigned to Processor $p$.
    6584
    6685\subsection {Building the Ghost Layer}
    6786The function \code{build_submesh}, defined in {\tt build_submesh.py}, is the work-horse and is responsible for
    68 setting up the communication pattern as well deciding the local numbering scheme for the subdomains.
    69 
    70 Consider the example subpartitioning given in Figure \ref{fig:subdomain}. During the evolve calculations Triangle 3 in Subdomain 0 will need to access its neighbour Triangle 4 stored in Subdomain 1. The standard approach to this problem is to add an extra layer of triangles, which we call ghost triangles. The ghost triangles
    71 are read-only, they should not be updated during the calculations, they are only there to hold any extra information that a processor may need to complete its calculations. The ghost triangle values are updated through communication calls. Figure \ref{fig:subdomaing} shows the subdomains with the extra layer of ghost triangles.
     87setting up the communication pattern as well as assigning the local numbering scheme for the submeshes.
     88
     89Consider the example subpartitioning given in Figure \ref{fig:subdomain}. During the \code{evolve} calculations Triangle 3 in Submesh 0 will need to access its neighbour Triangle 4 stored in Submesh 1. The standard approach to this problem is to add an extra layer of triangles, which we call ghost triangles. The ghost triangles
     90are read-only: they should not be updated during the calculations; they are only there to hold any extra information that a processor may need to complete its calculations. The ghost triangle values are updated through communication calls. Figure \ref{fig:subdomaing} shows the submeshes with the extra layer of ghost triangles.
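A minimal sketch of this step (with \code{nodes}, \code{triangles}, \code{boundary}, \code{quantities} and \code{triangles_per_proc} taken from the partitioning step above) is:

{\small \begin{verbatim}
# Build one submesh per processor, including the ghost layer and
# the communication pattern (sketch only; the inputs come from
# pmesh_divide_metis).
submesh = build_submesh(nodes, triangles, boundary,
                        quantities, triangles_per_proc)
\end{verbatim}}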
    7291
    7392\begin{figure}[hbtp]
     
    84103\end{figure}
    85104
    86 Looking at Figure \ref{fig:subdomaing} we see that after each evolve step Processor 0  will have to send the updated values for Triangle 3 and Triangle 5 to Processor 1, and similarly Processor 1 will have to send the updated values for triangles 4, 7 and 6 (recall that Subdomain $p$ will be assigned to Processor $p$). The \code{build_submesh} function builds a dictionary that defines the communication pattern.
    87 
    88 Finally, the ANUGA code assumes that the triangles (and nodes etc.) are numbered consecutively starting from 1. Consequently, if the subdomain defined in Processor 1 in Figure \ref{fig:subdomaing} is passed into the evolve calculations it would crash. The \code{build_submesh} function determines a local numbering scheme for each subdomain, but it does not actually update the numbering, that is left to \code{build_local}.
    89 
    90 \subsection {Sending the Subdomains}
    91 
    92 All of functions described so far must be run in serial on Processor 0, the next step is to start the parallel computation by spreading the subdomains over the processors. The communication is carried out by
     105When partitioning the mesh we introduce new, dummy, boundary edges. For example, Triangle 3 in Submesh 1, from Figure \ref{fig:subdomaing}, originally shared an edge with Triangle 2, but after partitioning that edge becomes a boundary edge. These new boundary edges are tagged as \code{ghost} and should, in general, be assigned a type of \code{None}. The following piece of code, taken from {\tt run_parallel_advection.py}, shows an example.
     106 
     107{\small \begin{verbatim} 
     108T = Transmissive_boundary(domain)
     109domain.default_order = 2
     110domain.set_boundary( {'left': T, 'right': T, 'bottom': T,
     111                      'top': T, 'ghost': None} )
     112\end{verbatim}}
     113
     114
     115Looking at Figure \ref{fig:subdomaing} we see that after each \code{evolve} step Processor 0 will have to send the updated values for Triangles 3 and 5 to Processor 1, and similarly Processor 1 will have to send the updated values for Triangles 4, 7 and 6 (recall that Submesh $p$ will be assigned to Processor $p$). The \code{build_submesh} function builds a dictionary that defines the communication pattern.
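As a purely hypothetical illustration of such a dictionary (the actual layout built by \code{build_submesh} may differ), the situation in Figure \ref{fig:subdomaing} might be summarised as:

{\small \begin{verbatim}
# Hypothetical illustration only -- not the actual data layout.
# On Processor 0: send Triangles 3 and 5 to Processor 1 and
# receive Triangles 4, 7 and 6 from it as ghosts.
full_send_dict  = {1: [3, 5]}
ghost_recv_dict = {1: [4, 7, 6]}
\end{verbatim}}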
     116
     117Finally, the ANUGA code assumes that the triangles (and nodes etc.) are numbered consecutively starting from 1. Consequently, if Submesh 1 in Figure \ref{fig:subdomaing} were passed into the \code{evolve} calculations it would crash. The \code{build_submesh} function determines a local numbering scheme for each submesh, but it does not actually update the numbering; that is left to \code{build_local_mesh}.
     118
     119\subsection {Sending the Submeshes}
     120
     121All of the functions described so far must be run in serial on Processor 0. The next step is to start the parallel computation by spreading the submeshes over the processors. The communication is carried out by
    93122\code{send_submesh} and \code{rec_submesh} defined in {\tt build_commun.py}.
    94 The \code{send_submesh} function should be called on Processor 0 and sends the Subdomain $p$ to Processor $p$, while \code{rec_submesh} should be called by Processor $p$ to receive Subdomain $p$ from Processor 0. Note that the order of communication is very important, if any changes are made to the \code{send_submesh} function the corresponding change must be made to the \code{rec_submesh} function.
    95 
    96 While it is possible to get Processor 0 to communicate it's subdomain to itself, it is an expensive unnessary communication call. The {\tt build_commun.py} file also includes a function called \code{extract_hostmesh} which simply extracts Subdomain $0$.
    97 
    98 
    99 \subsection {Building the Local Domain}
    100 After using \code{send_submesh} and \code{rec_submesh}, Processor $p$ should have its own local copy of Subdomain $p$, however as stated previously the triangle numbering may be incorrect. The \code{build_local_mesh} function from {\tt build_local.py} primarily focuses on renumbering the information stored with the subdomain; including the nodes, vertices and quantities. Figure \ref{fig:subdomainf} shows what the domain in each processor may look like.
     123The \code{send_submesh} function should be called on Processor 0 and sends Submesh $p$ to Processor $p$, while \code{rec_submesh} should be called by Processor $p$ to receive Submesh $p$ from Processor 0. Note that the order of communication is very important: if any changes are made to the \code{send_submesh} function, the corresponding change must be made to the \code{rec_submesh} function.
     124
     125While it is possible to get Processor 0 to communicate its submesh to itself, it is an expensive and unnecessary communication call. The {\tt build_commun.py} file also includes a function called \code{extract_hostmesh} that should be called on Processor 0 to extract Submesh 0.
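A condensed sketch of this step (assuming \code{myid} and \code{numprocs} are the usual processor rank and count, and that \code{submesh} and \code{triangles_per_proc} come from the earlier steps; the full listing appears in the example code section below):

{\small \begin{verbatim}
if myid == 0:
    # Hand Submesh p to Processor p ...
    for p in range(1, numprocs):
        send_submesh(submesh, triangles_per_proc, p)
    # ... and keep Submesh 0 for Processor 0 itself.
    hostmesh = extract_hostmesh(submesh)
else:
    # Receive this processor's submesh from Processor 0.
    points, vertices, boundary, quantities, \
            ghost_recv_dict, full_send_dict = rec_submesh(0)
\end{verbatim}}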
     126
     127
     128\subsection {Building the Local Mesh}
     129After using \code{send_submesh} and \code{rec_submesh}, Processor $p$ should have its own local copy of Submesh $p$; however, as stated previously, the triangle numbering may be incorrect. The \code{build_local_mesh} function from {\tt build_local.py} primarily focuses on renumbering the information stored with the submesh, including the nodes, vertices and quantities. Figure \ref{fig:subdomainf} shows what the mesh in each processor may look like.
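On Processor 0 this renumbering is applied explicitly; a minimal sketch (with \code{hostmesh} and \code{triangles_per_proc} taken from the previous step, as in the example code below) is:

{\small \begin{verbatim}
# Renumber the submesh kept on Processor 0 so that its nodes and
# triangles are numbered consecutively (sketch only).
points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict = \
        build_local_mesh(hostmesh, 0, triangles_per_proc[0], numprocs)
\end{verbatim}}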
    101130
    102131
    103132\begin{figure}[hbtp]
    104133  \centerline{ \includegraphics[scale = 0.6]{subdomainfinal.eps}}
    105   \caption{An example subpartioning after the domain has been renumbered
    106 and defined on each processor.}
      134  \caption{An example subpartitioning after the submeshes have been renumbered.}
    107135 \label{fig:subdomainf}
    108136\end{figure}
    109137
    110 Note that ghost triangles are not stored any differently to the other triangles, the only way to differentiate them is to look at the communication pattern. This means that the {\tt innundation} code is essentially the same whether it is being run in serial or parallel, the only difference is a communication call at the end of each evolve step and a check to ensure that the ghost triangles are not used in the time stepping calculations.
     138Note that ghost triangles are not stored in the domain any differently from the other triangles; the only way to differentiate them is to look at the communication pattern. This means that the {\tt inundation} code is essentially the same whether it is run in serial or parallel; the only difference is a communication call at the end of each \code{evolve} step and a check to ensure that the ghost triangles are not used in the time stepping calculations.
    111139
    112140\section{Some Example Code}
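The code fragment in the figure below brings together the steps described above: Processor 0 reads in the mesh, partitions it with \code{pmesh_divide_metis}, builds the submeshes with \code{build_submesh} and distributes them with \code{send_submesh}, while the remaining processors receive their submesh with \code{rec_submesh}.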
     141
     142\begin{figure}
     143\begin{verbatim}
     144if myid == 0:
     145
     146    # Read in the test files
     147
     148    filename = 'merimbula_10785_1.tsh'
     149
     150    # Build the whole domain
     151   
     152    domain_full = pmesh_to_domain_instance(filename, Domain)
     153
     154    # Define the domain boundaries for visualisation
     155
     156    rect = array(domain_full.xy_extent, Float)
     157
     158    # Initialise the wave
     159
     160    domain_full.set_quantity('stage', Set_Stage(756000.0,756500.0,2.0))
     161
     162    # Subdivide the mesh
     163   
     164    nodes, triangles, boundary, triangles_per_proc, quantities = \
     165         pmesh_divide_metis(domain_full, numprocs)
     166
     167    # Build the mesh that should be assigned to each processor,
     168    # this includes ghost nodes and the communication pattern
     169   
     170    submesh = build_submesh(nodes, triangles, boundary,\
     171                            quantities, triangles_per_proc)
     172
     173    # Send the mesh partition to the appropriate processor
     174
     175    for p in range(1, numprocs):
     176      send_submesh(submesh, triangles_per_proc, p)
     177
     178    # Build the local mesh for processor 0
     179
     180    hostmesh = extract_hostmesh(submesh)
     181    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict = \
     182             build_local_mesh(hostmesh, 0, triangles_per_proc[0], numprocs)
     183
     184else:
     185   
     186    # Read in the mesh partition that belongs to this
     187    # processor (note that the information is in the
     188    # correct form for the GA data structure)
     189
     190    points, vertices, boundary, quantities, ghost_recv_dict, full_send_dict \
     191            = rec_submesh(0)
     192\end{verbatim}
     193\end{figure}
    113194
    114195\section{Running the Code}
  • inundation/parallel/documentation/report.tex

    r2723 r2767  
    6464\include{results}
    6565\include{visualisation}
     66\include{code}
    6667
    6768\begin{thebibliography}{10}