1 | % Use the standard \LaTeXe\ article style in 12pt Computer Modern font |
---|
2 | % on A4 paper by |
---|
3 | \documentclass[12pt,a4paper]{article} |
---|
4 | % Do \emph{not} change the width nor the height of the text from the |
---|
5 | % defaults set by this document class. |
---|
6 | % |
---|
7 | % The alternative which is closer to what we actually use is |
---|
8 | % \documentclass[11pt,a5paper]{article} |
---|
9 | % \usepackage[a5paper]{geometry} |
---|
10 | % Because it is a great size for on screen reading |
---|
11 | % and prints nicely on a4paper either 2up or booklet. |
---|
12 | |
---|
13 | % The preamble is to contain your own \LaTeX\ commands and to say |
---|
14 | % what packages to use. Three useful packages are the following: |
---|
15 | \usepackage{graphicx} % avoid epsfig or earlier such packages |
---|
16 | \usepackage{url} % for URLs and DOIs |
---|
17 | \newcommand{\doi}[1]{\url{http://dx.doi.org/#1}} |
---|
18 | \usepackage{amsmath} % many want amsmath extensions |
---|
19 | \usepackage{amsfonts} |
---|
20 | \usepackage{underscore} |
---|
21 | % Avoid loading unused packages (as done by some \LaTeX\ editors). |
---|
22 | |
---|
23 | % Create title and authors using \verb|\maketitle|. Separate authors by |
---|
24 | % \verb|\and| and put addresses in \verb|\thanks| command with |
---|
25 | % \verb|\url| command \verb|\protect|ed. |
---|
26 | \title{Parallelisation of a |
---|
27 | finite volume method for hydrodynamic inundation modelling} |
---|
28 | |
---|
29 | |
---|
30 | \author{ |
---|
31 | S.~G.~Roberts\thanks{Dept. of Maths, Australian National University, |
---|
32 | Canberra, \textsc{Australia}. \protect\url{mailto:[stephen.roberts, |
---|
33 | linda.stals]@anu.edu.au}} \and L.~Stals\footnotemark[1] \and |
---|
34 | O.~M.~Nielsen\thanks{Risk Assessment Methods Project, Geospatial and |
---|
35 | Earth Monitoring Division, Geoscience Australia, Symonston, |
---|
36 | \textsc{Australia}. \protect\url{mailto:Ole.Nielsen@ga.gov.au}} } |
---|
37 | |
---|
38 | \date{24 August 2006} |
---|
39 | |
---|
40 | \newcommand{\AnuGA}{\textsc{anuga}} |
---|
41 | \newcommand{\Python}{\textsc{python}} |
---|
42 | \newcommand{\VPython}{\textsc{vpython}} |
---|
43 | \newcommand{\pypar}{\textsc{mpi}} |
---|
44 | \newcommand{\Metis}{\textsc{metis}} |
---|
45 | \newcommand{\mpi}{\textsc{mpi}} |
---|
46 | |
---|
47 | \newcommand{\UU}{\mathbf{U}} |
---|
48 | \newcommand{\VV}{\mathbf{V}} |
---|
49 | \newcommand{\EE}{\mathbf{E}} |
---|
50 | \newcommand{\GG}{\mathbf{G}} |
---|
51 | \newcommand{\FF}{\mathbf{F}} |
---|
52 | \newcommand{\HH}{\mathbf{H}} |
---|
53 | \newcommand{\SSS}{\mathbf{S}} |
---|
54 | \newcommand{\nn}{\mathbf{n}} |
---|
55 | |
---|
56 | \newcommand{\code}[1]{\texttt{#1}} |
---|
57 | |
---|
58 | |
---|
59 | |
---|
60 | \begin{document} |
---|
61 | |
---|
62 | % Use default \verb|\maketitle|. |
---|
63 | \maketitle |
---|
64 | |
---|
65 | % Use the \verb|abstract| environment. |
---|
66 | \begin{abstract} |
---|
67 | Geoscience Australia and the Australian National University are |
---|
68 | developing a hydrodynamic inundation modelling tool called \AnuGA{} |
---|
69 | to help simulate the impact of natural inundation hazards such as |
---|
70 | riverine flooding, storm surges and tsunamis. The core of \AnuGA{} |
---|
71 | is a \Python{} implementation of a finite volume method, based on |
---|
72 | triangular meshes, for solving the conservative form of the Shallow |
---|
73 | Water Wave equation. We describe the parallelisation of the code |
---|
74 | using a domain decomposition strategy where we use the \Metis{} |
---|
75 | graph partitioning library to decompose the finite volume meshes. |
---|
76 | The parallel efficiency of our code is tested using a number of mesh |
---|
77 | partitions, and we verify that the \Metis{} graph partition is |
---|
78 | particularly efficient. |
---|
79 | \end{abstract} |
---|
80 | |
---|
81 | % By default we include a table of contents in each paper. |
---|
82 | \tableofcontents |
---|
83 | |
---|
84 | % Use \verb|\section|, \verb|\subsection|, \verb|\subsubsection| and |
---|
85 | % possibly \verb|\paragraph| to structure your document. Make sure |
---|
86 | % you \verb|\label| them for cross referencing with \verb|\ref|\,. |
---|
87 | \section{Introduction} |
---|
88 | \label{sec:intro} |
---|
89 | |
---|
90 | %Floods are the single greatest cause of death due to natural hazards |
---|
91 | %in Australia, causing almost 40{\%} of the fatalities recorded |
---|
92 | %between 1788 and 2003~\cite{Blong-2005}. Analysis of buildings |
---|
93 | %damaged between 1900 and 2003 suggests that 93.6{\%} of damage is |
---|
94 | %the result of meteorological hazards, of which almost 25{\%} is |
---|
95 | %directly attributable to flooding~\cite{Blong-2005}. |
---|
96 | |
---|
97 | %Flooding of coastal communities may result from surges of near shore |
---|
98 | %waters caused by severe storms. The extent of inundation is |
---|
99 | %critically linked to tidal conditions, bathymetry and topography; as |
---|
100 | %recently exemplified in the United States by Hurricane Katrina. |
---|
101 | %While events of this scale are rare, the extensive development of |
---|
102 | %Australia's coastal corridor means that storm surge inundation of |
---|
103 | %even a few hundred metres beyond the shoreline has the potential to |
---|
104 | %cause significant disruption and loss. |
---|
105 | % |
---|
106 | %Coastal communities also face the small but real risk of tsunami. |
---|
107 | %Fortunately, catastrophic tsunami of the scale of the 26 December |
---|
108 | %2004 event are exceedingly rare. However, smaller scale tsunami are |
---|
109 | %more common and regularly threaten coastal communities around the |
---|
110 | %world. Earthquakes which occur in the Java Trench near Indonesia |
---|
111 | %(e.g.~\cite{TsuMIS1995}) and along the Puysegur Ridge to the south |
---|
112 | %of New Zealand (e.g.~\cite{LebKC1998}) have potential to generate |
---|
113 | %tsunami that may threaten Australia's northwestern and southeastern |
---|
114 | %coastlines. This was particularly evident in the aftermath of the |
---|
115 | %Java tsunami on the 17th July 2006 where Western Australia did |
---|
116 | %experience a localised tsunami run up. |
---|
117 | |
---|
118 | Hydrodynamic modelling allows flooding, storm surge and tsunami |
---|
119 | hazards to be better understood, their impacts to be anticipated |
---|
120 | and, with appropriate planning, their effects to be mitigated. |
---|
121 | Geoscience Australia in collaboration with the Mathematical Sciences |
---|
122 | Institute, Australian National University, is developing a software |
---|
123 | application called \AnuGA{} to model the hydrodynamics of floods, |
---|
124 | storm surges and tsunamis. These hazards are modelled using the |
---|
125 | conservative shallow water equations which are described in |
---|
126 | section~\ref{sec:model}. In \AnuGA{} the solution of these equations |
---|
127 | are approximated using a finite volume method based on triangular |
---|
128 | meshes, as described in section~\ref{sec:fvm}. Nielsen \textit{et |
---|
129 | al.}~\cite{modsim2005} provides a more complete discussion of the |
---|
130 | method and also provides a validation of the model and method on a |
---|
131 | standard tsunami benchmark data set. Section~\ref{sec:parallel} |
---|
132 | provides a description of the parallelisation of the code using a |
---|
133 | domain decomposition strategy and in section~\ref{sec:analysis} |
---|
134 | preliminary timing results which verify the scalability of the |
---|
135 | parallel code are presented. |
---|
136 | |
---|
137 | |
---|
138 | |
---|
139 | \section{Model} |
---|
140 | \label{sec:model} |
---|
141 | |
---|
142 | The shallow water wave equations are a system of differential |
---|
143 | conservation equations which describe the flow of a thin layer of |
---|
144 | fluid over terrain. The form of the equations are |
---|
145 | \[ |
---|
146 | \frac{\partial \UU}{\partial t}+\frac{\partial \EE}{\partial |
---|
147 | x}+\frac{\partial \GG}{\partial y}=\SSS \,, |
---|
148 | \] |
---|
149 | where $\UU=\left[ {{\begin{array}{*{20}c} |
---|
150 | h & {uh} & {vh} \\ |
---|
151 | \end{array} }} \right]^T$ is the vector of conserved quantities; water depth |
---|
152 | $h$, $x$ horizontal momentum $uh$ and $y$ horizontal momentum $vh$. |
---|
153 | Other quantities entering the system are bed elevation $z$ and stage |
---|
154 | $w$ (absolute water level), where these variables are related via $w |
---|
155 | = z + h$. The horizontal fluxes in the $x$ and $y$ directions, $\EE$ |
---|
156 | and $\GG$ are |
---|
157 | \[ |
---|
158 | \EE=\left[ {{\begin{array}{*{20}c} |
---|
159 | {uh} \hfill \\ |
---|
160 | {u^2h+gh^2/2} \hfill \\ |
---|
161 | {uvh} \hfill \\ |
---|
162 | \end{array} }} \right] \quad \mbox{ and }\quad \GG=\left[ {{\begin{array}{*{20}c} |
---|
163 | {vh} \hfill \\ |
---|
164 | {vuh} \hfill \\ |
---|
165 | {v^2h+gh^2/2} \hfill \\ |
---|
166 | \end{array} }} \right] |
---|
167 | \] |
---|
168 | and the source term (which includes gravity and friction) is |
---|
169 | \[ |
---|
170 | \SSS=\left[ {{\begin{array}{*{20}c} |
---|
171 | 0 \hfill \\ |
---|
172 | -{gh(z_{x} + S_{fx} )} \hfill \\ |
---|
173 | -{gh(z_{y} + S_{fy} )} \hfill \\ |
---|
174 | \end{array} }} \right] \,, |
---|
175 | \] |
---|
176 | where $S_f$ is the bed friction. We model the friction term using |
---|
177 | Manning's resistance law |
---|
178 | \[ |
---|
179 | S_{fx} =\frac{u\eta ^2\sqrt {u^2+v^2} }{h^{4/3}}\quad |
---|
180 | \mbox{and}\quad S_{fy} =\frac{v\eta ^2\sqrt {u^2+v^2} }{h^{4/3}} \,, |
---|
181 | \] |
---|
182 | in which $\eta$ is the Manning resistance coefficient. |
---|
183 | |
---|
184 | As demonstrated in our papers, Zoppou and Roberts~\cite{Rob99l}, and |
---|
185 | Nielsen \textit{et al.}~\cite{modsim2005}, these equations provide |
---|
186 | an excellent model of flows associated with inundation such as dam |
---|
187 | breaks and tsunamis. |
---|
188 | |
---|
189 | \section{Finite volume method} |
---|
190 | \label{sec:fvm} |
---|
191 | |
---|
192 | We use a finite volume method for approximating the solution of the |
---|
193 | shallow water wave equations \cite{Rob99l}. The study area is |
---|
194 | represented by a mesh of triangular cells as in |
---|
195 | Figure~\ref{fig:mesh:reconstruct}, in which the conserved quantities |
---|
196 | $h$, $uh$, and $vh$, in each cell are to be determined. The size of |
---|
197 | the triangular cells are varied within the mesh to allow greater |
---|
198 | resolution in regions of particular interest. |
---|
199 | |
---|
200 | %\begin{figure} |
---|
201 | %\begin{center} |
---|
202 | %\includegraphics[width=5.0cm,keepaspectratio=true]{step-five} |
---|
203 | %\caption{Our finite volume method uses triangular meshes. Conserved |
---|
204 | %quantities $h$, $uh$ and $vh$ are associated with the centroid of |
---|
205 | %each triangular cell.} \label{fig:mesh} |
---|
206 | %\end{center} |
---|
207 | %\end{figure} |
---|
208 | |
---|
209 | \begin{figure} |
---|
210 | \begin{center} |
---|
211 | \includegraphics[width=5.0cm,keepaspectratio=true]{step-reconstruct} |
---|
212 | \caption{Conserved quantities $h$, $uh$ and $vh$ are associated with |
---|
213 | the centroid of each triangular cell. From the values of the |
---|
214 | conserved quantities at the centroid of the cell and its |
---|
215 | neighbouring cells, a discontinuous piecewise linear reconstruction |
---|
216 | of the conserved quantities is obtained.} |
---|
217 | \label{fig:mesh:reconstruct} |
---|
218 | \end{center} |
---|
219 | \end{figure} |
---|
220 | |
---|
221 | The equations constituting the finite volume method are obtained by |
---|
222 | integrating the differential conservation equations over each |
---|
223 | triangular cell of the mesh. |
---|
224 | |
---|
225 | Introducing some notation; we use $i$ to refer to the index of the |
---|
226 | $i$th triangular cell $T_i$, and ${\cal N}(i)$ to the set of indices |
---|
227 | referring to the cells neighbouring the $i$th cell. The area of the |
---|
228 | $i$th triangular cell is $A_i$ and $l_{ij}$ is the length of the |
---|
229 | edge between the $i$th and $j$th cells. |
---|
230 | |
---|
231 | By applying the divergence theorem we obtain for each cell, an |
---|
232 | equation which describes the rate of change of the average of the |
---|
233 | conserved quantities within each cell, in terms of the fluxes across |
---|
234 | the edges of the cell and the effect of the source term. In |
---|
235 | particular, for each cell |
---|
236 | $$ |
---|
237 | \frac{d\UU_i }{dt}+ \frac1{A_i}\sum\limits_{j\in{\cal N}(i)} \HH_{ij} l_{ij} = |
---|
238 | \SSS_i \,, |
---|
239 | $$ |
---|
240 | where |
---|
241 | %\begin{itemize} |
---|
242 | %\item $\UU_i$ is the vector of conserved quantities averaged over the $i$th cell, |
---|
243 | %\item $\SSS_i$ is the source term associated with the $i$th cell, |
---|
244 | %and |
---|
245 | %\item $\HH_{ij}$ is the outward normal flux of |
---|
246 | %material across the \textit{ij}-th edge. |
---|
247 | %\end{itemize} |
---|
248 | % |
---|
249 | $\UU_i$ is the vector of conserved quantities averaged over the |
---|
250 | $i$th cell, $\SSS_i$~is the source term associated with the~$i$th |
---|
251 | cell and $\HH_{ij}$~is the outward normal flux of material across |
---|
252 | the \textit{ij}-th edge. |
---|
253 | |
---|
254 | |
---|
255 | %\item $l_{ij}$ is the length of the edge between the $i$th and $j$th |
---|
256 | %cells |
---|
257 | %\item $m_{ij}$ is the midpoint of |
---|
258 | %the \textit{ij}th edge, |
---|
259 | %\item |
---|
260 | %$\mathbf{n}_{ij} = (n_{ij,1} , n_{ij,2})$is the outward pointing |
---|
261 | %normal along the \textit{ij}th edge, and The |
---|
262 | |
---|
263 | The flux $\HH_{ij}$ is evaluated using a numerical flux function |
---|
264 | $\HH(\cdot, \cdot ; \ \cdot)$ which is consistent with the shallow |
---|
265 | water flux in the sense that for all conserved quantity vectors |
---|
266 | $\UU$ and spatial vectors $\nn$ |
---|
267 | $$ |
---|
268 | H(\UU,\UU;\-\nn) = \EE(\UU) n_1 + \GG(\UU) n_2 \,. |
---|
269 | $$ |
---|
270 | Then the flux across the \textit{ij}th edge is |
---|
271 | $$ |
---|
272 | \HH_{ij} = \HH(\bar{\UU}_i(m_{ij}), \bar{\UU}_j(m_{ij});\- |
---|
273 | \mathbf{n}_{ij}) \,, |
---|
274 | $$ |
---|
275 | where $m_{ij}$ is the midpoint of the \textit{ij}th edge and |
---|
276 | $\mathbf{n}_{ij}$ is the outward pointing normal of the |
---|
277 | \textit{ij}th edge. The function $\bar{\UU}_i(x,y)$ for $(x,y) \in |
---|
278 | T_i$ is obtained from the average conserved quantity values, |
---|
279 | $\UU_k$, for $k \in \{ i \} \cup {\cal N}(i)$ (\textit{i}th cell and |
---|
280 | its neighbours). We use a second order reconstruction to produce a |
---|
281 | piece wise linear function reconstruction, $\bar{\UU}_i(x,y)$, of |
---|
282 | the conserved quantities for all $(x,y) \in T_i$ for each cell (see |
---|
283 | Figure~\ref{fig:mesh:reconstruct}). This function is allowed to be |
---|
284 | discontinuous across the edges of the cells, but the slope of this |
---|
285 | function is limited to avoid artificially introduced oscillations. |
---|
286 | |
---|
287 | The numerical flux function $\HH(\cdot, \cdot ; \ \cdot)$ is |
---|
288 | obtained by rotating the coordinate system so the outward normal is |
---|
289 | in the $x$ direction. This then reduces the normal flux calculation |
---|
290 | to a one dimensional flux calculation. The central upwind scheme as |
---|
291 | described by Kurganov \textit{et al.}~\cite{KurNP2001} is then used |
---|
292 | to approximate the resulting fluxes of the one dimension problem. |
---|
293 | |
---|
294 | |
---|
295 | |
---|
296 | |
---|
297 | |
---|
298 | |
---|
299 | |
---|
300 | |
---|
301 | In the computations presented we use an explicit Euler time stepping |
---|
302 | method with variable time stepping adapted to the Courant Friedrichs |
---|
303 | Levy (CFL) condition number. |
---|
304 | |
---|
305 | |
---|
306 | |
---|
307 | |
---|
308 | |
---|
309 | |
---|
310 | |
---|
311 | %\section{Validation} |
---|
312 | %\label{sec:validation} The process of validating the \AnuGA{} |
---|
313 | %application is in its early stages, however initial indications are |
---|
314 | %encouraging. |
---|
315 | % |
---|
316 | %As part of the Third International Workshop on Long wave Runup |
---|
317 | %Models in 2004 (\url{http://www.cee.cornell.edu/longwave}), four |
---|
318 | %benchmark problems were specified to allow the comparison of |
---|
319 | %numerical, analytical and physical models with laboratory and field |
---|
320 | %data. One of these problems describes a wave tank simulation of the |
---|
321 | %1993 Okushiri Island tsunami off Hokkaido, Japan \cite{MatH2001}. A |
---|
322 | %significant feature of this tsunami was a maximum run up of 32~m |
---|
323 | %observed at the head of the Monai Valley. This run up was not |
---|
324 | %uniform along the coast and is thought to have resulted from a |
---|
325 | %particular topographic effect. Among other features, simulations of |
---|
326 | %the Hokkaido tsunami should capture this run up phenomenon. |
---|
327 | % |
---|
328 | %\begin{figure}[htbp] |
---|
329 | %\centerline{\includegraphics[width=4in]{tsunami-fig-3.eps}} |
---|
330 | %\caption{Comparison of wave tank and \AnuGA{} water stages at gauge |
---|
331 | %5.}\label{fig:val} |
---|
332 | %\end{figure} |
---|
333 | % |
---|
334 | % |
---|
335 | %\begin{figure}[htbp] |
---|
336 | %\centerline{\includegraphics[width=4in]{tsunami-fig-4.eps}} |
---|
337 | %\caption{Complex reflection patterns and run up into Monai Valley |
---|
338 | %simulated by \AnuGA{} and visualised using our netcdf OSG |
---|
339 | %viewer.}\label{fig:run} |
---|
340 | %\end{figure} |
---|
341 | % |
---|
342 | % |
---|
343 | %The wave tank simulation of the Hokkaido tsunami was used as the |
---|
344 | %first scenario for validating \AnuGA{}. The dataset provided |
---|
345 | %bathymetry and topography along with initial water depth and the |
---|
346 | %wave specifications. The dataset also contained water depth time |
---|
347 | %series from three wave gauges situated offshore from the simulated |
---|
348 | %inundation area. |
---|
349 | % |
---|
350 | %Figure~\ref{fig:val} compares the observed wave tank and modelled |
---|
351 | %\AnuGA{} water depth (stage height) at one of the gauges. The plots |
---|
352 | %show good agreement between the two time series, with \AnuGA{} |
---|
353 | %closely modelling the initial draw down, the wave shoulder and the |
---|
354 | %subsequent reflections. The discrepancy between modelled and |
---|
355 | %simulated data in the first 10 seconds is due to the initial |
---|
356 | %condition in the physical tank not being uniformly zero. Similarly |
---|
357 | %good comparisons are evident with data from the other two gauges. |
---|
358 | %Additionally, \AnuGA{} replicates exceptionally well the 32~m Monai |
---|
359 | %Valley run up, and demonstrates its occurrence to be due to the |
---|
360 | %interaction of the tsunami wave with two juxtaposed valleys above |
---|
361 | %the coastline. The run up is depicted in Figure~\ref{fig:run}. |
---|
362 | % |
---|
363 | %This successful replication of the tsunami wave tank simulation on a |
---|
364 | %complex 3D beach is a positive first step in validating the \AnuGA{} |
---|
365 | %modelling capability. Subsequent validation will be conducted as |
---|
366 | %additional datasets become available. |
---|
367 | |
---|
368 | |
---|
369 | |
---|
370 | \section{Parallel implementation}\label{sec:parallel} |
---|
371 | |
---|
372 | To parallelise our algorithm we use a domain decomposition strategy. |
---|
373 | We start with a global mesh which defines the domain of our problem. |
---|
374 | We must first subdivide the global mesh into a set of submeshes. |
---|
375 | This partitioning is done using the \Metis{} partitioning library, |
---|
376 | see Karypis~\cite{metis}. We demonstrate the efficiency of this |
---|
377 | library in the following subsections. Once this partitioning has |
---|
378 | been done, a layer of ``ghost'' cells is constructed and the |
---|
379 | corresponding communication patterns are set. Then we start to |
---|
380 | evolve our solution. The computation progresses independently on |
---|
381 | each submesh, until the end of the time step when the appropriate |
---|
382 | information is communicated among processing elements. |
---|
383 | |
---|
384 | %\begin{figure}[hbtp] |
---|
385 | % \centerline{ \includegraphics[scale = 0.6]{domain.eps}} |
---|
386 | % \caption{The first step of the parallel algorithm is to divide |
---|
387 | % the mesh over the processors.} |
---|
388 | % \label{fig:subpart} |
---|
389 | %\end{figure} |
---|
390 | |
---|
391 | |
---|
392 | |
---|
393 | \begin{figure}[h!] |
---|
394 | \centerline{ \includegraphics[width=6.5cm]{mermesh.eps}} |
---|
395 | \caption{The global Lake Merimbula mesh} |
---|
396 | \label{fig:mergrid} |
---|
397 | \end{figure} |
---|
398 | |
---|
399 | \begin{figure}[h!] |
---|
400 | \centerline{ \includegraphics[width=6.5cm]{mermesh4c.eps} |
---|
401 | \includegraphics[width=6.5cm]{mermesh4a.eps}} |
---|
402 | |
---|
403 | \centerline{ \includegraphics[width=6.5cm]{mermesh4d.eps} |
---|
404 | \includegraphics[width=6.5cm]{mermesh4b.eps}} |
---|
405 | \caption{The global Lake Merimula mesh from Figure~\ref{fig:mergrid}, partitioned into 4 submeshes using \Metis.} |
---|
406 | \label{fig:mergrid4} |
---|
407 | \end{figure} |
---|
408 | |
---|
409 | |
---|
410 | |
---|
411 | |
---|
412 | \begin{table}[h!] |
---|
413 | \caption{4-way and 8-way partition tests of Merimbula mesh showing |
---|
414 | the percentage distribution of cells between the |
---|
415 | submeshes.}\label{tbl:mer} |
---|
416 | \begin{center} |
---|
417 | \begin{tabular}{|c|c c c c|} |
---|
418 | \hline |
---|
419 | CPU & 0 & 1 & 2 & 3\\ |
---|
420 | \hline |
---|
421 | Cells & 2757 & 2713 & 2761 & 2554\\ |
---|
422 | \% & 25.6\% & 25.2\% & 25.6\% & 23.7\%\ \\ |
---|
423 | \hline |
---|
424 | \end{tabular} |
---|
425 | |
---|
426 | \medskip |
---|
427 | |
---|
428 | \begin{tabular}{|c|c c c c c c c c|} |
---|
429 | \hline |
---|
430 | CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\ |
---|
431 | \hline |
---|
432 | Cells & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\ |
---|
433 | \% & 11.4\% & 12.0\% & 12.5\% & 12.4\% & 12.5\% & 13.0\% & 13.1\% & 13.1\%\\ |
---|
434 | \hline |
---|
435 | \end{tabular} |
---|
436 | \end{center} |
---|
437 | \end{table} |
---|
438 | |
---|
439 | \subsection {Subdividing the global mesh}\label{sec:part1} |
---|
440 | |
---|
441 | The first step in parallelising the code is to subdivide the mesh |
---|
442 | into, roughly, equally sized partitions to ensure good load |
---|
443 | balancing. On a rectangular mesh this may be done by a simple |
---|
444 | coordinate based dissection method, but on a complicated domain such |
---|
445 | as the mesh shown in Figure~\ref{fig:mergrid}, a more sophisticated |
---|
446 | approach must be used. We use the \Metis{} partitioning library |
---|
447 | based on Karypis and Kumar's~\cite{gk:metis} recommendations. |
---|
448 | Figure~\ref{fig:mergrid4} shows the original global grid partitioned |
---|
449 | over four submeshes. Note that one submesh may comprise several |
---|
450 | unconnected mesh partitions and Table~\ref{tbl:mer} gives the node |
---|
451 | distribution over four submeshes and eight submeshes. These results |
---|
452 | imply that \Metis{} gives a reasonably well balanced partition of |
---|
453 | the mesh. |
---|
454 | |
---|
455 | \begin{figure}[thp] |
---|
456 | \centerline{ \includegraphics[scale = 0.55]{subdomainfinal.eps}} |
---|
457 | \caption{An example subpartitioning of a global mesh into two submeshes, showing the |
---|
458 | ghost nodes used in the two submeshes. |
---|
459 | } |
---|
460 | \label{fig:subdomainf} |
---|
461 | \end{figure} |
---|
462 | |
---|
463 | |
---|
464 | \paragraph{Ghost cells} |
---|
465 | |
---|
466 | %\begin{figure}[h!] |
---|
467 | % \centerline{ \includegraphics[height=8cm]{subdomain.eps}} |
---|
468 | % \caption{An example subpartitioning of a global mesh into two submeshes.} |
---|
469 | % \label{fig:subdomain} |
---|
470 | %\end{figure} |
---|
471 | % |
---|
472 | % |
---|
473 | %\begin{figure}[b!] |
---|
474 | % \centerline{ \includegraphics[height=8cm]{subdomainghost.eps}} |
---|
475 | % \caption{An example subpartitioning with ghost cells. The numbers |
---|
476 | %in brackets shows the local numbering scheme that is calculated and |
---|
477 | %stored with the mesh, but not implemented until the local mesh is |
---|
478 | %built.} |
---|
479 | % \label{fig:subdomaing} |
---|
480 | %\end{figure} |
---|
481 | |
---|
482 | |
---|
483 | |
---|
484 | |
---|
485 | Consider the example subpartitioning given in |
---|
486 | Figure~\ref{fig:subdomainf}. The top mesh represents the global |
---|
487 | mesh, whereas the two lower meshes display the partitioning of the |
---|
488 | global mesh (together with extra ghost cells to facilitate |
---|
489 | communication) onto two processors. |
---|
490 | |
---|
491 | As an example, during the evolution calculations, cell~2 (residing |
---|
492 | on processor~0) will need to access information from its' global |
---|
493 | neighbour, cell~3 (residing on processor~1). A standard approach to |
---|
494 | this problem is to add an extra layer of cells, which we call ghost |
---|
495 | cells, to each submesh, on each processor. The ghost cells are used |
---|
496 | to hold information that an ordinary cell in a submesh needs to |
---|
497 | complete its calculations. The values associated with the ghost |
---|
498 | cells need to be updated through communication calls usually only |
---|
499 | once every time step (at least for first order time stepping). Such |
---|
500 | communication patterns are determined and recorded before sub |
---|
501 | partitioning the mesh into submeshes. |
---|
502 | |
---|
503 | % |
---|
504 | % |
---|
505 | %Referring to Figure~\ref{fig:subdomainf} we see that during each |
---|
506 | %communication call submesh~0 will have to send the updated values |
---|
507 | %for global cell~2 and cell~4 to submesh~1, and similarly submesh~1 |
---|
508 | %will have to send the updated values for global cell~3 and cell~5 to |
---|
509 | %submesh~0. Each processor records it own communication pattern, i.e. |
---|
510 | %the expected pattern of data it expects to receive from the other |
---|
511 | %processors so as to update the ghost cell data, and also the pattern |
---|
512 | %of data sends used to send data to other processors to update their |
---|
513 | %ghost cells. |
---|
514 | |
---|
515 | The ghost cells in each submesh are treated exactly the same as any |
---|
516 | other cell, the only way to differentiate them is to look at the |
---|
517 | communication pattern. This means that the code is essentially the |
---|
518 | same whether it is being run in serial or parallel, the only |
---|
519 | difference is a communication call at the end of each time step and |
---|
520 | an extra \texttt{if} statement in the local calculation of the time |
---|
521 | step constraint to ensure that the ghost cells are not used in that |
---|
522 | calculation. |
---|
523 | |
---|
524 | \section{Performance analysis}\label{sec:analysis} |
---|
525 | |
---|
526 | |
---|
527 | To evaluate the performance of the code on a parallel machine we ran |
---|
528 | some examples on a cluster of four nodes connected with PathScale |
---|
529 | InfiniPath \textsc{htx}. Each node has two \textsc{amd} Opteron 275 |
---|
530 | (Dual core 2.2 GHz Processors) and 4 GB of main memory. The system |
---|
531 | achieves 60 Gigaflops with the Linpack benchmark, which is about |
---|
532 | 85\% of peak performance. For each test run we evaluate the parallel |
---|
533 | efficiency as |
---|
534 | \[ |
---|
535 | E_p = \frac{T_1}{pT_p} 100 \,, |
---|
536 | \] |
---|
537 | where $T_p = \max_{0\le i < p}\{t_i\}$, $p$ is the total number of |
---|
538 | processors and $t_i$ is the time required to run the evolution code |
---|
539 | on processor $i$. Note that $t_i$ only includes the time required |
---|
540 | to do the finite volume evolution calculations, not the time |
---|
541 | required to build and subpartition the mesh. |
---|
542 | |
---|
543 | \subsection{Advection, rectangular mesh} |
---|
544 | |
---|
545 | The first example solves an advection equation on a rectangular mesh |
---|
546 | of size $n$ by $m$. Table~\ref{tbl:rpa4080160} show the efficiency |
---|
547 | results for different values of $n$ and $m$. The examples where $p |
---|
548 | \le 4$ were run on one Opteron node containing four processors, the |
---|
549 | $p = 8$ example was run on two nodes (giving a total of eight |
---|
550 | processors). The communication within a node is faster than the |
---|
551 | communication across nodes, so we would expect to see a decrease in |
---|
552 | efficiency when we jump from four to eight processors. On the other |
---|
553 | hand, the efficiency is likely to increase with $n$ and $m$, due to |
---|
554 | an increased ratio between interior and exterior triangles and hence |
---|
555 | an increased ratio of computation to communication. The results |
---|
556 | displayed in Table~\ref{tbl:rpa4080160} verify this, expect perhaps |
---|
557 | for the slightly higher than expected efficiency of the $p=2$, |
---|
558 | $n=80$, $m=80$ case. Generally the efficiency results shown here are |
---|
559 | consistent and competitive. |
---|
560 | |
---|
561 | %\begin{table}[h!] |
---|
562 | %\caption{Parallel Efficiency Results for the Advection Problem on a |
---|
563 | %Rectangular Domain with {\tt N} = 40, {\tt M} = |
---|
564 | %40.\label{tbl:rpa40}} |
---|
565 | %\begin{center} |
---|
566 | %\begin{tabular}{|c|c c|}\hline |
---|
567 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
568 | %1 & 36.61 & \\ |
---|
569 | %2 & 18.76 & 98 \\ |
---|
570 | %4 & 10.16 & 90 \\ |
---|
571 | %8 & 6.39 & 72 \\\hline |
---|
572 | %\end{tabular} |
---|
573 | %\end{center} |
---|
574 | %\end{table} |
---|
575 | % |
---|
576 | %\begin{table}[h!] |
---|
577 | %\caption{Parallel Efficiency Results for the Advection Problem on a |
---|
578 | %Rectangular Domain with {\tt N} = 80, {\tt M} = |
---|
579 | %80.\label{tbl:rpa80}} |
---|
580 | %\begin{center} |
---|
581 | %\begin{tabular}{|c|c c|}\hline |
---|
582 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
583 | %1 & 282.18 & \\ |
---|
584 | %2 & 143.14 & 99 \\ |
---|
585 | %4 & 75.06 & 94 \\ |
---|
586 | %8 & 41.67 & 85 \\\hline |
---|
587 | %\end{tabular} |
---|
588 | %\end{center} |
---|
589 | %\end{table} |
---|
590 | % |
---|
591 | % |
---|
592 | %\begin{table}[h!] |
---|
593 | %\caption{Parallel Efficiency Results for the Advection Problem on a |
---|
594 | %Rectangular Domain with {\tt N} = 160, {\tt M} = |
---|
595 | %160.\label{tbl:rpa160}} |
---|
596 | %\begin{center} |
---|
597 | %\begin{tabular}{|c|c c|}\hline |
---|
598 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
599 | %1 & 2200.35 & \\ |
---|
600 | %2 & 1126.35 & 97 \\ |
---|
601 | %4 & 569.49 & 97 \\ |
---|
602 | %8 & 304.43 & 90 \\\hline |
---|
603 | %\end{tabular} |
---|
604 | %\end{center} |
---|
605 | %\end{table} |
---|
606 | |
---|
607 | |
---|
608 | \begin{table}[h!] |
---|
609 | \caption{Parallel efficiency results for the advection problem on |
---|
610 | $n$ by $m$ rectangular meshes using $p$ |
---|
611 | processors.\label{tbl:rpa4080160}} |
---|
612 | \begin{center} |
---|
613 | \begin{tabular}{|c|c c| c c | c c |}\hline |
---|
614 | & \multicolumn{2}{c|}{40 by 40 mesh} & \multicolumn{2}{|c|}{80 by 80 mesh} |
---|
615 | & \multicolumn{2}{|c|}{160 by 160 mesh} \\ \hline |
---|
616 | $p$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ |
---|
617 | (sec) & $E_p (\%)$ |
---|
618 | \\\hline |
---|
619 | 1 & 36.6 & &282.& & 2200. & \\ |
---|
620 | 2 & 18.8 & 98 & 143. & 99 & 1126. & 98 \\ |
---|
621 | 4 & 10.2 & 90 & 75.1 & 94 & 569. & 97 \\ |
---|
622 | 8 & 6.39 & 72 &41.7 & 85 & 304. & 90 \\\hline |
---|
623 | \end{tabular} |
---|
624 | \end{center} |
---|
625 | \end{table} |
---|
626 | |
---|
627 | %\begin{table}[h!] |
---|
628 | %\caption{Parallel Efficiency Results for the Advection Problem on a |
---|
629 | %Rectangular Domain with {\tt N} = 80, {\tt M} = |
---|
630 | %80.\label{tbl:rpa80}} |
---|
631 | %\begin{center} |
---|
632 | %\begin{tabular}{|c|c c|}\hline |
---|
633 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
634 | %1 & 282.18 & \\ |
---|
635 | %2 & 143.14 & 99 \\ |
---|
636 | %4 & 75.06 & 94 \\ |
---|
637 | %8 & 41.67 & 85 \\\hline |
---|
638 | %\end{tabular} |
---|
639 | %\end{center} |
---|
640 | %\end{table} |
---|
641 | % |
---|
642 | % |
---|
643 | %\begin{table}[h!] |
---|
644 | %\caption{Parallel Efficiency Results for the Advection Problem on a |
---|
645 | %Rectangular Domain with {\tt N} = 160, {\tt M} = |
---|
646 | %160.\label{tbl:rpa160}} |
---|
647 | %\begin{center} |
---|
648 | %\begin{tabular}{|c|c c|}\hline |
---|
649 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
650 | %1 & 2200.35 & \\ |
---|
651 | %2 & 1126.35 & 97 \\ |
---|
652 | %4 & 569.49 & 97 \\ |
---|
653 | %8 & 304.43 & 90 \\\hline |
---|
654 | %\end{tabular} |
---|
655 | %\end{center} |
---|
656 | %\end{table} |
---|
657 | |
---|
658 | |
---|
659 | \subsection{Advection, Lake Merimbula mesh} |
---|
660 | |
---|
661 | We now look at another advection example, except this time the mesh |
---|
662 | comes from a study of water flow in Lake Merimbula, New South Wales. |
---|
663 | The mesh is shown in Figure~\ref{fig:mergrid}. The results are given |
---|
664 | in Table \ref{tbl:rpm}. These are good efficiency results, |
---|
665 | especially considering the structure of this mesh. |
---|
666 | %Note that since we are solving an advection problem the amount of calculation |
---|
667 | %done on each triangle is relatively low, when we more to other problems that |
---|
668 | %involve more calculations we would expect the computation to communication |
---|
669 | %ratio to increase and thus get an increase in efficiency. |
---|
670 | |
---|
671 | %\begin{table} |
---|
672 | %\caption{Parallel Efficiency Results for the Advection Problem on |
---|
673 | %the Lake Merimbula Mesh.\label{tbl:rpm}} |
---|
674 | %\begin{center} |
---|
675 | %\begin{tabular}{|c|c c|}\hline |
---|
676 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
677 | %1 &145.17 & \\ |
---|
678 | %2 &77.52 & 94 \\ |
---|
679 | %4 & 41.24 & 88 \\ |
---|
680 | %8 & 22.96 & 79 \\\hline |
---|
681 | %\end{tabular} |
---|
682 | %\end{center} |
---|
683 | %\end{table} |
---|
684 | % |
---|
685 | %\begin{table} |
---|
686 | %\caption{Parallel Efficiency Results for the Shallow Water Equation |
---|
687 | %on the |
---|
688 | % Lake Merimbula Mesh.\label{tbl:rpsm}} |
---|
689 | %\begin{center} |
---|
690 | %\begin{tabular}{|c|c c|}\hline |
---|
691 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
692 | %1 & 7.04 & \\ |
---|
693 | %2 & 3.62 & 97 \\ |
---|
694 | %4 & 1.94 & 91 \\ |
---|
695 | %8 & 1.15 & 77 \\\hline |
---|
696 | %\end{tabular} |
---|
697 | %\end{center} |
---|
698 | %\end{table} |
---|
699 | |
---|
700 | |
---|
701 | \begin{table} |
---|
702 | \caption{Parallel efficiency results for the advection equation and |
---|
703 | the shallow water equation on the Lake Merimbula mesh for $p$ |
---|
704 | processors.\label{tbl:rpm}} |
---|
705 | \begin{center} |
---|
706 | \begin{tabular}{|c|c c| c c |}\hline |
---|
707 | & \multicolumn{2}{c|}{Advection} & \multicolumn{2}{|c|}{Shallow water |
---|
708 | eqn} \\ \hline |
---|
709 | $p$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ (sec) & $E_p (\%)$\\\hline |
---|
710 | 1 &145.0 & & 7.04 & \\ |
---|
711 | 2 &77.5 & 94 & 3.62 & 97 \\ |
---|
712 | 4 & 41.2 & 88 & 1.94 & 91 \\ |
---|
713 | 8 & 23.0 & 79 & 1.15 & 76 \\\hline |
---|
714 | \end{tabular} |
---|
715 | \end{center} |
---|
716 | \end{table} |
---|
717 | |
---|
718 | %\begin{table} |
---|
719 | %\caption{Parallel Efficiency Results for the Shallow Water Equation |
---|
720 | %on the |
---|
721 | % Lake Merimbula Mesh.\label{tbl:rpsm}} |
---|
722 | %\begin{center} |
---|
723 | %\begin{tabular}{|c|c c|}\hline |
---|
724 | %$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline |
---|
725 | %1 & 7.04 & \\ |
---|
726 | %2 & 3.62 & 97 \\ |
---|
727 | %4 & 1.94 & 91 \\ |
---|
728 | %8 & 1.15 & 77 \\\hline |
---|
729 | %\end{tabular} |
---|
730 | %\end{center} |
---|
731 | %\end{table} |
---|
732 | |
---|
733 | |
---|
734 | \subsection{Shallow water, Lake Merimbula mesh} |
---|
735 | |
---|
736 | The final example we look at is the shallow water equation on the |
---|
737 | Lake Merimbula mesh. The results for this case are also listed in |
---|
738 | Table \ref{tbl:rpm}. The efficiency results for two and four |
---|
739 | processors is again good. For eight processors the efficiency falls |
---|
740 | off rather quickly. |
---|
741 | |
---|
742 | On profiling the code we found that the loss of efficiency is due to |
---|
743 | the boundary update routine. To allow maximum flexibility in |
---|
744 | experimenting with different boundary conditions, the boundary |
---|
745 | routines are written in \Python (as opposed to most of the other |
---|
746 | computationally intensive kernels which are written in \texttt{C}). |
---|
747 | When running the code on one processor the boundary routine accounts |
---|
748 | for about 72\% of the total computation time. The \Metis{} |
---|
749 | subpartition of the mesh produced an imbalance in the number of |
---|
750 | active boundary edges in each subpartition. The profiler indicated |
---|
751 | that when running the problem on eight processors, Processor 0 spent |
---|
752 | about 3.8 times more time on the boundary calculation than Processor |
---|
753 | 7, indicating about 3.8 times as many active boundary edges. This |
---|
754 | load imbalance reduced the parallel efficiency. In the future the |
---|
755 | boundary routine will be rewritten in \texttt{C} to reduce its |
---|
756 | overall contribution to the computation and so reduce the effect of |
---|
757 | this active boundary imbalance. |
---|
758 | |
---|
759 | |
---|
760 | |
---|
761 | |
---|
762 | |
---|
763 | |
---|
764 | \section{Conclusions} |
---|
765 | \label{sec:6} \AnuGA{} is a flexible and robust modelling system |
---|
766 | that simulates hydrodynamics by solving the shallow water wave |
---|
767 | equation in a triangular mesh. It models the process of wetting and |
---|
768 | drying as water enters and leaves an area and is capable of |
---|
769 | capturing hydraulic shocks due to the ability of the finite volume |
---|
770 | method to accommodate discontinuities in the solution. It has been |
---|
771 | used to simulate the behaviour of hydrodynamic natural hazards such |
---|
772 | as riverine flooding, storm surge and tsunami. |
---|
773 | |
---|
774 | The use of the parallel code will enhance the modelling capability |
---|
775 | of \AnuGA{} and will form part of Geoscience Australia's ongoing |
---|
776 | research effort to model and understand the potential impact from |
---|
777 | natural hazards in order to reduce their impact on Australian |
---|
778 | communities. |
---|
779 | |
---|
780 | |
---|
781 | %\paragraph{Acknowledgements:} |
---|
782 | %The authors are grateful to Belinda Barnes, National Centre for |
---|
783 | %Epidemiology and Population Health, Australian National University, |
---|
784 | %and Matt Hayne and Augusto Sanabria, Risk Research Group, Geoscience |
---|
785 | %Australia, for helpful reviews of a previous version of this paper. |
---|
786 | %Author Nielsen publish with the permission of the CEO, Geoscience |
---|
787 | %Australia. |
---|
788 | |
---|
789 | % Preferably provide your bibliography as a separate .bbl file. |
---|
790 | % Include URLs, DOIs, Math Review numbers or Zentralblatt numbers in your bibliography |
---|
791 | % so we help connect the web of science and ensure maximum visibility |
---|
792 | % for your article. |
---|
793 | |
---|
794 | |
---|
795 | \bibliographystyle{plain} |
---|
796 | \bibliography{database1} |
---|
797 | |
---|
798 | |
---|
799 | |
---|
800 | |
---|
801 | \end{document} |
---|