% Use the standard \LaTeXe\ article style in 12pt Computer Modern font
% on A4 paper by
\documentclass[12pt,a4paper]{article}
% Do \emph{not} change the width nor the height of the text from the
% defaults set by this document class.
%
% The alternative which is closer to what we actually use is
% \documentclass[11pt,a5paper]{article}
% \usepackage[a5paper]{geometry}
% Because it is a great size for on screen reading
% and prints nicely on a4paper either 2up or booklet.

% The preamble is to contain your own \LaTeX\ commands and to say
% what packages to use.  Three useful packages are the following:
\usepackage{graphicx} % avoid epsfig or earlier such packages
\usepackage{url}      % for URLs and DOIs
\newcommand{\doi}[1]{\url{http://dx.doi.org/#1}}
\usepackage{amsmath}  % many want amsmath extensions
\usepackage{amsfonts}
\usepackage{underscore}
% Avoid loading unused packages (as done by some \LaTeX\ editors).

% Create title and authors using \verb|\maketitle|.  Separate authors by
% \verb|\and| and put addresses in \verb|\thanks| command with
% \verb|\url| command \verb|\protect|ed.
\title{Parallelisation of a
finite volume method for hydrodynamic inundation modelling}


\author{
S.~G.~Roberts\thanks{Dept. of Maths, Australian National University,
Canberra, \textsc{Australia}. \protect\url{mailto:[stephen.roberts,
linda.stals]@anu.edu.au}} \and L.~Stals\footnotemark[1] \and
O.~M.~Nielsen\thanks{Risk Assessment Methods Project, Geospatial and
Earth Monitoring Division, Geoscience Australia, Symonston,
\textsc{Australia}. \protect\url{mailto:Ole.Nielsen@ga.gov.au}} }

\date{24 August 2006}

\newcommand{\AnuGA}{\textsc{anuga}}
\newcommand{\Python}{\textsc{python}}
\newcommand{\VPython}{\textsc{vpython}}
\newcommand{\pypar}{\textsc{mpi}}
\newcommand{\Metis}{\textsc{metis}}
\newcommand{\mpi}{\textsc{mpi}}

\newcommand{\UU}{\mathbf{U}}
\newcommand{\VV}{\mathbf{V}}
\newcommand{\EE}{\mathbf{E}}
\newcommand{\GG}{\mathbf{G}}
\newcommand{\FF}{\mathbf{F}}
\newcommand{\HH}{\mathbf{H}}
\newcommand{\SSS}{\mathbf{S}}
\newcommand{\nn}{\mathbf{n}}

\newcommand{\code}[1]{\texttt{#1}}


\begin{document}

% Use default \verb|\maketitle|.
\maketitle

% Use the \verb|abstract| environment.
\begin{abstract}
Geoscience Australia and the Australian National University are
developing a hydrodynamic inundation modelling tool called \AnuGA{}
to help simulate the impact of natural inundation hazards such as
riverine flooding, storm surges and tsunamis. The core of \AnuGA{}
is a \Python{} implementation of a finite volume method, based on
triangular meshes, for solving the conservative form of the Shallow
Water Wave equation. We describe the parallelisation of the code
using a domain decomposition strategy where we use the \Metis{}
graph partitioning library to decompose the finite volume meshes.
The parallel efficiency of our code is tested using a number of mesh
partitions, and we verify that the \Metis{} graph partition is
particularly efficient.
\end{abstract}

% By default we include a table of contents in each paper.
\tableofcontents

% Use \verb|\section|, \verb|\subsection|, \verb|\subsubsection| and
% possibly \verb|\paragraph| to structure your document.  Make sure
% you \verb|\label| them for cross referencing with \verb|\ref|\,.
\section{Introduction}
\label{sec:intro}

%Floods are the single greatest cause of death due to natural hazards
%in Australia, causing almost 40{\%} of the fatalities recorded
%between 1788 and 2003~\cite{Blong-2005}. Analysis of buildings
%damaged between 1900 and 2003 suggests that 93.6{\%} of damage is
%the result of meteorological hazards, of which almost 25{\%} is
%directly attributable to flooding~\cite{Blong-2005}.

%Flooding of coastal communities may result from surges of near shore
%waters caused by severe storms. The extent of inundation is
%critically linked to tidal conditions, bathymetry and topography; as
%recently exemplified in the United States by Hurricane Katrina.
%While events of this scale are rare, the extensive development of
%Australia's coastal corridor means that storm surge inundation of
%even a few hundred metres beyond the shoreline has the potential to
%cause significant disruption and loss.
%
%Coastal communities also face the small but real risk of tsunami.
%Fortunately, catastrophic tsunami of the scale of the 26 December
%2004 event are exceedingly rare. However, smaller scale tsunami are
%more common and regularly threaten coastal communities around the
%world. Earthquakes which occur in the Java Trench near Indonesia
%(e.g.~\cite{TsuMIS1995}) and along the Puysegur Ridge to the south
%of New Zealand (e.g.~\cite{LebKC1998}) have potential to generate
%tsunami that may threaten Australia's northwestern and southeastern
%coastlines. This was particularly evident in the aftermath of the
%Java tsunami on the 17th July 2006 where Western Australia did
%experience a localised tsunami run up.

Hydrodynamic modelling allows flooding, storm surge and tsunami
hazards to be better understood, their impacts to be anticipated
and, with appropriate planning, their effects to be mitigated.
Geoscience Australia, in collaboration with the Mathematical Sciences
Institute of the Australian National University, is developing a software
application called \AnuGA{} to model the hydrodynamics of floods,
storm surges and tsunamis. These hazards are modelled using the
conservative shallow water equations, which are described in
section~\ref{sec:model}. In \AnuGA{} the solution of these equations
is approximated using a finite volume method based on triangular
meshes, as described in section~\ref{sec:fvm}. Nielsen \textit{et
al.}~\cite{modsim2005} provide a more complete discussion of the
method and also validate the model and method on a
standard tsunami benchmark data set. Section~\ref{sec:parallel}
describes the parallelisation of the code using a
domain decomposition strategy, and section~\ref{sec:analysis}
presents preliminary timing results which verify the scalability of
the parallel code.


\section{Model}
\label{sec:model}

The shallow water wave equations are a system of differential
conservation equations which describe the flow of a thin layer of
fluid over terrain. The equations have the form
\[
\frac{\partial \UU}{\partial t}+\frac{\partial \EE}{\partial
x}+\frac{\partial \GG}{\partial y}=\SSS \,,
\]
where $\UU=\left[ {{\begin{array}{*{20}c}
 h & {uh} & {vh} \\
\end{array} }} \right]^T$ is the vector of conserved quantities: water depth
$h$, $x$ horizontal momentum $uh$ and $y$ horizontal momentum $vh$.
Other quantities entering the system are the bed elevation $z$ and the stage
$w$ (absolute water level), which are related via $w
= z + h$. The horizontal fluxes in the $x$ and $y$ directions, $\EE$
and $\GG$, are
\[
\EE=\left[ {{\begin{array}{*{20}c}
 {uh} \hfill \\
 {u^2h+gh^2/2} \hfill \\
 {uvh} \hfill \\
\end{array} }} \right] \quad \mbox{ and }\quad \GG=\left[ {{\begin{array}{*{20}c}
 {vh} \hfill \\
 {vuh} \hfill \\
 {v^2h+gh^2/2} \hfill \\
\end{array} }} \right]
\]
and the source term (which includes gravity and friction) is
\[
\SSS=\left[ {{\begin{array}{*{20}c}
 0 \hfill \\
 -{gh(z_{x} + S_{fx} )} \hfill \\
 -{gh(z_{y} + S_{fy} )} \hfill \\
\end{array} }} \right] \,,
\]
where $S_f$ is the bed friction. We model the friction term using
Manning's resistance law
\[
S_{fx} =\frac{u\eta ^2\sqrt {u^2+v^2} }{h^{4/3}}\quad
\mbox{and}\quad S_{fy} =\frac{v\eta ^2\sqrt {u^2+v^2} }{h^{4/3}} \,,
\]
in which $\eta$ is the Manning resistance coefficient.
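
As an illustration of how these terms are evaluated in practice, the
following \Python{} sketch computes $\EE$, $\GG$ and $\SSS$ for a single
cell from the conserved quantities $(h, uh, vh)$, the bed slopes and the
Manning coefficient. The function and variable names are illustrative
only and do not correspond to the \AnuGA{} source.
\begin{verbatim}
# Illustrative sketch: evaluate the shallow water fluxes E, G and the
# source term S from the conserved quantities (h, uh, vh).
from math import sqrt

g = 9.81  # gravitational acceleration

def fluxes_and_source(h, uh, vh, zx, zy, eta):
    """Return (E, G, S) for one cell; zx, zy are the bed slopes
    and eta is the Manning resistance coefficient."""
    u = uh/h if h > 0 else 0.0
    v = vh/h if h > 0 else 0.0

    E = [uh, u*uh + 0.5*g*h*h, u*vh]   # flux in the x direction
    G = [vh, v*uh, v*vh + 0.5*g*h*h]   # flux in the y direction

    # Manning friction slopes
    speed = sqrt(u*u + v*v)
    Sfx = u*eta**2*speed/h**(4.0/3.0) if h > 0 else 0.0
    Sfy = v*eta**2*speed/h**(4.0/3.0) if h > 0 else 0.0

    S = [0.0, -g*h*(zx + Sfx), -g*h*(zy + Sfy)]
    return E, G, S
\end{verbatim}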

As demonstrated in our earlier papers, Zoppou and Roberts~\cite{Rob99l} and
Nielsen \textit{et al.}~\cite{modsim2005}, these equations provide
an excellent model of flows associated with inundation such as dam
breaks and tsunamis.

\section{Finite volume method}
\label{sec:fvm}

We use a finite volume method for approximating the solution of the
shallow water wave equations \cite{Rob99l}. The study area is
represented by a mesh of triangular cells, as in
Figure~\ref{fig:mesh:reconstruct}, in which the conserved quantities
$h$, $uh$, and $vh$ in each cell are to be determined. The sizes of
the triangular cells are varied within the mesh to allow greater
resolution in regions of particular interest.

%\begin{figure}
%\begin{center}
%\includegraphics[width=5.0cm,keepaspectratio=true]{step-five}
%\caption{Our finite volume method uses triangular meshes. Conserved
%quantities $h$, $uh$ and $vh$ are associated with the centroid of
%each triangular cell.} \label{fig:mesh}
%\end{center}
%\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=5.0cm,keepaspectratio=true]{step-reconstruct}
\caption{Conserved quantities $h$, $uh$ and $vh$ are associated with
the centroid of each triangular cell. From the values of the
conserved quantities at the centroid of the cell and its
neighbouring cells, a discontinuous piecewise linear reconstruction
of the conserved quantities is obtained.}
\label{fig:mesh:reconstruct}
\end{center}
\end{figure}

The equations constituting the finite volume method are obtained by
integrating the differential conservation equations over each
triangular cell of the mesh.

Introducing some notation, we use $i$ to refer to the index of the
$i$th triangular cell $T_i$, and ${\cal N}(i)$ for the set of indices
of the cells neighbouring the $i$th cell. The area of the
$i$th triangular cell is $A_i$ and $l_{ij}$ is the length of the
edge between the $i$th and $j$th cells.

By applying the divergence theorem we obtain, for each cell, an
equation which describes the rate of change of the average of the
conserved quantities within that cell, in terms of the fluxes across
the edges of the cell and the effect of the source term. In
particular, for each cell
$$
 \frac{d\UU_i }{dt}+ \frac1{A_i}\sum\limits_{j\in{\cal N}(i)} \HH_{ij} l_{ij} =
 \SSS_i  \,,
$$
where
%\begin{itemize}
%\item $\UU_i$ is the vector of conserved quantities averaged over the $i$th cell,
%\item $\SSS_i$ is the source term associated with the $i$th cell,
%and
%\item $\HH_{ij}$ is the outward normal flux of
%material across the \textit{ij}-th edge.
%\end{itemize}
%
$\UU_i$ is the vector of conserved quantities averaged over the
$i$th cell, $\SSS_i$~is the source term associated with the~$i$th
cell and $\HH_{ij}$~is the outward normal flux of material across
the \textit{ij}th edge.


%\item $l_{ij}$ is the length of the edge between the $i$th and $j$th
%cells
%\item $m_{ij}$ is the midpoint of
%the \textit{ij}th edge,
%\item
%$\mathbf{n}_{ij} = (n_{ij,1} , n_{ij,2})$ is the outward pointing
%normal along the \textit{ij}th edge.

The flux $\HH_{ij}$ is evaluated using a numerical flux function
$\HH(\cdot, \cdot ; \ \cdot)$ which is consistent with the shallow
water flux in the sense that for all conserved quantity vectors
$\UU$ and spatial vectors $\nn$
$$
\HH(\UU,\UU;\nn) = \EE(\UU) n_1 + \GG(\UU) n_2 \,.
$$
Then the flux across the \textit{ij}th edge is
$$
\HH_{ij}  = \HH(\bar{\UU}_i(m_{ij}), \bar{\UU}_j(m_{ij});
\mathbf{n}_{ij}) \,,
$$
where $m_{ij}$ is the midpoint of the \textit{ij}th edge and
$\mathbf{n}_{ij}$ is the outward pointing normal of the
\textit{ij}th edge. The function $\bar{\UU}_i(x,y)$ for $(x,y) \in
T_i$ is obtained from the average conserved quantity values,
$\UU_k$, for $k \in \{ i \} \cup {\cal N}(i)$ (the $i$th cell and
its neighbours). We use a second order reconstruction to produce a
piecewise linear reconstruction, $\bar{\UU}_i(x,y)$, of
the conserved quantities over each cell $T_i$ (see
Figure~\ref{fig:mesh:reconstruct}). This function is allowed to be
discontinuous across the edges of the cells, but the slope of this
function is limited to avoid artificially introduced oscillations.
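
The reconstruction and slope limiting step can be sketched as follows
for a single scalar quantity on one cell. The limiter shown is a
generic one which keeps the reconstructed edge values within the range
of the neighbouring centroid values; the limiter used in \AnuGA{} may
differ in detail, and all names are illustrative only.
\begin{verbatim}
# Illustrative sketch of a limited linear reconstruction of one scalar
# quantity q on a single triangular cell.
def reconstruct(q_c, centroid, grad, q_neighbours, edge_midpoints):
    """q_c            -- centroid value of the cell
    centroid       -- (x, y) coordinates of the cell centroid
    grad           -- provisional gradient (dq/dx, dq/dy)
    q_neighbours   -- centroid values of the neighbouring cells
    edge_midpoints -- list of (x, y) edge midpoints
    Returns the limited values of q at the edge midpoints."""
    q_min = min([q_c] + q_neighbours)
    q_max = max([q_c] + q_neighbours)

    # Provisional (unlimited) extrapolation to the edge midpoints
    edge_q = [q_c + grad[0]*(x - centroid[0]) + grad[1]*(y - centroid[1])
              for (x, y) in edge_midpoints]

    # Scale the slope so that no edge value leaves [q_min, q_max]
    phi = 1.0
    for q_e in edge_q:
        dq = q_e - q_c
        if dq > 0:
            phi = min(phi, (q_max - q_c)/dq)
        elif dq < 0:
            phi = min(phi, (q_min - q_c)/dq)

    return [q_c + phi*(q_e - q_c) for q_e in edge_q]
\end{verbatim}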

The numerical flux function $\HH(\cdot, \cdot ; \ \cdot)$ is
obtained by rotating the coordinate system so that the outward normal is
in the $x$ direction. This reduces the normal flux calculation
to a one dimensional flux calculation. The central upwind scheme
described by Kurganov \textit{et al.}~\cite{KurNP2001} is then used
to approximate the resulting fluxes of the one dimensional problem.
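
For reference, a minimal sketch of the central upwind flux in the
rotated (edge normal) frame is given below; here the first momentum
component is the momentum normal to the edge. The names are
illustrative and the treatment of dry cells is simplified compared with
the \AnuGA{} source.
\begin{verbatim}
# Illustrative sketch of the central upwind flux of Kurganov et al.
# for the rotated, one dimensional shallow water system U = (h, uh, vh).
from math import sqrt

g = 9.81

def flux_1d(U):
    h, uh, vh = U
    u = uh/h if h > 0 else 0.0
    return [uh, u*uh + 0.5*g*h*h, u*vh]

def central_upwind_flux(U_left, U_right):
    """Numerical flux H(U_left, U_right; n) in the rotated frame."""
    h_l, uh_l, _ = U_left
    h_r, uh_r, _ = U_right
    u_l = uh_l/h_l if h_l > 0 else 0.0
    u_r = uh_r/h_r if h_r > 0 else 0.0

    # One sided local wave speeds
    a_plus  = max(u_l + sqrt(g*h_l), u_r + sqrt(g*h_r), 0.0)
    a_minus = min(u_l - sqrt(g*h_l), u_r - sqrt(g*h_r), 0.0)
    if a_plus - a_minus == 0.0:
        return [0.0, 0.0, 0.0]

    F_l, F_r = flux_1d(U_left), flux_1d(U_right)
    return [(a_plus*fl - a_minus*fr + a_plus*a_minus*(ur - ul))
            / (a_plus - a_minus)
            for fl, fr, ul, ur in zip(F_l, F_r, U_left, U_right)]
\end{verbatim}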

In the computations presented we use an explicit Euler time stepping
method with a variable time step adapted to satisfy the
Courant--Friedrichs--Lewy (CFL) condition.
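
A sketch of this time stepping loop is given below. The characteristic
length used in the CFL restriction (here a cell radius) and the names
are illustrative; they do not correspond exactly to the \AnuGA{} source.
\begin{verbatim}
# Illustrative sketch of one explicit Euler step with a CFL limited
# time step.  flux_sum[i] holds sum_j H_ij * l_ij for cell i,
# S[i] the source term, radius[i] a characteristic cell length and
# max_speed[i] the largest wave speed |u| + sqrt(g*h) on its edges.
def evolve_one_step(U, flux_sum, S, areas, radius, max_speed, cfl=1.0):
    # Global time step limited by the most restrictive cell
    dt = cfl*min(r/s for r, s in zip(radius, max_speed) if s > 0)

    for i in range(len(U)):
        for k in range(3):   # components h, uh, vh
            U[i][k] += dt*(S[i][k] - flux_sum[i][k]/areas[i])
    return dt
\end{verbatim}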
%\section{Validation}
%\label{sec:validation} The process of validating the \AnuGA{}
%application is in its early stages, however initial indications are
%encouraging.
%
%As part of the Third International Workshop on Long wave Runup
%Models in 2004 (\url{http://www.cee.cornell.edu/longwave}), four
%benchmark problems were specified to allow the comparison of
%numerical, analytical and physical models with laboratory and field
%data. One of these problems describes a wave tank simulation of the
%1993 Okushiri Island tsunami off Hokkaido, Japan \cite{MatH2001}. A
%significant feature of this tsunami was a maximum run up of 32~m
%observed at the head of the Monai Valley. This run up was not
%uniform along the coast and is thought to have resulted from a
%particular topographic effect. Among other features, simulations of
%the Hokkaido tsunami should capture this run up phenomenon.
%
%\begin{figure}[htbp]
%\centerline{\includegraphics[width=4in]{tsunami-fig-3.eps}}
%\caption{Comparison of wave tank and \AnuGA{} water stages at gauge
%5.}\label{fig:val}
%\end{figure}
%
%
%\begin{figure}[htbp]
%\centerline{\includegraphics[width=4in]{tsunami-fig-4.eps}}
%\caption{Complex reflection patterns and run up into Monai Valley
%simulated by \AnuGA{} and visualised using our netcdf OSG
%viewer.}\label{fig:run}
%\end{figure}
%
%
%The wave tank simulation of the Hokkaido tsunami was used as the
%first scenario for validating \AnuGA{}. The dataset provided
%bathymetry and topography along with initial water depth and the
%wave specifications. The dataset also contained water depth time
%series from three wave gauges situated offshore from the simulated
%inundation area.
%
%Figure~\ref{fig:val} compares the observed wave tank and modelled
%\AnuGA{} water depth (stage height) at one of the gauges. The plots
%show good agreement between the two time series, with \AnuGA{}
%closely modelling the initial draw down, the wave shoulder and the
%subsequent reflections. The discrepancy between modelled and
%simulated data in the first 10 seconds is due to the initial
%condition in the physical tank not being uniformly zero. Similarly
%good comparisons are evident with data from the other two gauges.
%Additionally, \AnuGA{} replicates exceptionally well the 32~m Monai
%Valley run up, and demonstrates its occurrence to be due to the
%interaction of the tsunami wave with two juxtaposed valleys above
%the coastline. The run up is depicted in Figure~\ref{fig:run}.
%
%This successful replication of the tsunami wave tank simulation on a
%complex 3D beach is a positive first step in validating the \AnuGA{}
%modelling capability. Subsequent validation will be conducted as
%additional datasets become available.


\section{Parallel implementation}\label{sec:parallel}

To parallelise our algorithm we use a domain decomposition strategy.
We start with a global mesh which defines the domain of our problem.
We first subdivide the global mesh into a set of submeshes.
This partitioning is done using the \Metis{} partitioning library;
see Karypis~\cite{metis}. We demonstrate the efficiency of this
library in the following subsections. Once this partitioning has
been done, a layer of ``ghost'' cells is constructed and the
corresponding communication patterns are set. Then we start to
evolve our solution. The computation progresses independently on
each submesh, until the end of the time step when the appropriate
information is communicated among processing elements.

%\begin{figure}[hbtp]
%  \centerline{ \includegraphics[scale = 0.6]{domain.eps}}
%  \caption{The first step of the parallel algorithm is to divide
%  the mesh over the processors.}
%  \label{fig:subpart}
%\end{figure}



\begin{figure}[h!]
  \centerline{ \includegraphics[width=6.5cm]{mermesh.eps}}
  \caption{The global Lake Merimbula mesh.}
 \label{fig:mergrid}
\end{figure}

\begin{figure}[h!]
  \centerline{ \includegraphics[width=6.5cm]{mermesh4c.eps}
  \includegraphics[width=6.5cm]{mermesh4a.eps}}

  \centerline{ \includegraphics[width=6.5cm]{mermesh4d.eps}
  \includegraphics[width=6.5cm]{mermesh4b.eps}}
  \caption{The global Lake Merimbula mesh from Figure~\ref{fig:mergrid}, partitioned into 4 submeshes using \Metis.}
 \label{fig:mergrid4}
\end{figure}




\begin{table}[h!]
\caption{4-way and 8-way partition tests of the Merimbula mesh showing
the percentage distribution of cells between the
submeshes.}\label{tbl:mer}
\begin{center}
\begin{tabular}{|c|c c c c|}
\hline
CPU & 0 & 1 & 2 & 3\\
\hline
Cells & 2757 & 2713 & 2761 & 2554\\
\% & 25.6\% & 25.2\% & 25.6\% & 23.7\% \\
\hline
\end{tabular}

\medskip

\begin{tabular}{|c|c c c c c c c c|}
\hline
CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
\hline
Cells & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\
\% & 11.4\% & 12.0\% & 12.5\% & 12.4\% & 12.5\% & 13.0\% & 13.1\% & 13.1\%\\
\hline
\end{tabular}
\end{center}
\end{table}

\subsection{Subdividing the global mesh}\label{sec:part1}

The first step in parallelising the code is to subdivide the mesh
into roughly equally sized partitions to ensure good load
balancing. On a rectangular mesh this may be done by a simple
coordinate based dissection method, but on a complicated domain such
as the mesh shown in Figure~\ref{fig:mergrid}, a more sophisticated
approach must be used. We use the \Metis{} partitioning library
based on Karypis and Kumar's~\cite{gk:metis} recommendations.
Figure~\ref{fig:mergrid4} shows the original global mesh partitioned
over four submeshes. Note that one submesh may comprise several
unconnected mesh partitions. Table~\ref{tbl:mer} gives the cell
distribution over four submeshes and eight submeshes. These results
show that \Metis{} gives a reasonably well balanced partition of
the mesh.
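
For illustration, the partitioning step can be expressed in a few lines
of \Python{} using the \code{pymetis} binding to \Metis{}; \AnuGA{}
itself calls \Metis{} through its own interface, so the code below is a
sketch of the idea rather than the actual implementation.
\begin{verbatim}
# Illustrative sketch: partition the dual graph of a triangular mesh
# into nparts submeshes using the pymetis binding to Metis.
import pymetis

def partition_mesh(neighbours, nparts):
    """neighbours[i] lists the cells sharing an edge with cell i
    (the dual graph of the mesh).  Returns a list assigning each
    cell to a submesh 0 .. nparts-1."""
    # Metis balances the number of cells per part while cutting as
    # few shared edges (future communication) as possible.
    edge_cuts, membership = pymetis.part_graph(nparts,
                                               adjacency=neighbours)
    return membership

# Cells assigned to submesh p form the local mesh of processor p;
# a layer of ghost cells is then added along the cut edges.
\end{verbatim}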

\begin{figure}[thp]
  \centerline{ \includegraphics[scale = 0.55]{subdomainfinal.eps}}
  \caption{An example subpartitioning of a global mesh into two submeshes, showing the
  ghost nodes used in the two submeshes.
}
 \label{fig:subdomainf}
\end{figure}


\paragraph{Ghost cells}

%\begin{figure}[h!]
%  \centerline{ \includegraphics[height=8cm]{subdomain.eps}}
%  \caption{An example subpartitioning of a global mesh into two submeshes.}
% \label{fig:subdomain}
%\end{figure}
%
%
%\begin{figure}[b!]
%  \centerline{ \includegraphics[height=8cm]{subdomainghost.eps}}
%  \caption{An example subpartitioning with ghost cells. The numbers
%in brackets shows the local numbering scheme that is calculated and
%stored with the mesh, but not implemented until the local mesh is
%built.}
% \label{fig:subdomaing}
%\end{figure}



Consider the example subpartitioning given in
Figure~\ref{fig:subdomainf}. The top mesh represents the global
mesh, whereas the two lower meshes display the partitioning of the
global mesh (together with extra ghost cells to facilitate
communication) onto two processors.

As an example, during the evolution calculations, cell~2 (residing
on processor~0) will need to access information from its global
neighbour, cell~3 (residing on processor~1). A standard approach to
this problem is to add an extra layer of cells, which we call ghost
cells, to each submesh on each processor. The ghost cells are used
to hold information that an ordinary cell in a submesh needs to
complete its calculations. The values associated with the ghost
cells need to be updated through communication calls, usually only
once every time step (at least for first order time stepping). Such
communication patterns are determined and recorded before the global
mesh is subdivided into submeshes.

%
%
%Referring to Figure~\ref{fig:subdomainf} we see that during each
%communication call submesh~0  will have to send the updated values
%for global cell~2 and cell~4 to submesh~1, and similarly submesh~1
%will have to send the updated values for global cell~3 and cell~5 to
%submesh~0. Each processor records it own communication pattern, i.e.
%the expected pattern of data it expects to receive from the other
%processors so as to update the ghost cell data, and also the pattern
%of data sends used to send data to other processors to update their
%ghost cells.

The ghost cells in each submesh are treated exactly the same as any
other cell; the only way to differentiate them is to look at the
communication pattern. This means that the code is essentially the
same whether it is run in serial or in parallel; the only
difference is a communication call at the end of each time step and
an extra \texttt{if} statement in the local calculation of the time
step constraint to ensure that the ghost cells are not used in that
calculation.
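
The ghost cell update at the end of a time step can be sketched as
follows. The sketch uses the \code{mpi4py} bindings for clarity;
\AnuGA{} performs the same exchange through its own \mpi{} interface,
and the names used here are illustrative only.
\begin{verbatim}
# Illustrative sketch of the ghost cell update at the end of a time
# step.  send_pattern[p] lists the local cells whose values must be
# sent to processor p; recv_pattern[p] lists the local ghost cells
# to be overwritten with the values received from processor p.
from mpi4py import MPI

comm = MPI.COMM_WORLD

def update_ghost_cells(U, send_pattern, recv_pattern):
    requests = [comm.isend([U[i] for i in cells], dest=p, tag=13)
                for p, cells in send_pattern.items()]

    for p, cells in recv_pattern.items():
        values = comm.recv(source=p, tag=13)
        for i, value in zip(cells, values):
            U[i] = value

    for request in requests:
        request.wait()
\end{verbatim}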

\section{Performance analysis}\label{sec:analysis}


To evaluate the performance of the code on a parallel machine we ran
some examples on a cluster of four nodes connected with PathScale
InfiniPath \textsc{htx}. Each node has two \textsc{amd} Opteron 275
processors (dual core, 2.2~GHz) and 4~GB of main memory. The system
achieves 60~Gigaflops with the Linpack benchmark, which is about
85\% of peak performance. For each test run we evaluate the parallel
efficiency as
\[
E_p = \frac{T_1}{pT_p} \times 100 \,,
\]
where $T_p = \max_{0\le i < p}\{t_i\}$, $p$ is the total number of
processors and $t_i$ is the time required to run the evolution code
on processor $i$.  Note that $t_i$ only includes the time required
to do the finite volume evolution calculations, not the time
required to build and subpartition the mesh.
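
For example, using the eight processor timing of the 40 by 40 advection
run reported in Table~\ref{tbl:rpa4080160}, the efficiency is computed
as follows.
\begin{verbatim}
# Worked example of the parallel efficiency E_p = 100 T_1/(p T_p).
def efficiency(T1, Tp, p):
    return 100.0*T1/(p*Tp)

print(round(efficiency(36.6, 6.39, 8)))   # prints 72
\end{verbatim}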

\subsection{Advection, rectangular mesh}

The first example solves an advection equation on a rectangular mesh
of size $n$ by $m$. Table~\ref{tbl:rpa4080160} shows the efficiency
results for different values of $n$ and $m$. The examples where $p
\le 4$ were run on one Opteron node containing four processors; the
$p = 8$ example was run on two nodes (giving a total of eight
processors). The communication within a node is faster than the
communication across nodes, so we would expect to see a decrease in
efficiency when we jump from four to eight processors. On the other
hand, the efficiency is likely to increase with $n$ and $m$, due to
an increased ratio between interior and exterior triangles and hence
an increased ratio of computation to communication. The results
displayed in Table~\ref{tbl:rpa4080160} verify this, except perhaps
for the slightly higher than expected efficiency of the $p=2$,
$n=80$, $m=80$ case. Generally the efficiency results shown here are
consistent and competitive.

%\begin{table}[h!]
%\caption{Parallel Efficiency Results for the Advection Problem on a
%Rectangular Domain with {\tt N} = 40, {\tt M} =
%40.\label{tbl:rpa40}}
%\begin{center}
%\begin{tabular}{|c|c c|}\hline
%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
%1 & 36.61 & \\
%2 & 18.76 & 98 \\
%4 & 10.16 & 90 \\
%8 & 6.39 & 72 \\\hline
%\end{tabular}
%\end{center}
%\end{table}
%
%\begin{table}[h!]
%\caption{Parallel Efficiency Results for the Advection Problem on a
%Rectangular Domain with {\tt N} = 80, {\tt M} =
%80.\label{tbl:rpa80}}
%\begin{center}
%\begin{tabular}{|c|c c|}\hline
%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
%1 & 282.18 & \\
%2 & 143.14 & 99 \\
%4 & 75.06 & 94 \\
%8 & 41.67 & 85 \\\hline
%\end{tabular}
%\end{center}
%\end{table}
%
%
%\begin{table}[h!]
%\caption{Parallel Efficiency Results for the Advection Problem on a
%Rectangular Domain with {\tt N} = 160, {\tt M} =
%160.\label{tbl:rpa160}}
%\begin{center}
%\begin{tabular}{|c|c c|}\hline
%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
%1 & 2200.35 & \\
%2 & 1126.35 & 97 \\
%4 & 569.49 & 97 \\
%8 & 304.43 & 90 \\\hline
%\end{tabular}
%\end{center}
%\end{table}


\begin{table}[h!]
\caption{Parallel efficiency results for the advection problem on
$n$ by $m$ rectangular meshes using $p$
processors.\label{tbl:rpa4080160}}
\begin{center}
\begin{tabular}{|c|c c| c c | c c |}\hline
 & \multicolumn{2}{c|}{40 by 40 mesh} & \multicolumn{2}{|c|}{80 by 80 mesh}
 & \multicolumn{2}{|c|}{160 by 160 mesh} \\ \hline
$p$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$
(sec) & $E_p (\%)$
\\\hline
1 & 36.6 &    & 282  &    & 2200 &  \\
2 & 18.8 & 98 & 143  & 99 & 1126 & 98 \\
4 & 10.2 & 90 & 75.1 & 94 & 569  & 97 \\
8 & 6.39 & 72 & 41.7 & 85 & 304  & 90 \\\hline
\end{tabular}
\end{center}
\end{table}

\subsection{Advection, Lake Merimbula mesh}

We now look at another advection example, except this time the mesh
comes from a study of water flow in Lake Merimbula, New South Wales.
The mesh is shown in Figure~\ref{fig:mergrid}. The results are given
in Table~\ref{tbl:rpm}. These are good efficiency results,
especially considering the structure of this mesh.
%Note that since we are solving an advection problem the amount of calculation
%done on each triangle is relatively low, when we move to other problems that
%involve more calculations we would expect the computation to communication
%ratio to increase and thus get an increase in efficiency.

%\begin{table}
%\caption{Parallel Efficiency Results for the Advection Problem on
%the Lake Merimbula Mesh.\label{tbl:rpm}}
%\begin{center}
%\begin{tabular}{|c|c c|}\hline
%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
%1 &145.17 & \\
%2 &77.52 & 94 \\
%4 & 41.24 & 88 \\
%8 & 22.96 & 79 \\\hline
%\end{tabular}
%\end{center}
%\end{table}
%
%\begin{table}
%\caption{Parallel Efficiency Results for the Shallow Water Equation
%on the
%  Lake Merimbula Mesh.\label{tbl:rpsm}}
%\begin{center}
%\begin{tabular}{|c|c c|}\hline
%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
%1 & 7.04 & \\
%2 & 3.62 & 97 \\
%4 & 1.94 & 91 \\
%8 & 1.15 & 77 \\\hline
%\end{tabular}
%\end{center}
%\end{table}


\begin{table}
\caption{Parallel efficiency results for the advection equation and
the shallow water equation on the Lake Merimbula mesh for $p$
processors.\label{tbl:rpm}}
\begin{center}
\begin{tabular}{|c|c c| c c |}\hline
 & \multicolumn{2}{c|}{Advection} & \multicolumn{2}{|c|}{Shallow water
 equation} \\ \hline
$p$ & $T_p$ (sec) & $E_p (\%)$  & $T_p$ (sec) & $E_p (\%)$\\\hline
1 & 145.0 &    & 7.04 &  \\
2 & 77.5  & 94 & 3.62 & 97  \\
4 & 41.2  & 88 & 1.94 & 91 \\
8 & 23.0  & 79 & 1.15 & 76 \\\hline
\end{tabular}
\end{center}
\end{table}

\subsection{Shallow water, Lake Merimbula mesh}

The final example we look at is the shallow water equation on the
Lake Merimbula mesh.  The results for this case are also listed in
Table~\ref{tbl:rpm}. The efficiency results for two and four
processors are again good. For eight processors the efficiency falls
off rather quickly.

On profiling the code we found that the loss of efficiency is due to
the boundary update routine.  To allow maximum flexibility in
experimenting with different boundary conditions, the boundary
routines are written in \Python{} (as opposed to most of the other
computationally intensive kernels, which are written in \texttt{C}).
When running the code on one processor the boundary routine accounts
for about 72\% of the total computation time. The \Metis{}
subpartition of the mesh produced an imbalance in the number of
active boundary edges in each subpartition.  The profiler indicated
that when running the problem on eight processors, processor~0 spent
about 3.8 times as much time on the boundary calculation as
processor~7, indicating that it had about 3.8 times as many active
boundary edges. This load imbalance reduced the parallel efficiency.
In the future the boundary routine will be rewritten in \texttt{C} to
reduce its overall contribution to the computation and so reduce the
effect of this active boundary imbalance.


\section{Conclusions}
\label{sec:6} \AnuGA{} is a flexible and robust modelling system
that simulates hydrodynamics by solving the shallow water wave
equation on a triangular mesh. It models the process of wetting and
drying as water enters and leaves an area, and is capable of
capturing hydraulic shocks due to the ability of the finite volume
method to accommodate discontinuities in the solution. It has been
used to simulate the behaviour of hydrodynamic natural hazards such
as riverine flooding, storm surge and tsunami.

The use of the parallel code will enhance the modelling capability
of \AnuGA{} and will form part of Geoscience Australia's ongoing
research effort to model and understand potential natural hazards in
order to reduce their impact on Australian communities.


%\paragraph{Acknowledgements:}
%The authors are grateful to Belinda Barnes, National Centre for
%Epidemiology and Population Health, Australian National University,
%and Matt Hayne and Augusto Sanabria, Risk Research Group, Geoscience
%Australia, for helpful reviews of a previous version of this paper.
%Author Nielsen publish with the permission of the CEO, Geoscience
%Australia.

% Preferably provide your bibliography as a separate .bbl file.
% Include URLs, DOIs, Math Review numbers or Zentralblatt numbers in your bibliography
% so we help connect the web of science and ensure maximum visibility
% for your article.


\bibliographystyle{plain}
\bibliography{database1}


\end{document}