Context Navigation

source: anuga_work/publications/ctac_2006/paper-ctac.tex @ 7244

Last change on this file since 7244 was 4950, checked in by steve, 17 years ago
Files to create ctac2006 paper available at ANZIAM journal http://anziamj.austms.org.au/ojs/index.php/ANZIAMJ/article/view/124/196
File size: 29.7 KB

Rev	Line
[4950]	1	% Use the standard \LaTeXe\ article style in 12pt Computer Modern font
	2	% on A4 paper by
	3	\documentclass[12pt,a4paper]{article}
	4	% Do \emph{not} change the width nor the height of the text from the
	5	% defaults set by this document class.
	6	%
	7	% The alternative which is closer to what we actually use is
	8	% \documentclass[11pt,a5paper]{article}
	9	% \usepackage[a5paper]{geometry}
	10	% Because it is a great size for on screen reading
	11	% and prints nicely on a4paper either 2up or booklet.
	12
	13	% The preamble is to contain your own \LaTeX\ commands and to say
	14	% what packages to use. Three useful packages are the following:
	15	\usepackage{graphicx} % avoid epsfig or earlier such packages
	16	\usepackage{url} % for URLs and DOIs
	17	\newcommand{\doi}[1]{\url{http://dx.doi.org/#1}}
	18	\usepackage{amsmath} % many want amsmath extensions
	19	\usepackage{amsfonts}
	20	\usepackage{underscore}
	21	% Avoid loading unused packages (as done by some \LaTeX\ editors).
	22
	23	% Create title and authors using \verb\|\maketitle\|. Separate authors by
	24	% \verb\|\and\| and put addresses in \verb\|\thanks\| command with
	25	% \verb\|\url\| command \verb\|\protect\|ed.
	26	\title{Parallelisation of a
	27	finite volume method for hydrodynamic inundation modelling}
	28
	29
	30	\author{
	31	S.~G.~Roberts\thanks{Dept. of Maths, Australian National University,
	32	Canberra, \textsc{Australia}. \protect\url{mailto:[stephen.roberts,
	33	linda.stals]@anu.edu.au}} \and L.~Stals\footnotemark[1] \and
	34	O.~M.~Nielsen\thanks{Risk Assessment Methods Project, Geospatial and
	35	Earth Monitoring Division, Geoscience Australia, Symonston,
	36	\textsc{Australia}. \protect\url{mailto:Ole.Nielsen@ga.gov.au}} }
	37
	38	\date{24 August 2006}
	39
	40	\newcommand{\AnuGA}{\textsc{anuga}}
	41	\newcommand{\Python}{\textsc{python}}
	42	\newcommand{\VPython}{\textsc{vpython}}
	43	\newcommand{\pypar}{\textsc{mpi}}
	44	\newcommand{\Metis}{\textsc{metis}}
	45	\newcommand{\mpi}{\textsc{mpi}}
	46
	47	\newcommand{\UU}{\mathbf{U}}
	48	\newcommand{\VV}{\mathbf{V}}
	49	\newcommand{\EE}{\mathbf{E}}
	50	\newcommand{\GG}{\mathbf{G}}
	51	\newcommand{\FF}{\mathbf{F}}
	52	\newcommand{\HH}{\mathbf{H}}
	53	\newcommand{\SSS}{\mathbf{S}}
	54	\newcommand{\nn}{\mathbf{n}}
	55
	56	\newcommand{\code}[1]{\texttt{#1}}
	57
	58
	59
	60	\begin{document}
	61
	62	% Use default \verb\|\maketitle\|.
	63	\maketitle
	64
	65	% Use the \verb\|abstract\| environment.
	66	\begin{abstract}
	67	Geoscience Australia and the Australian National University are
	68	developing a hydrodynamic inundation modelling tool called \AnuGA{}
	69	to help simulate the impact of natural inundation hazards such as
	70	riverine flooding, storm surges and tsunamis. The core of \AnuGA{}
	71	is a \Python{} implementation of a finite volume method, based on
	72	triangular meshes, for solving the conservative form of the Shallow
	73	Water Wave equation. We describe the parallelisation of the code
	74	using a domain decomposition strategy where we use the \Metis{}
	75	graph partitioning library to decompose the finite volume meshes.
	76	The parallel efficiency of our code is tested using a number of mesh
	77	partitions, and we verify that the \Metis{} graph partition is
	78	particularly efficient.
	79	\end{abstract}
	80
	81	% By default we include a table of contents in each paper.
	82	\tableofcontents
	83
	84	% Use \verb\|\section\|, \verb\|\subsection\|, \verb\|\subsubsection\| and
	85	% possibly \verb\|\paragraph\| to structure your document. Make sure
	86	% you \verb\|\label\| them for cross referencing with \verb\|\ref\|\,.
	87	\section{Introduction}
	88	\label{sec:intro}
	89
	90	%Floods are the single greatest cause of death due to natural hazards
	91	%in Australia, causing almost 40{\%} of the fatalities recorded
	92	%between 1788 and 2003~\cite{Blong-2005}. Analysis of buildings
	93	%damaged between 1900 and 2003 suggests that 93.6{\%} of damage is
	94	%the result of meteorological hazards, of which almost 25{\%} is
	95	%directly attributable to flooding~\cite{Blong-2005}.
	96
	97	%Flooding of coastal communities may result from surges of near shore
	98	%waters caused by severe storms. The extent of inundation is
	99	%critically linked to tidal conditions, bathymetry and topography; as
	100	%recently exemplified in the United States by Hurricane Katrina.
	101	%While events of this scale are rare, the extensive development of
	102	%Australia's coastal corridor means that storm surge inundation of
	103	%even a few hundred metres beyond the shoreline has the potential to
	104	%cause significant disruption and loss.
	105	%
	106	%Coastal communities also face the small but real risk of tsunami.
	107	%Fortunately, catastrophic tsunami of the scale of the 26 December
	108	%2004 event are exceedingly rare. However, smaller scale tsunami are
	109	%more common and regularly threaten coastal communities around the
	110	%world. Earthquakes which occur in the Java Trench near Indonesia
	111	%(e.g.~\cite{TsuMIS1995}) and along the Puysegur Ridge to the south
	112	%of New Zealand (e.g.~\cite{LebKC1998}) have potential to generate
	113	%tsunami that may threaten Australia's northwestern and southeastern
	114	%coastlines. This was particularly evident in the aftermath of the
	115	%Java tsunami on the 17th July 2006 where Western Australia did
	116	%experience a localised tsunami run up.
	117
	118	Hydrodynamic modelling allows flooding, storm surge and tsunami
	119	hazards to be better understood, their impacts to be anticipated
	120	and, with appropriate planning, their effects to be mitigated.
	121	Geoscience Australia in collaboration with the Mathematical Sciences
	122	Institute, Australian National University, is developing a software
	123	application called \AnuGA{} to model the hydrodynamics of floods,
	124	storm surges and tsunamis. These hazards are modelled using the
	125	conservative shallow water equations which are described in
	126	section~\ref{sec:model}. In \AnuGA{} the solution of these equations
	127	are approximated using a finite volume method based on triangular
	128	meshes, as described in section~\ref{sec:fvm}. Nielsen \textit{et
	129	al.}~\cite{modsim2005} provides a more complete discussion of the
	130	method and also provides a validation of the model and method on a
	131	standard tsunami benchmark data set. Section~\ref{sec:parallel}
	132	provides a description of the parallelisation of the code using a
	133	domain decomposition strategy and in section~\ref{sec:analysis}
	134	preliminary timing results which verify the scalability of the
	135	parallel code are presented.
	136
	137
	138
	139	\section{Model}
	140	\label{sec:model}
	141
	142	The shallow water wave equations are a system of differential
	143	conservation equations which describe the flow of a thin layer of
	144	fluid over terrain. The form of the equations are
	145	\[
	146	\frac{\partial \UU}{\partial t}+\frac{\partial \EE}{\partial
	147	x}+\frac{\partial \GG}{\partial y}=\SSS \,,
	148	\]
	149	where $\UU=\left[ {{\begin{array}{*{20}c}
	150	h & {uh} & {vh} \\
	151	\end{array} }} \right]^T$ is the vector of conserved quantities; water depth
	152	$h$, $x$ horizontal momentum $uh$ and $y$ horizontal momentum $vh$.
	153	Other quantities entering the system are bed elevation $z$ and stage
	154	$w$ (absolute water level), where these variables are related via $w
	155	= z + h$. The horizontal fluxes in the $x$ and $y$ directions, $\EE$
	156	and $\GG$ are
	157	\[
	158	\EE=\left[ {{\begin{array}{*{20}c}
	159	{uh} \hfill \\
	160	{u^2h+gh^2/2} \hfill \\
	161	{uvh} \hfill \\
	162	\end{array} }} \right] \quad \mbox{ and }\quad \GG=\left[ {{\begin{array}{*{20}c}
	163	{vh} \hfill \\
	164	{vuh} \hfill \\
	165	{v^2h+gh^2/2} \hfill \\
	166	\end{array} }} \right]
	167	\]
	168	and the source term (which includes gravity and friction) is
	169	\[
	170	\SSS=\left[ {{\begin{array}{*{20}c}
	171	0 \hfill \\
	172	-{gh(z_{x} + S_{fx} )} \hfill \\
	173	-{gh(z_{y} + S_{fy} )} \hfill \\
	174	\end{array} }} \right] \,,
	175	\]
	176	where $S_f$ is the bed friction. We model the friction term using
	177	Manning's resistance law
	178	\[
	179	S_{fx} =\frac{u\eta ^2\sqrt {u^2+v^2} }{h^{4/3}}\quad
	180	\mbox{and}\quad S_{fy} =\frac{v\eta ^2\sqrt {u^2+v^2} }{h^{4/3}} \,,
	181	\]
	182	in which $\eta$ is the Manning resistance coefficient.
	183
	184	As demonstrated in our papers, Zoppou and Roberts~\cite{Rob99l}, and
	185	Nielsen \textit{et al.}~\cite{modsim2005}, these equations provide
	186	an excellent model of flows associated with inundation such as dam
	187	breaks and tsunamis.
	188
	189	\section{Finite volume method}
	190	\label{sec:fvm}
	191
	192	We use a finite volume method for approximating the solution of the
	193	shallow water wave equations \cite{Rob99l}. The study area is
	194	represented by a mesh of triangular cells as in
	195	Figure~\ref{fig:mesh:reconstruct}, in which the conserved quantities
	196	$h$, $uh$, and $vh$, in each cell are to be determined. The size of
	197	the triangular cells are varied within the mesh to allow greater
	198	resolution in regions of particular interest.
	199
	200	%\begin{figure}
	201	%\begin{center}
	202	%\includegraphics[width=5.0cm,keepaspectratio=true]{step-five}
	203	%\caption{Our finite volume method uses triangular meshes. Conserved
	204	%quantities $h$, $uh$ and $vh$ are associated with the centroid of
	205	%each triangular cell.} \label{fig:mesh}
	206	%\end{center}
	207	%\end{figure}
	208
	209	\begin{figure}
	210	\begin{center}
	211	\includegraphics[width=5.0cm,keepaspectratio=true]{step-reconstruct}
	212	\caption{Conserved quantities $h$, $uh$ and $vh$ are associated with
	213	the centroid of each triangular cell. From the values of the
	214	conserved quantities at the centroid of the cell and its
	215	neighbouring cells, a discontinuous piecewise linear reconstruction
	216	of the conserved quantities is obtained.}
	217	\label{fig:mesh:reconstruct}
	218	\end{center}
	219	\end{figure}
	220
	221	The equations constituting the finite volume method are obtained by
	222	integrating the differential conservation equations over each
	223	triangular cell of the mesh.
	224
	225	Introducing some notation; we use $i$ to refer to the index of the
	226	$i$th triangular cell $T_i$, and ${\cal N}(i)$ to the set of indices
	227	referring to the cells neighbouring the $i$th cell. The area of the
	228	$i$th triangular cell is $A_i$ and $l_{ij}$ is the length of the
	229	edge between the $i$th and $j$th cells.
	230
	231	By applying the divergence theorem we obtain for each cell, an
	232	equation which describes the rate of change of the average of the
	233	conserved quantities within each cell, in terms of the fluxes across
	234	the edges of the cell and the effect of the source term. In
	235	particular, for each cell
	236	$$
	237	\frac{d\UU_i }{dt}+ \frac1{A_i}\sum\limits_{j\in{\cal N}(i)} \HH_{ij} l_{ij} =
	238	\SSS_i \,,
	239	$$
	240	where
	241	%\begin{itemize}
	242	%\item $\UU_i$ is the vector of conserved quantities averaged over the $i$th cell,
	243	%\item $\SSS_i$ is the source term associated with the $i$th cell,
	244	%and
	245	%\item $\HH_{ij}$ is the outward normal flux of
	246	%material across the \textit{ij}-th edge.
	247	%\end{itemize}
	248	%
	249	$\UU_i$ is the vector of conserved quantities averaged over the
	250	$i$th cell, $\SSS_i$~is the source term associated with the~$i$th
	251	cell and $\HH_{ij}$~is the outward normal flux of material across
	252	the \textit{ij}-th edge.
	253
	254
	255	%\item $l_{ij}$ is the length of the edge between the $i$th and $j$th
	256	%cells
	257	%\item $m_{ij}$ is the midpoint of
	258	%the \textit{ij}th edge,
	259	%\item
	260	%$\mathbf{n}_{ij} = (n_{ij,1} , n_{ij,2})$is the outward pointing
	261	%normal along the \textit{ij}th edge, and The
	262
	263	The flux $\HH_{ij}$ is evaluated using a numerical flux function
	264	$\HH(\cdot, \cdot ; \ \cdot)$ which is consistent with the shallow
	265	water flux in the sense that for all conserved quantity vectors
	266	$\UU$ and spatial vectors $\nn$
	267	$$
	268	H(\UU,\UU;\-\nn) = \EE(\UU) n_1 + \GG(\UU) n_2 \,.
	269	$$
	270	Then the flux across the \textit{ij}th edge is
	271	$$
	272	\HH_{ij} = \HH(\bar{\UU}_i(m_{ij}), \bar{\UU}_j(m_{ij});\-
	273	\mathbf{n}_{ij}) \,,
	274	$$
	275	where $m_{ij}$ is the midpoint of the \textit{ij}th edge and
	276	$\mathbf{n}_{ij}$ is the outward pointing normal of the
	277	\textit{ij}th edge. The function $\bar{\UU}_i(x,y)$ for $(x,y) \in
	278	T_i$ is obtained from the average conserved quantity values,
	279	$\UU_k$, for $k \in \{ i \} \cup {\cal N}(i)$ (\textit{i}th cell and
	280	its neighbours). We use a second order reconstruction to produce a
	281	piece wise linear function reconstruction, $\bar{\UU}_i(x,y)$, of
	282	the conserved quantities for all $(x,y) \in T_i$ for each cell (see
	283	Figure~\ref{fig:mesh:reconstruct}). This function is allowed to be
	284	discontinuous across the edges of the cells, but the slope of this
	285	function is limited to avoid artificially introduced oscillations.
	286
	287	The numerical flux function $\HH(\cdot, \cdot ; \ \cdot)$ is
	288	obtained by rotating the coordinate system so the outward normal is
	289	in the $x$ direction. This then reduces the normal flux calculation
	290	to a one dimensional flux calculation. The central upwind scheme as
	291	described by Kurganov \textit{et al.}~\cite{KurNP2001} is then used
	292	to approximate the resulting fluxes of the one dimension problem.
	293
	294
	295
	296
	297
	298
	299
	300
	301	In the computations presented we use an explicit Euler time stepping
	302	method with variable time stepping adapted to the Courant Friedrichs
	303	Levy (CFL) condition number.
	304
	305
	306
	307
	308
	309
	310
	311	%\section{Validation}
	312	%\label{sec:validation} The process of validating the \AnuGA{}
	313	%application is in its early stages, however initial indications are
	314	%encouraging.
	315	%
	316	%As part of the Third International Workshop on Long wave Runup
	317	%Models in 2004 (\url{http://www.cee.cornell.edu/longwave}), four
	318	%benchmark problems were specified to allow the comparison of
	319	%numerical, analytical and physical models with laboratory and field
	320	%data. One of these problems describes a wave tank simulation of the
	321	%1993 Okushiri Island tsunami off Hokkaido, Japan \cite{MatH2001}. A
	322	%significant feature of this tsunami was a maximum run up of 32~m
	323	%observed at the head of the Monai Valley. This run up was not
	324	%uniform along the coast and is thought to have resulted from a
	325	%particular topographic effect. Among other features, simulations of
	326	%the Hokkaido tsunami should capture this run up phenomenon.
	327	%
	328	%\begin{figure}[htbp]
	329	%\centerline{\includegraphics[width=4in]{tsunami-fig-3.eps}}
	330	%\caption{Comparison of wave tank and \AnuGA{} water stages at gauge
	331	%5.}\label{fig:val}
	332	%\end{figure}
	333	%
	334	%
	335	%\begin{figure}[htbp]
	336	%\centerline{\includegraphics[width=4in]{tsunami-fig-4.eps}}
	337	%\caption{Complex reflection patterns and run up into Monai Valley
	338	%simulated by \AnuGA{} and visualised using our netcdf OSG
	339	%viewer.}\label{fig:run}
	340	%\end{figure}
	341	%
	342	%
	343	%The wave tank simulation of the Hokkaido tsunami was used as the
	344	%first scenario for validating \AnuGA{}. The dataset provided
	345	%bathymetry and topography along with initial water depth and the
	346	%wave specifications. The dataset also contained water depth time
	347	%series from three wave gauges situated offshore from the simulated
	348	%inundation area.
	349	%
	350	%Figure~\ref{fig:val} compares the observed wave tank and modelled
	351	%\AnuGA{} water depth (stage height) at one of the gauges. The plots
	352	%show good agreement between the two time series, with \AnuGA{}
	353	%closely modelling the initial draw down, the wave shoulder and the
	354	%subsequent reflections. The discrepancy between modelled and
	355	%simulated data in the first 10 seconds is due to the initial
	356	%condition in the physical tank not being uniformly zero. Similarly
	357	%good comparisons are evident with data from the other two gauges.
	358	%Additionally, \AnuGA{} replicates exceptionally well the 32~m Monai
	359	%Valley run up, and demonstrates its occurrence to be due to the
	360	%interaction of the tsunami wave with two juxtaposed valleys above
	361	%the coastline. The run up is depicted in Figure~\ref{fig:run}.
	362	%
	363	%This successful replication of the tsunami wave tank simulation on a
	364	%complex 3D beach is a positive first step in validating the \AnuGA{}
	365	%modelling capability. Subsequent validation will be conducted as
	366	%additional datasets become available.
	367
	368
	369
	370	\section{Parallel implementation}\label{sec:parallel}
	371
	372	To parallelise our algorithm we use a domain decomposition strategy.
	373	We start with a global mesh which defines the domain of our problem.
	374	We must first subdivide the global mesh into a set of submeshes.
	375	This partitioning is done using the \Metis{} partitioning library,
	376	see Karypis~\cite{metis}. We demonstrate the efficiency of this
	377	library in the following subsections. Once this partitioning has
	378	been done, a layer of ``ghost'' cells is constructed and the
	379	corresponding communication patterns are set. Then we start to
	380	evolve our solution. The computation progresses independently on
	381	each submesh, until the end of the time step when the appropriate
	382	information is communicated among processing elements.
	383
	384	%\begin{figure}[hbtp]
	385	% \centerline{ \includegraphics[scale = 0.6]{domain.eps}}
	386	% \caption{The first step of the parallel algorithm is to divide
	387	% the mesh over the processors.}
	388	% \label{fig:subpart}
	389	%\end{figure}
	390
	391
	392
	393	\begin{figure}[h!]
	394	\centerline{ \includegraphics[width=6.5cm]{mermesh.eps}}
	395	\caption{The global Lake Merimbula mesh}
	396	\label{fig:mergrid}
	397	\end{figure}
	398
	399	\begin{figure}[h!]
	400	\centerline{ \includegraphics[width=6.5cm]{mermesh4c.eps}
	401	\includegraphics[width=6.5cm]{mermesh4a.eps}}
	402
	403	\centerline{ \includegraphics[width=6.5cm]{mermesh4d.eps}
	404	\includegraphics[width=6.5cm]{mermesh4b.eps}}
	405	\caption{The global Lake Merimula mesh from Figure~\ref{fig:mergrid}, partitioned into 4 submeshes using \Metis.}
	406	\label{fig:mergrid4}
	407	\end{figure}
	408
	409
	410
	411
	412	\begin{table}[h!]
	413	\caption{4-way and 8-way partition tests of Merimbula mesh showing
	414	the percentage distribution of cells between the
	415	submeshes.}\label{tbl:mer}
	416	\begin{center}
	417	\begin{tabular}{\|c\|c c c c\|}
	418	\hline
	419	CPU & 0 & 1 & 2 & 3\\
	420	\hline
	421	Cells & 2757 & 2713 & 2761 & 2554\\
	422	\% & 25.6\% & 25.2\% & 25.6\% & 23.7\%\ \\
	423	\hline
	424	\end{tabular}
	425
	426	\medskip
	427
	428	\begin{tabular}{\|c\|c c c c c c c c\|}
	429	\hline
	430	CPU & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\
	431	\hline
	432	Cells & 1229 & 1293 & 1352 & 1341 & 1349 & 1401 & 1413 & 1407\\
	433	\% & 11.4\% & 12.0\% & 12.5\% & 12.4\% & 12.5\% & 13.0\% & 13.1\% & 13.1\%\\
	434	\hline
	435	\end{tabular}
	436	\end{center}
	437	\end{table}
	438
	439	\subsection {Subdividing the global mesh}\label{sec:part1}
	440
	441	The first step in parallelising the code is to subdivide the mesh
	442	into, roughly, equally sized partitions to ensure good load
	443	balancing. On a rectangular mesh this may be done by a simple
	444	coordinate based dissection method, but on a complicated domain such
	445	as the mesh shown in Figure~\ref{fig:mergrid}, a more sophisticated
	446	approach must be used. We use the \Metis{} partitioning library
	447	based on Karypis and Kumar's~\cite{gk:metis} recommendations.
	448	Figure~\ref{fig:mergrid4} shows the original global grid partitioned
	449	over four submeshes. Note that one submesh may comprise several
	450	unconnected mesh partitions and Table~\ref{tbl:mer} gives the node
	451	distribution over four submeshes and eight submeshes. These results
	452	imply that \Metis{} gives a reasonably well balanced partition of
	453	the mesh.
	454
	455	\begin{figure}[thp]
	456	\centerline{ \includegraphics[scale = 0.55]{subdomainfinal.eps}}
	457	\caption{An example subpartitioning of a global mesh into two submeshes, showing the
	458	ghost nodes used in the two submeshes.
	459	}
	460	\label{fig:subdomainf}
	461	\end{figure}
	462
	463
	464	\paragraph{Ghost cells}
	465
	466	%\begin{figure}[h!]
	467	% \centerline{ \includegraphics[height=8cm]{subdomain.eps}}
	468	% \caption{An example subpartitioning of a global mesh into two submeshes.}
	469	% \label{fig:subdomain}
	470	%\end{figure}
	471	%
	472	%
	473	%\begin{figure}[b!]
	474	% \centerline{ \includegraphics[height=8cm]{subdomainghost.eps}}
	475	% \caption{An example subpartitioning with ghost cells. The numbers
	476	%in brackets shows the local numbering scheme that is calculated and
	477	%stored with the mesh, but not implemented until the local mesh is
	478	%built.}
	479	% \label{fig:subdomaing}
	480	%\end{figure}
	481
	482
	483
	484
	485	Consider the example subpartitioning given in
	486	Figure~\ref{fig:subdomainf}. The top mesh represents the global
	487	mesh, whereas the two lower meshes display the partitioning of the
	488	global mesh (together with extra ghost cells to facilitate
	489	communication) onto two processors.
	490
	491	As an example, during the evolution calculations, cell~2 (residing
	492	on processor~0) will need to access information from its' global
	493	neighbour, cell~3 (residing on processor~1). A standard approach to
	494	this problem is to add an extra layer of cells, which we call ghost
	495	cells, to each submesh, on each processor. The ghost cells are used
	496	to hold information that an ordinary cell in a submesh needs to
	497	complete its calculations. The values associated with the ghost
	498	cells need to be updated through communication calls usually only
	499	once every time step (at least for first order time stepping). Such
	500	communication patterns are determined and recorded before sub
	501	partitioning the mesh into submeshes.
	502
	503	%
	504	%
	505	%Referring to Figure~\ref{fig:subdomainf} we see that during each
	506	%communication call submesh~0 will have to send the updated values
	507	%for global cell~2 and cell~4 to submesh~1, and similarly submesh~1
	508	%will have to send the updated values for global cell~3 and cell~5 to
	509	%submesh~0. Each processor records it own communication pattern, i.e.
	510	%the expected pattern of data it expects to receive from the other
	511	%processors so as to update the ghost cell data, and also the pattern
	512	%of data sends used to send data to other processors to update their
	513	%ghost cells.
	514
	515	The ghost cells in each submesh are treated exactly the same as any
	516	other cell, the only way to differentiate them is to look at the
	517	communication pattern. This means that the code is essentially the
	518	same whether it is being run in serial or parallel, the only
	519	difference is a communication call at the end of each time step and
	520	an extra \texttt{if} statement in the local calculation of the time
	521	step constraint to ensure that the ghost cells are not used in that
	522	calculation.
	523
	524	\section{Performance analysis}\label{sec:analysis}
	525
	526
	527	To evaluate the performance of the code on a parallel machine we ran
	528	some examples on a cluster of four nodes connected with PathScale
	529	InfiniPath \textsc{htx}. Each node has two \textsc{amd} Opteron 275
	530	(Dual core 2.2 GHz Processors) and 4 GB of main memory. The system
	531	achieves 60 Gigaflops with the Linpack benchmark, which is about
	532	85\% of peak performance. For each test run we evaluate the parallel
	533	efficiency as
	534	\[
	535	E_p = \frac{T_1}{pT_p} 100 \,,
	536	\]
	537	where $T_p = \max_{0\le i < p}\{t_i\}$, $p$ is the total number of
	538	processors and $t_i$ is the time required to run the evolution code
	539	on processor $i$. Note that $t_i$ only includes the time required
	540	to do the finite volume evolution calculations, not the time
	541	required to build and subpartition the mesh.
	542
	543	\subsection{Advection, rectangular mesh}
	544
	545	The first example solves an advection equation on a rectangular mesh
	546	of size $n$ by $m$. Table~\ref{tbl:rpa4080160} show the efficiency
	547	results for different values of $n$ and $m$. The examples where $p
	548	\le 4$ were run on one Opteron node containing four processors, the
	549	$p = 8$ example was run on two nodes (giving a total of eight
	550	processors). The communication within a node is faster than the
	551	communication across nodes, so we would expect to see a decrease in
	552	efficiency when we jump from four to eight processors. On the other
	553	hand, the efficiency is likely to increase with $n$ and $m$, due to
	554	an increased ratio between interior and exterior triangles and hence
	555	an increased ratio of computation to communication. The results
	556	displayed in Table~\ref{tbl:rpa4080160} verify this, expect perhaps
	557	for the slightly higher than expected efficiency of the $p=2$,
	558	$n=80$, $m=80$ case. Generally the efficiency results shown here are
	559	consistent and competitive.
	560
	561	%\begin{table}[h!]
	562	%\caption{Parallel Efficiency Results for the Advection Problem on a
	563	%Rectangular Domain with {\tt N} = 40, {\tt M} =
	564	%40.\label{tbl:rpa40}}
	565	%\begin{center}
	566	%\begin{tabular}{\|c\|c c\|}\hline
	567	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	568	%1 & 36.61 & \\
	569	%2 & 18.76 & 98 \\
	570	%4 & 10.16 & 90 \\
	571	%8 & 6.39 & 72 \\\hline
	572	%\end{tabular}
	573	%\end{center}
	574	%\end{table}
	575	%
	576	%\begin{table}[h!]
	577	%\caption{Parallel Efficiency Results for the Advection Problem on a
	578	%Rectangular Domain with {\tt N} = 80, {\tt M} =
	579	%80.\label{tbl:rpa80}}
	580	%\begin{center}
	581	%\begin{tabular}{\|c\|c c\|}\hline
	582	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	583	%1 & 282.18 & \\
	584	%2 & 143.14 & 99 \\
	585	%4 & 75.06 & 94 \\
	586	%8 & 41.67 & 85 \\\hline
	587	%\end{tabular}
	588	%\end{center}
	589	%\end{table}
	590	%
	591	%
	592	%\begin{table}[h!]
	593	%\caption{Parallel Efficiency Results for the Advection Problem on a
	594	%Rectangular Domain with {\tt N} = 160, {\tt M} =
	595	%160.\label{tbl:rpa160}}
	596	%\begin{center}
	597	%\begin{tabular}{\|c\|c c\|}\hline
	598	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	599	%1 & 2200.35 & \\
	600	%2 & 1126.35 & 97 \\
	601	%4 & 569.49 & 97 \\
	602	%8 & 304.43 & 90 \\\hline
	603	%\end{tabular}
	604	%\end{center}
	605	%\end{table}
	606
	607
	608	\begin{table}[h!]
	609	\caption{Parallel efficiency results for the advection problem on
	610	$n$ by $m$ rectangular meshes using $p$
	611	processors.\label{tbl:rpa4080160}}
	612	\begin{center}
	613	\begin{tabular}{\|c\|c c\| c c \| c c \|}\hline
	614	& \multicolumn{2}{c\|}{40 by 40 mesh} & \multicolumn{2}{\|c\|}{80 by 80 mesh}
	615	& \multicolumn{2}{\|c\|}{160 by 160 mesh} \\ \hline
	616	$p$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$
	617	(sec) & $E_p (\%)$
	618	\\\hline
	619	1 & 36.6 & &282.& & 2200. & \\
	620	2 & 18.8 & 98 & 143. & 99 & 1126. & 98 \\
	621	4 & 10.2 & 90 & 75.1 & 94 & 569. & 97 \\
	622	8 & 6.39 & 72 &41.7 & 85 & 304. & 90 \\\hline
	623	\end{tabular}
	624	\end{center}
	625	\end{table}
	626
	627	%\begin{table}[h!]
	628	%\caption{Parallel Efficiency Results for the Advection Problem on a
	629	%Rectangular Domain with {\tt N} = 80, {\tt M} =
	630	%80.\label{tbl:rpa80}}
	631	%\begin{center}
	632	%\begin{tabular}{\|c\|c c\|}\hline
	633	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	634	%1 & 282.18 & \\
	635	%2 & 143.14 & 99 \\
	636	%4 & 75.06 & 94 \\
	637	%8 & 41.67 & 85 \\\hline
	638	%\end{tabular}
	639	%\end{center}
	640	%\end{table}
	641	%
	642	%
	643	%\begin{table}[h!]
	644	%\caption{Parallel Efficiency Results for the Advection Problem on a
	645	%Rectangular Domain with {\tt N} = 160, {\tt M} =
	646	%160.\label{tbl:rpa160}}
	647	%\begin{center}
	648	%\begin{tabular}{\|c\|c c\|}\hline
	649	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	650	%1 & 2200.35 & \\
	651	%2 & 1126.35 & 97 \\
	652	%4 & 569.49 & 97 \\
	653	%8 & 304.43 & 90 \\\hline
	654	%\end{tabular}
	655	%\end{center}
	656	%\end{table}
	657
	658
	659	\subsection{Advection, Lake Merimbula mesh}
	660
	661	We now look at another advection example, except this time the mesh
	662	comes from a study of water flow in Lake Merimbula, New South Wales.
	663	The mesh is shown in Figure~\ref{fig:mergrid}. The results are given
	664	in Table \ref{tbl:rpm}. These are good efficiency results,
	665	especially considering the structure of this mesh.
	666	%Note that since we are solving an advection problem the amount of calculation
	667	%done on each triangle is relatively low, when we more to other problems that
	668	%involve more calculations we would expect the computation to communication
	669	%ratio to increase and thus get an increase in efficiency.
	670
	671	%\begin{table}
	672	%\caption{Parallel Efficiency Results for the Advection Problem on
	673	%the Lake Merimbula Mesh.\label{tbl:rpm}}
	674	%\begin{center}
	675	%\begin{tabular}{\|c\|c c\|}\hline
	676	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	677	%1 &145.17 & \\
	678	%2 &77.52 & 94 \\
	679	%4 & 41.24 & 88 \\
	680	%8 & 22.96 & 79 \\\hline
	681	%\end{tabular}
	682	%\end{center}
	683	%\end{table}
	684	%
	685	%\begin{table}
	686	%\caption{Parallel Efficiency Results for the Shallow Water Equation
	687	%on the
	688	% Lake Merimbula Mesh.\label{tbl:rpsm}}
	689	%\begin{center}
	690	%\begin{tabular}{\|c\|c c\|}\hline
	691	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	692	%1 & 7.04 & \\
	693	%2 & 3.62 & 97 \\
	694	%4 & 1.94 & 91 \\
	695	%8 & 1.15 & 77 \\\hline
	696	%\end{tabular}
	697	%\end{center}
	698	%\end{table}
	699
	700
	701	\begin{table}
	702	\caption{Parallel efficiency results for the advection equation and
	703	the shallow water equation on the Lake Merimbula mesh for $p$
	704	processors.\label{tbl:rpm}}
	705	\begin{center}
	706	\begin{tabular}{\|c\|c c\| c c \|}\hline
	707	& \multicolumn{2}{c\|}{Advection} & \multicolumn{2}{\|c\|}{Shallow water
	708	eqn} \\ \hline
	709	$p$ & $T_p$ (sec) & $E_p (\%)$ & $T_p$ (sec) & $E_p (\%)$\\\hline
	710	1 &145.0 & & 7.04 & \\
	711	2 &77.5 & 94 & 3.62 & 97 \\
	712	4 & 41.2 & 88 & 1.94 & 91 \\
	713	8 & 23.0 & 79 & 1.15 & 76 \\\hline
	714	\end{tabular}
	715	\end{center}
	716	\end{table}
	717
	718	%\begin{table}
	719	%\caption{Parallel Efficiency Results for the Shallow Water Equation
	720	%on the
	721	% Lake Merimbula Mesh.\label{tbl:rpsm}}
	722	%\begin{center}
	723	%\begin{tabular}{\|c\|c c\|}\hline
	724	%$n$ & $T_n$ (sec) & $E_n (\%)$ \\\hline
	725	%1 & 7.04 & \\
	726	%2 & 3.62 & 97 \\
	727	%4 & 1.94 & 91 \\
	728	%8 & 1.15 & 77 \\\hline
	729	%\end{tabular}
	730	%\end{center}
	731	%\end{table}
	732
	733
	734	\subsection{Shallow water, Lake Merimbula mesh}
	735
	736	The final example we look at is the shallow water equation on the
	737	Lake Merimbula mesh. The results for this case are also listed in
	738	Table \ref{tbl:rpm}. The efficiency results for two and four
	739	processors is again good. For eight processors the efficiency falls
	740	off rather quickly.
	741
	742	On profiling the code we found that the loss of efficiency is due to
	743	the boundary update routine. To allow maximum flexibility in
	744	experimenting with different boundary conditions, the boundary
	745	routines are written in \Python (as opposed to most of the other
	746	computationally intensive kernels which are written in \texttt{C}).
	747	When running the code on one processor the boundary routine accounts
	748	for about 72\% of the total computation time. The \Metis{}
	749	subpartition of the mesh produced an imbalance in the number of
	750	active boundary edges in each subpartition. The profiler indicated
	751	that when running the problem on eight processors, Processor 0 spent
	752	about 3.8 times more time on the boundary calculation than Processor
	753	7, indicating about 3.8 times as many active boundary edges. This
	754	load imbalance reduced the parallel efficiency. In the future the
	755	boundary routine will be rewritten in \texttt{C} to reduce its
	756	overall contribution to the computation and so reduce the effect of
	757	this active boundary imbalance.
	758
	759
	760
	761
	762
	763
	764	\section{Conclusions}
	765	\label{sec:6} \AnuGA{} is a flexible and robust modelling system
	766	that simulates hydrodynamics by solving the shallow water wave
	767	equation in a triangular mesh. It models the process of wetting and
	768	drying as water enters and leaves an area and is capable of
	769	capturing hydraulic shocks due to the ability of the finite volume
	770	method to accommodate discontinuities in the solution. It has been
	771	used to simulate the behaviour of hydrodynamic natural hazards such
	772	as riverine flooding, storm surge and tsunami.
	773
	774	The use of the parallel code will enhance the modelling capability
	775	of \AnuGA{} and will form part of Geoscience Australia's ongoing
	776	research effort to model and understand the potential impact from
	777	natural hazards in order to reduce their impact on Australian
	778	communities.
	779
	780
	781	%\paragraph{Acknowledgements:}
	782	%The authors are grateful to Belinda Barnes, National Centre for
	783	%Epidemiology and Population Health, Australian National University,
	784	%and Matt Hayne and Augusto Sanabria, Risk Research Group, Geoscience
	785	%Australia, for helpful reviews of a previous version of this paper.
	786	%Author Nielsen publish with the permission of the CEO, Geoscience
	787	%Australia.
	788
	789	% Preferably provide your bibliography as a separate .bbl file.
	790	% Include URLs, DOIs, Math Review numbers or Zentralblatt numbers in your bibliography
	791	% so we help connect the web of science and ensure maximum visibility
	792	% for your article.
	793
	794
	795	\bibliographystyle{plain}
	796	\bibliography{database1}
	797
	798
	799
	800
	801	\end{document}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: