Timestamp: Jul 11, 2006, 8:39:56 PM
Author: steve
File: 1 edited

  • inundation/parallel/documentation/results.tex

r3245 → r3315

To evaluate the performance of the code on a parallel machine we ran some examples on a cluster of four nodes connected with PathScale InfiniPath HTX. Each node has two AMD Opteron 275 processors (dual-core, 2.2 GHz) and 4 GB of main memory. The system achieves 60 Gigaflops with the Linpack benchmark, which is about 85\% of peak performance.

For each test run we evaluate the parallel efficiency as
\[
E_n = \frac{T_1}{nT_n} \times 100,
\]
where $T_n = \max_{0\le i < n}\{t_i\}$, $n$ is the total number of processors (and submeshes) and $t_i$ is the time required to run the {\tt evolve} code on processor $i$. Note that $t_i$ does not include the time required to build and subpartition the mesh, etc.; it only includes the time required to do the evolve calculations (e.g. \code{domain.evolve(yieldstep = 0.1, finaltime = 3.0)}).
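
As an illustration of this definition, the following sketch computes $E_n$ from per-processor evolve timings; all of the numbers are hypothetical and serve only to show how the formula is applied.

\begin{verbatim}
# Hypothetical sketch: compute the parallel efficiency E_n from the
# per-processor evolve times t_i (seconds).  The values below are made
# up for illustration; only the evolve phase is timed, as stated above.
t = [101.3, 98.7, 103.9, 99.2]    # t_i for n = 4 processors (made up)
T1 = 389.5                        # single-processor evolve time (made up)

n = len(t)
Tn = max(t)                       # T_n = max_i t_i (slowest processor)
En = T1 / (n * Tn) * 100          # parallel efficiency, in per cent
print('E_%d = %.1f%%' % (n, En))  # about 93.7% with these made-up numbers
\end{verbatim}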

\section{Advection, Rectangular Domain}

The first example looked at the rectangular domain example given in Section \ref{subsec:codeRPA}, except we changed the final time to 1.0 (\code{domain.evolve(yieldstep = 0.1, finaltime = 1.0)}).

For this particular example we can control the mesh size by changing the parameters \code{N} and \code{M} given in the following section of code taken from Section \ref{subsec:codeRPA}.

\begin{verbatim}
...
\end{verbatim}
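
As a rough guide to how \code{N} and \code{M} set the problem size, the sketch below assumes that each of the $N \times M$ grid cells is split into two triangles; this is an assumption about the mesh generator, not something taken from the listing above.

\begin{verbatim}
# Rough sketch (assumption: two triangles per grid cell).
for N, M in [(40, 40), (80, 80), (160, 160)]:
    triangles = 2 * N * M
    print('N = %3d, M = %3d: roughly %6d triangles' % (N, M, triangles))
\end{verbatim}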

Tables \ref{tbl:rpa40}, \ref{tbl:rpa80} and \ref{tbl:rpa160} show the efficiency results for different values of \code{N} and \code{M}. The examples where $n \le 4$ were run on one Opteron node containing 4 processors, while the $n = 8$ example was run on 2 nodes (giving a total of 8 processors). The communication within a node is faster than the communication across nodes, so we would expect to see a decrease in efficiency when we jump from 4 to 8 processors. Furthermore, as \code{N} and \code{M} are increased the ratio of exterior to interior triangles decreases, which in turn decreases the amount of communication relative to the amount of computation, and thus the efficiency should increase.
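
To make the communication argument concrete, the following sketch estimates the ratio of exterior (communicated) to interior triangles under an idealised strip partition, in which each submesh exchanges a layer of roughly $2M$ triangles across each of its internal boundaries and each grid cell is split into two triangles. These are modelling assumptions, not measurements; the actual partitioner may behave differently, but the trend is the point: the ratio falls as \code{N} and \code{M} grow.

\begin{verbatim}
# Idealised estimate of the exterior-to-interior triangle ratio for a
# strip partition of an N x M rectangular mesh over n processors.
# Assumptions (not measurements): two triangles per grid cell, and a
# communicated layer of about 2*M triangles per internal boundary,
# with interior strips having two such boundaries.
def exterior_interior_ratio(N, M, n):
    per_submesh = 2.0 * N * M / n        # triangles owned by one submesh
    exterior = 2 * (2 * M)               # communicated layer, two boundaries
    return exterior / per_submesh

for N in (40, 80, 160):
    for n in (2, 4, 8):
        r = exterior_interior_ratio(N, N, n)
        print('N = M = %3d, n = %d: exterior/interior ~ %.3f' % (N, n, r))
\end{verbatim}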

The efficiency results shown here are competitive.

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a Rectangular Domain with {\tt N} = 40, {\tt M} = 40.\label{tbl:rpa40}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
  ...
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a Rectangular Domain with {\tt N} = 80, {\tt M} = 80.\label{tbl:rpa80}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
  ...
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a Rectangular Domain with {\tt N} = 160, {\tt M} = 160.\label{tbl:rpa160}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
  ...
\end{tabular}
\end{center}
\end{table}

%Another way of measuring the performance of the code on a parallel machine is to increase the problem size as the number of processors are increased so that the number of triangles per processor remains roughly the same.  We have not carried out measurements of this kind as we usually have static grids and it is not possible to increase the number of triangles.

\section{Advection, Merimbula Mesh}

We now look at another advection example, except this time the mesh comes from the Merimbula test problem. That is, we ran the code given in Section \ref{subsec:codeRPMM}, except the final time was reduced to 10000 (\code{finaltime = 10000}). The results are given in Table \ref{tbl:rpm}. These are good efficiency results, especially considering the structure of the Merimbula mesh.
%Note that since we are solving an advection problem the amount of calculation
%done on each triangle is relatively low; when we move to other problems that
%involve more calculations we would expect the computation to communication ratio to increase and thus get an increase in efficiency.

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on the Merimbula Mesh.\label{tbl:rpm}}
...

Processor 0 spent about 3.8 times more time on the \code{update_boundary} calculations than Processor 7. This load imbalance reduced the parallel efficiency.
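
A simple way to see how much such an imbalance can cost is to compare the mean and maximum per-processor times for the unbalanced phase: the phase runs at the speed of the slowest processor, so its efficiency is bounded by the mean-to-maximum ratio. The timings in the sketch below are hypothetical; only the factor of about 3.8 between the slowest and fastest processors is taken from the measurements above.

\begin{verbatim}
# Hypothetical update_boundary times (seconds) for 8 processors, chosen
# so that processor 0 takes about 3.8 times as long as processor 7, as
# reported above.  All individual values are made up.
t_ub = [19.0, 11.0, 9.5, 8.0, 7.5, 6.5, 5.5, 5.0]

mean_t = sum(t_ub) / float(len(t_ub))
max_t = max(t_ub)
balance = mean_t / max_t     # efficiency bound for this phase (about 0.47)
print('update_boundary balance factor: %.2f' % balance)
\end{verbatim}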

Before doing the shallow water equation calculations on a larger number of
...
optimised as much as possible to reduce the effect of the load imbalance.

\begin{table}
\caption{Parallel Efficiency Results for the Shallow Water Equation on the Merimbula Mesh.\label{tbl:rpsm}}
...