Changeset 3315 for inundation/parallel/documentation/results.tex
Timestamp: Jul 11, 2006, 8:39:56 PM
File: inundation/parallel/documentation/results.tex (1 edited)
inundation/parallel/documentation/results.tex
To evaluate the performance of the code on a parallel machine we ran
some examples on a cluster of four nodes connected with PathScale
InfiniPath HTX. Each node has two AMD Opteron 275 processors
(dual-core, 2.2 GHz) and 4 GB of main memory. The system achieves 60
Gigaflops on the Linpack benchmark, which is about 85\% of peak
performance.

For each test run we evaluate the parallel efficiency as
\[
E_n = \frac{T_1}{nT_n} \times 100,
\]
where $T_n = \max_{0\le i < n}\{t_i\}$, $n$ is the total number of
processors (submeshes) and $t_i$ is the time required to run the
{\tt evolve} code on processor $i$. Note that $t_i$ does not include
the time required to build and subpartition the mesh; it only
includes the time required to do the evolve calculations (e.g.\
\code{domain.evolve(yieldstep = 0.1, finaltime = 3.0)}).
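The efficiency figures reported below follow directly from this
definition. As an illustrative sketch only (it is not part of the
test scripts), the calculation can be written as a small Python
helper; the function name and the timing variables are placeholders.

\begin{verbatim}
# Illustrative helper: compute the parallel efficiency E_n (as a
# percentage) from the serial evolve time T1 and the list of
# per-processor evolve times t_i.
def parallel_efficiency(T1, evolve_times):
    n = len(evolve_times)       # number of processors (submeshes)
    Tn = max(evolve_times)      # the slowest processor determines T_n
    return 100.0*T1/(n*Tn)

# Example usage (t0, ..., t3 are the measured evolve times):
#   print parallel_efficiency(T1, [t0, t1, t2, t3])
\end{verbatim}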
\section{Advection, Rectangular Domain}

The first example looked at the rectangular domain example given in
Section \ref{subsec:codeRPA}, except that we changed the final time
to 1.0 (\code{domain.evolve(yieldstep = 0.1, finaltime = 1.0)}).

For this particular example we can control the mesh size by changing
the parameters \code{N} and \code{M} given in the following section
of code taken from Section \ref{subsec:codeRPA}.

\begin{verbatim}
...
\end{verbatim}

Tables \ref{tbl:rpa40}, \ref{tbl:rpa80} and \ref{tbl:rpa160} show
the efficiency results for different values of \code{N} and
\code{M}. The examples where $n \le 4$ were run on one Opteron node
containing 4 processors, while the $n = 8$ example was run on 2
nodes (giving a total of 8 processors). The communication within a
node is faster than the communication across nodes, so we would
expect to see a decrease in efficiency when we move from 4 to 8
processors. Furthermore, as \code{N} and \code{M} are increased the
ratio of exterior to interior triangles decreases, which in turn
decreases the amount of communication relative to the amount of
computation, and thus the efficiency should increase.

The efficiency results shown here are competitive.

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a
Rectangular Domain with {\tt N} = 40, {\tt M} = 40.\label{tbl:rpa40}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
...
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a
Rectangular Domain with {\tt N} = 80, {\tt M} = 80.\label{tbl:rpa80}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
...
\end{tabular}
\end{center}
\end{table}

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on a
Rectangular Domain with {\tt N} = 160, {\tt M} =
160.\label{tbl:rpa160}}
\begin{center}
\begin{tabular}{|c|c c|}\hline
...
\end{tabular}
\end{center}
\end{table}

%Another way of measuring the performance of the code on a parallel
%machine is to increase the problem size as the number of processors
%is increased, so that the number of triangles per processor remains
%roughly the same. We have not carried out measurements of this kind
%as we usually have static grids and it is not possible to increase
%the number of triangles.

\section{Advection, Merimbula Mesh}

We now look at another advection example, except this time the mesh
comes from the Merimbula test problem. That is, we ran the code
given in Section \ref{subsec:codeRPMM}, except that the final time
was reduced to 10000 (\code{finaltime = 10000}). The results are
given in Table \ref{tbl:rpm}. These are good efficiency results,
especially considering the structure of the Merimbula mesh.
%Note that since we are solving an advection problem the amount of
%calculation done on each triangle is relatively low; when we move to
%other problems that involve more calculations we would expect the
%computation to communication ratio to increase and thus get an
%increase in efficiency.

\begin{table}
\caption{Parallel Efficiency Results for the Advection Problem on the
Merimbula Mesh.\label{tbl:rpm}}

...

Processor 0 spent about 3.8 times more time doing the
\code{update_boundary} calculations than Processor 7. This load
imbalance reduced the parallel efficiency.
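One way to observe this kind of imbalance is to accumulate the time
each processor spends in \code{update_boundary}. The following is a
minimal sketch only, not taken from the test scripts; it assumes the
\code{pypar} bindings and a \code{domain} object with an
\code{update_boundary} method, and the names are illustrative.

\begin{verbatim}
# Illustrative sketch: accumulate the time spent in update_boundary
# on each processor and report it together with the processor rank,
# so the per-processor totals can be compared after the run.
import time
import pypar                     # assumed MPI bindings

update_boundary_time = 0.0

def timed_update_boundary(domain):
    global update_boundary_time
    t = time.time()
    domain.update_boundary()     # routine suspected of load imbalance
    update_boundary_time = update_boundary_time + (time.time() - t)

# ... call timed_update_boundary(domain) wherever update_boundary
# would normally be called during the evolve loop ...

print('Processor %d spent %.2f seconds in update_boundary'
      % (pypar.rank(), update_boundary_time))
\end{verbatim}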
Before doing the shallow water equation calculations on a larger
number of processors, the \code{update_boundary} calculations should
be optimised as much as possible to reduce the effect of the load
imbalance.

\begin{table}
\caption{Parallel Efficiency Results for the Shallow Water Equation
on the Merimbula Mesh.\label{tbl:rpsm}}