Opened 17 years ago

Closed 14 years ago

#178 closed enhancement (fixed)

Time to load and fit mesh file (domain.set_quantity) is the slowest part for large parallel model runs

Reported by: nick
Owned by: steve
Priority: normal
Milestone:
Component: Efficiency and optimisation
Version:
Severity: normal
Keywords:
Cc: duncan, sexton, rwilson

Description (last modified by ole)

Here are some stats and times for different parts of ANUGA for 3 similar model scenarios that differ only in the number of triangles:

J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070518_060438_run_final_0_dampier_nbartzis
380000 triangles, on Tornado
bathy file load 58000 sec (15 hrs)
load and fit boundary condition 35000 sec (10 hrs)
evolution 25000 sec (7 hrs)

J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070621_235838_run_final_0_onslow_nbartzis
430000 triangles, on cyclone
bathy file load 160000 sec (44 hrs)
load and fit boundary condition 56000 sec (15 hrs)
evolution 80000 sec (22 hrs) PARALLEL using 4 CPUs


J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070615_062002_run_final_0_onslow_nbartzis
590000 triangles, on cyclone
bathy file load 250000 sec (70 hrs)
load and fit boundary condition 25000 sec (7 hrs)
evolution 270000 sec (75 hrs) NOT parallel, so we can expect around 25 hours with parallel

I think the next effort to increase speed in ANUGA should be focused on loading and fitting the bathy file to the mesh (domain.set_quantity). Even caching this process more regularly would help: if there is a failure halfway through, the whole thing wouldn't need to run again.
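As a generic illustration of the checkpointing idea (plain pickle, not ANUGA's caching module; all names here are hypothetical), the blockwise fit could save its partial state after each block so a crashed run resumes rather than restarts:

    import os
    import pickle

    def fit_blocks_with_checkpoints(blocks, fit_block, checkpoint='fit_state.pck'):
        # blocks: sequence of point blocks; fit_block folds one block into state.
        start, state = 0, None
        if os.path.exists(checkpoint):
            with open(checkpoint, 'rb') as f:
                start, state = pickle.load(f)  # resume where the last run died
        for i, block in enumerate(blocks):
            if i < start:
                continue  # this block was completed by a previous run
            state = fit_block(state, block)
            with open(checkpoint, 'wb') as f:
                pickle.dump((i + 1, state), f)  # checkpoint after every block
        return state

With something like this wrapped around the blockwise fitting loop, a failure at hour 40 of a 44-hour load would cost only the current block instead of the whole run.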

Also, the length of time a model needs to run is not the biggest issue in itself, but it greatly increases the risk of things we can't control, such as network issues, file access problems, node failures, cluster node reboots, gniess and perlite system reboots, and rifts in the space-time continuum.

Change History (18)

comment:1 Changed 17 years ago by ole

  • Owner changed from ole to duncan

comment:2 Changed 17 years ago by nick

The bathy file used in all runs is J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\topographies\broome_combined_elevation_unclipped1.txt and it has about 15,000,000 points.

comment:3 Changed 17 years ago by ole

Added profiling code towards this ticket in changeset:4578

comment:4 Changed 17 years ago by ole

  • Description modified (diff)

comment:5 Changed 17 years ago by ole

  • Cc duncan nick added

Result from profiling (on Ole's home computer): clearly, _search_triangles_of_vertices is the dominant contributor to the total running time, with 2292 s of the 2645.695 CPU seconds used by set_quantity.
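For reference, a profile like the one below can be reproduced with the standard library profiler (a generic sketch, not the contents of run_okushiri_profile_fitting.py; domain and project are assumed to be set up as in the run script):

    import cProfile
    import pstats

    cProfile.run(
        "domain.set_quantity('elevation', "
        "filename=project.bathymetry_filename, "
        "alpha=0.02, verbose=True)",
        'profile.dat')

    # Same view as below: sort by cumulative time, top 30 entries
    pstats.Stats('profile.dat').sort_stats('cumulative').print_stats(30)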

C:\work\inundation\anuga_validation\performance_tests\okushiri>python run_okushiri_profile_fitting.py
Caching: looking for cached files c:\.python_cache\_pmesh_to_domain[-1292443611]_{Result,Args,Admin}.z
Caching: Dependencies are ['Benchmark_2.msh']
+----------------------------------------------------------
| Tue Jul 03 05:30:41 2007. Caching statistics (retrieving)
+----------------------------------------------------------
| Function:     _pmesh_to_domain
| Arguments:    ('Benchmark_2.msh', None)
| CPU time:     1.64 seconds
| Loading time: 0.03 seconds
| Time saved:   1.61 seconds
|
| Caching dir:  c:\.python_cache\
| Result file:  _pmesh_to_domain[-1292443611]_Result.z (603089 bytes, compressed)
| Args file:    _pmesh_to_domain[-1292443611]_Args.z (40 bytes, compressed)
| Admin file:   _pmesh_to_domain[-1292443611]_Admin.z (432 bytes, compressed)
|
| Dependencies:
|  Benchmark_2.msh: Wed May 30 04:01:51 2007   1341952 bytes
+----------------------------------------------------------

General_mesh: Building basic mesh structure
General_mesh: Computing areas, normals and edgelenghts
(0/41457)
(4146/41457)
(8292/41457)
(12438/41457)
(16584/41457)
(20730/41457)
(24876/41457)
(29022/41457)
(33168/41457)
(37314/41457)
Building inverted triangle structure
Initialising mesh
Mesh: Computing centroids and radii
(0/41457)
(4146/41457)
(8292/41457)
(12438/41457)
(16584/41457)
(20730/41457)
(24876/41457)
(29022/41457)
(33168/41457)
(37314/41457)
Mesh: Building neigbour structure
Mesh: Building surrogate neigbour structure
Mesh: Building boundary dictionary
Mesh: Building tagged elements dictionary
Mesh: Done
Initialising Domain
Domain: Set up communication buffers (parallel)
Domain: Initialising quantity values
Domain: Done
------------------------------------------------
Mesh statistics:
  Number of triangles = 41457
  Extent [m]:
    x in [0.000000, 5.448000]
    y in [0.000000, 3.402000]
  Areas [m^2]:
    A in [0.000009, 0.099963]
    number of distinct areas: 41457
    Histogram:
      [0.000009, 0.010004[: 41225
      [0.010004, 0.020000[: 46
      [0.020000, 0.029995[: 25
      [0.029995, 0.039991[: 16
      [0.039991, 0.049986[: 34
      [0.049986, 0.059981[: 31
      [0.059981, 0.069977[: 34
      [0.069977, 0.079972[: 22
      [0.079972, 0.089968[: 18
      [0.089968, 0.099963]: 6
    Percentiles (10%):
      4145 triangles in [0.000009, 0.000016]
      4145 triangles in [0.000016, 0.000018]
      4145 triangles in [0.000018, 0.000021]
      4145 triangles in [0.000021, 0.000026]
      4145 triangles in [0.000026, 0.000147]
      4145 triangles in [0.000147, 0.000188]
      4145 triangles in [0.000188, 0.000233]
      4145 triangles in [0.000233, 0.000278]
      4145 triangles in [0.000278, 0.000362]
      4145 triangles in [0.000362, 0.089004]
      7 triangles in [0.089004, 0.099963]
  Boundary:
    Number of boundary segments == 97
    Boundary tags == ['wall', 'wave']
------------------------------------------------

FitInterpolate: Building mesh
FitInterpolate: Building quad tree
Building smoothing matrix
Geospatial data created from file: Benchmark_2_Bathymetry.pts
Data will be loaded blockwise on demand
Got 1 variables: ['elevation']
Reading 95892 points (in ~191 blocks) from file Benchmark_2_Bathymetry.pts.  Each block consists of 500 data points
Reading block 0 (points 0 to 500) out of 191
Processing Block 0
Reading block 19 (points 9500 to 10000) out of 191
Reading block 38 (points 19000 to 19500) out of 191
Reading block 57 (points 28500 to 29000) out of 191
Reading block 76 (points 38000 to 38500) out of 191
Reading block 95 (points 47500 to 48000) out of 191
Reading block 115 (points 57500 to 58000) out of 191
Reading block 134 (points 67000 to 67500) out of 191
Reading block 153 (points 76500 to 77000) out of 191
Reading block 172 (points 86000 to 86500) out of 191
That took 5849.25 seconds
Tue Jul 03 07:08:32 2007    profile.dat

         137880506 function calls (137516717 primitive calls) in 2645.695 CPU seconds

   Ordered by: cumulative time
   List reduced from 112 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001 2645.695 2645.695 profile:0(domain.set_quantity('elevation', filename=project.bathymetry_filename, alpha=0.02, verbose=True))
        1    0.000    0.000 2645.694 2645.694 <string>:1(?)
        1    0.000    0.000 2645.694 2645.694 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\domain.py:303(set_quantity)
        1    0.000    0.000 2645.694 2645.694 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\quantity.py:171(set_values)
        1    0.000    0.000 2645.548 2645.548 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\quantity.py:727(set_values_from_file)
        1    0.229    0.229 2642.903 2642.903 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:435(fit_to_mesh)
        1    0.000    0.000 2642.675 2642.675 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:481(_fit_to_mesh)
        1    0.017    0.017 2447.031 2447.031 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:304(fit)
      192    0.009    0.000 2426.180   12.636 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:395(build_fit_subset)
      192   19.022    0.099 2426.164   12.636 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:216(_build_matrix_AtA_Atz)
    95160    3.655    0.000 2370.224    0.025 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\search_functions.py:12(search_tree_of_vertices)
    95574 1075.162    0.011 2292.396    0.024 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\search_functions.py:52(_search_triangles_of_vertices)
 22888302  352.049    0.000  474.539    0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\general_mesh.py:285(get_vertex_coordinate)
 45454319  386.493    0.000  386.493    0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:330(dot)
        1    0.000    0.000  195.643  195.643 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:50(__init__)
 22726683  184.353    0.000  184.353    0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\general_mesh.py:234(get_normal)
        1    0.001    0.001  178.051  178.051 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\general_fit_interpolate.py:42(__init__)
  1328995   53.903    0.000  143.153    0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\general_mesh.py:381(get_triangles_and_vertices_per_node)
        1   14.628   14.628  122.676  122.676 C:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\neighbour_mesh.py:608(check_integrity)
 22888495  118.443    0.000  122.515    0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\general_mesh.py:264(get_vertex_coordinates)
  2075222  102.748    0.000  102.748    0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:587(sum)
   124371    4.698    0.000   76.386    0.001 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:117(anglediff)
   248742   32.413    0.000   71.688    0.000 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:78(angle)
    95160   12.566    0.000   69.703    0.001 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:74(search)
338747/95160   28.135    0.000   49.270    0.001 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:95(search_branch)
  7824303   39.867    0.000   39.867    0.000 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:309(get_machine_precision)
        1    0.019    0.019   31.592   31.592 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:382(build_quadtree)
   124274    2.870    0.000   31.319    0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:350(array_str)
   2369/1    7.320    0.003   30.160   30.160 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:245(split)
  1875470   24.024    0.000   28.852    0.000 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:127(contains)

<pstats.Stats instance at 0x025AA058>

comment:6 Changed 17 years ago by ole

  • Owner changed from duncan to ole

I'll have a stab at this since I have started the profiling and gotten my head around the problem.

comment:7 Changed 17 years ago by ole

  • Status changed from new to assigned

comment:8 Changed 17 years ago by ole

  • Owner changed from ole to duncan
  • Status changed from assigned to new

Duncan and Ole discussed a new algorithm based on:

  • Centroids stored in the quad tree instead of vertices: fewer points, no triangle repeats
  • Searching triangle neighbours rather than backtracking
  • Deeper quad trees

This should speed things up; a sketch of the neighbour-walk idea follows.
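A minimal sketch of the neighbour-walk idea (hypothetical names, not ANUGA's actual search code): starting from a candidate triangle, e.g. from a centroid quad tree lookup, step across whichever edge the point lies outside of until the containing triangle is reached, with no backtracking up the tree:

    def walk_to_triangle(start, x, y, triangles, coords, neighbours):
        # neighbours[t][i]: triangle across edge i of t, or -1 on the boundary.
        # Assumes counter-clockwise vertex ordering; the walk can leave the
        # mesh (or cycle) around holes, so this sketch suits hole-free regions.
        t = start
        while True:
            pts = [coords[v] for v in triangles[t]]
            for i in range(3):
                (x1, y1), (x2, y2) = pts[i], pts[(i + 1) % 3]
                # Point strictly to the right of directed edge i => outside t
                if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
                    t = neighbours[t][i]
                    if t == -1:
                        return None  # walked off the mesh
                    break
            else:
                return t  # point is inside (or on an edge of) triangle t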

comment:9 Changed 17 years ago by ole

  • Cc sexton added

Here's a suggestion for an improved algorithm:

Store triangle indices in quad cells instead of storing points (vertices or centroids). Each triangle may belong to more than one quad cell, reflecting the fact that triangles can be split across the rectilinear cell boundaries.

The search for which triangle contains data point x then becomes much simpler than before: for each data point x, locate the cell where x belongs and go through the list of triangles in that cell. No search for neighbours and no backtracking will be necessary. The size of the tree will also be smaller than it is currently.

The build becomes slightly, but not much, more complex. Here's a suggestion for an algorithm: (1) For each triangle vertex, find the appropriate quad cell. This is the same as what we have now. (2) Then add to the cell the ids of the triangles that use the found vertex. (3) After this has been done, go through all triangles in each cell and do the following: if all three vertices (and the centroid) belong to the one cell, all is well and good (the triangle is fully contained in one cell). If not, the triangle is split among cells, so we add the neighbours of the split triangle to the cell.

I think this will build the desired tree - in any case, this is something that can be tested thoroughly, since the build algorithm is independent of the data points.

If we find a pathological example of triangles that won't be registered correctly, we can concentrate on fixing it in the build step rather than the search. A self-contained sketch of the search structure is below.
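As a rough, self-contained sketch of the idea (illustrative names only, not the actual quad.py API), here a flat list of leaf cells stands in for the tree, and a conservative bounding-box overlap test registers split triangles with every cell they touch:

    def build_cells(extent, depth):
        # Recursively split extent=(xmin, xmax, ymin, ymax) into 4**depth leaves.
        xmin, xmax, ymin, ymax = extent
        if depth == 0:
            return [{'extent': extent, 'triangles': []}]
        xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
        leaves = []
        for ext in [(xmin, xm, ymin, ym), (xm, xmax, ymin, ym),
                    (xmin, xm, ym, ymax), (xm, xmax, ym, ymax)]:
            leaves.extend(build_cells(ext, depth - 1))
        return leaves

    def register_triangles(leaves, triangles, coords):
        # Add each triangle id to every leaf its bounding box overlaps, so a
        # triangle split across cell boundaries ends up in several cells.
        for tid, tri in enumerate(triangles):
            xs = [coords[v][0] for v in tri]
            ys = [coords[v][1] for v in tri]
            for leaf in leaves:
                xmin, xmax, ymin, ymax = leaf['extent']
                if max(xs) >= xmin and min(xs) <= xmax and \
                   max(ys) >= ymin and min(ys) <= ymax:
                    leaf['triangles'].append(tid)

    def locate(leaves, x, y, triangles, coords):
        # Find the cell containing (x, y), then test only its triangles.
        for leaf in leaves:
            xmin, xmax, ymin, ymax = leaf['extent']
            if xmin <= x <= xmax and ymin <= y <= ymax:
                for tid in leaf['triangles']:
                    a, b, c = [coords[v] for v in triangles[tid]]
                    # Same-sign cross-product point-in-triangle test
                    d1 = (x - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (y - b[1])
                    d2 = (x - c[0]) * (b[1] - c[1]) - (b[0] - c[0]) * (y - c[1])
                    d3 = (x - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (y - a[1])
                    if (d1 >= 0 and d2 >= 0 and d3 >= 0) or \
                       (d1 <= 0 and d2 <= 0 and d3 <= 0):
                        return tid
                return None  # point lies in a hole in the mesh
        return None  # point outside the mesh extent

The bounding-box registration is a conservative stand-in for the vertex/neighbour rules in steps (1)-(3): it may register a triangle with a cell it only nearly touches, which costs a little extra search time but can never miss the containing triangle.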

Please let me know what you think! Cheers Ole

comment:10 Changed 17 years ago by duncan

  • Resolution set to fixed
  • Status changed from new to closed

The fitting code has been sped up by a factor of 4.

There were 3 main speed-ups:

  • Checking whether a point is in the extent of a triangle before calculating the sigmas (see the sketch after this list).
  • Optimising the max points per cell value.
  • Calculating the triangle vertex and normal info only once per cell and storing it for later use.
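A sketch of the extent pre-check (hypothetical names, not the actual fit.py code): the bounding-box test costs a few comparisons, so points that cannot possibly lie in a triangle are rejected before the costlier sigma (barycentric weight) calculation:

    def point_in_extent(x, y, tri):
        # tri: three (x, y) vertex coordinates
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)

    def sigmas(x, y, tri):
        # Barycentric coordinates of (x, y); the point is inside the
        # triangle iff all three values lie in [0, 1].
        (x0, y0), (x1, y1), (x2, y2) = tri
        det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
        s0 = ((y1 - y2) * (x - x2) + (x2 - x1) * (y - y2)) / det
        s1 = ((y2 - y0) * (x - x2) + (x0 - x2) * (y - y2)) / det
        return s0, s1, 1.0 - s0 - s1

    def fit_weights(x, y, tri):
        if not point_in_extent(x, y, tri):
            return None  # cheap rejection: skip the divisions above
        return sigmas(x, y, tri)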

Here are some suggestions for future speed-ups:

  • Use the point-in-polygon C code to check which triangle the point is in, then find a new optimum cell size.
  • Use the mesh structure already created for domain.
  • Add a flag along the lines of point_data_sorted and, if it is true, check whether each point is in the same triangle/cell as the last point.
  • Do the sigma calculations in C.

comment:11 Changed 16 years ago by ole

  • Resolution fixed deleted
  • Status changed from closed to reopened

Stephen Roberts reckons we should pursue the idea of storing triangle indices rather than vertex coordinates. I think so too.

comment:12 Changed 16 years ago by duncan

  • Priority changed from high to normal

Here's an algorithm suggested by Steve:

    for subcell in cell.children:
        subcell.triangles = []

    for triangle in cell.triangles:
        for subcell in cell.children:
            if intersection(triangle, subcell):
                subcell.triangles.append(triangle)

The aim of the algorithm is to remove the need to backtrack up the search branch and hence improve the timings for fitting.

I did some timings of the Cairns demo.

Cairns timings, using the last triangle. Fitting the elevation data took 641.07 seconds:

    search_one_cell_time    172.817859888
    backtrack_time           70.6020908356
    build_quadtree_time     123.701570034

So backtracking is not a major time component in this example. I haven't found an example where it is a major component.

The quad trees are used in both interpolation and fitting. When interpolating we sometimes look at 80 points on a 250,000 triangle mesh. An algorithm that speeds up fitting by reducing the cell search time at the cost of a longer quad tree build time can therefore slow interpolation down, so both fitting and interpolation need to be considered when evaluating a change. My guess is that the algorithm described above would spend more time building the quad tree than the current algorithm does.

Since the main aim is to speed things up, here are some other options:

  • Investigate replacing sparse.py. A lot of time can be spent in here and there could be a faster alternative, e.g. scipy.
  • The last triangle, in which the previous point was found, is currently checked before searching using the quad tree. Expand this to include that triangle's neighbours (see the sketch after this list).
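A hypothetical sketch of the proposed check order (illustrative names, not the actual fit_interpolate code): try the triangle that contained the previous point, then its neighbours, before falling back to the quad tree:

    def find_containing_triangle(x, y, last_tid, neighbours, contains, tree_search):
        # contains(tid, x, y) tests one triangle; tree_search is the existing
        # quad tree lookup; neighbours[tid] lists adjacent triangles (-1 = none).
        if last_tid is not None:
            for tid in [last_tid] + [n for n in neighbours[last_tid] if n != -1]:
                if contains(tid, x, y):
                    return tid
        return tree_search(x, y)  # fall back to the full search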

Iffy options:

  • For interpolating when there is a large number of triangles relative to the number of points, don't use a quad tree: sort the triangle centroid x and y values, find the nearest x and y to the point, find the first centroid that has that nearest x and y, then search that triangle and its neighbours.
  • Store the quad tree info in the sww file. If this is done, let's store triangle ids in each cell.

comment:13 Changed 16 years ago by duncan

Two positives for the 'all triangle ids for each cell' algorithm:

  • The backtracking code needs to keep info about which cells are a cell's 'brothers'. This step could be removed.
  • There's a bug (Ticket #234): points in a hole in the mesh have no triangles. With the new algorithm we can quickly realise that a point is not in any triangle and, say, issue a warning.

comment:14 Changed 16 years ago by anonymous

Just did a test using fitting with and without a georef. They both took about the same time. Yay.

comment:15 Changed 16 years ago by anonymous

Checking the neighbours was only slightly faster than just checking the last triangle, and it significantly slowed down non-gridded fitting. Therefore it will not be implemented.

TIMING RESULTS: fitting with 319 triangles

Check last tri and neighbours:
num_of_points   is_gridded   time (s)
957             TRUE         0.477249861
6380            TRUE         1.246427059
42108           TRUE         5.546241999
957             FALSE        0.700853825
6380            FALSE        3.216141939
42108           FALSE        19.80917001

Check last tri only:
num_of_points   is_gridded   time (s)
957             TRUE         0.462251186
6380            TRUE         1.276449919
42108           TRUE         5.650943041
957             FALSE        0.523360014
6380            FALSE        2.057775974
42108           FALSE        12.04477811

No check of last tri or neighbours:
num_of_points   is_gridded   time (s)
957             TRUE         0.514168978
6380            TRUE         2.016870975
42108           TRUE         11.87840199
957             FALSE        0.517258167
6380            FALSE        2.044245958
42108           FALSE        11.79303098

comment:16 Changed 16 years ago by ole

  • Cc rwilson added; nick removed
  • Owner changed from duncan to ole
  • Status changed from reopened to new

comment:17 Changed 15 years ago by nariman

  • Owner changed from ole to steve

comment:18 Changed 14 years ago by hudson

  • Resolution set to fixed
  • Status changed from new to closed

Please retest with ANUGA 1.2 - there should be a ~30% speed increase in fitting from an improved quadtree algorithm. Also, please see the notes on the wiki on using caching or pickle/unpickle to back up the state on large runs.
