Opened 10 years ago
Closed 7 years ago
#178 closed enhancement (fixed)
Time to load and fit mesh file (domain.set_quantity) is the slowest part for large parallel model runs
Reported by: | nick | Owned by: | steve |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Efficiency and optimisation | Version: | |
Severity: | normal | Keywords: | |
Cc: | duncan, sexton, rwilson |
Description (last modified by ole)
Here are the some stats and times for different parts of ANUGA for 3 similar models scenarios except for the number of triangles
J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070518_060438_run_final_0_dampier_nbartzis 380000 triangles, on Tornado bathy file load 58000 sec (15hrs) load and fit boundary condition 35000 (10hrs) evolution 25000 (7hrs) J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070621_235838_run_final_0_onslow_nbartzis 430000 triangles, on cyclone bathy file load 160000 sec (44hrs) load and fit boundary condition 56000 (15hrs) evolution 80000 (22hrs) PARALLEL using 4 cpus J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\outputs\20070615_062002_run_final_0_onslow_nbartzis 590000 triangles, on cyclone bathy file load 250000 sec (70hrs) load and fit boundary condition 25000 (7hrs) evolution 270000 (75hrs) NOT parallel, so can expect around 25 hours with parallel
I think the next effort to increase speed in anuga should be focused on loading and fiting the bathy file to the mesh (domain.set_quantity). Even if this process cached more regularly so when it is half way through and there is a failure it doesn't need to run the whole thing again.
Also the length of time a model needs to run is not the biggest issue really, but greatly increases the risk of the following things we can't control. Such as network issue, file access problems, node failures, custer node reboots, gniess and perlite system reboots and rifts in the space-time continuum.
Change History (18)
comment:1 Changed 10 years ago by ole
- Owner changed from ole to duncan
comment:2 Changed 10 years ago by nick
comment:3 Changed 10 years ago by ole
Added profiling code towards this ticket in changeset:4578
comment:4 Changed 10 years ago by ole
- Description modified (diff)
comment:5 Changed 10 years ago by ole
- Cc duncan nick added
Result from profiling (Ole's home computer) Clearly _search_triangles_of_vertices is the dominant contributor to the total running time with 2292s out of the 2645.695 CPU seconds used by set_quantity.
C:\work\inundation\anuga_validation\performance_tests\okushiri>python run_okushiri_profile_fitting.py Caching: looking for cached files c:\.python_cache\_pmesh_to_domain[-1292443611]_{Result,Args,Admin}.z Caching: Dependencies are ['Benchmark_2.msh'] +---------------------------------------------------------- | Tue Jul 03 05:30:41 2007. Caching statistics (retrieving) +---------------------------------------------------------- | Function: _pmesh_to_domain | Arguments: ('Benchmark_2.msh', None) | CPU time: 1.64 seconds | Loading time: 0.03 seconds | Time saved: 1.61 seconds | | Caching dir: c:\.python_cache\ | Result file: _pmesh_to_domain[-1292443611]_Result.z (603089 bytes, compressed) | Args file: _pmesh_to_domain[-1292443611]_Args.z (40 bytes, compressed) | Admin file: _pmesh_to_domain[-1292443611]_Admin.z (432 bytes, compressed) | | Dependencies: | Benchmark_2.msh: Wed May 30 04:01:51 2007 1341952 bytes +---------------------------------------------------------- General_mesh: Building basic mesh structure General_mesh: Computing areas, normals and edgelenghts (0/41457) (4146/41457) (8292/41457) (12438/41457) (16584/41457) (20730/41457) (24876/41457) (29022/41457) (33168/41457) (37314/41457) Building inverted triangle structure Initialising mesh Mesh: Computing centroids and radii (0/41457) (4146/41457) (8292/41457) (12438/41457) (16584/41457) (20730/41457) (24876/41457) (29022/41457) (33168/41457) (37314/41457) Mesh: Building neigbour structure Mesh: Building surrogate neigbour structure Mesh: Building boundary dictionary Mesh: Building tagged elements dictionary Mesh: Done Initialising Domain Domain: Set up communication buffers (parallel) Domain: Initialising quantity values Domain: Done ------------------------------------------------ Mesh statistics: Number of triangles = 41457 Extent [m]: x in [0.000000, 5.448000] y in [0.000000, 3.402000] Areas [m^2]: A in [0.000009, 0.099963] number of distinct areas: 41457 Histogram: [0.000009, 0.010004[: 41225 [0.010004, 0.020000[: 46 [0.020000, 0.029995[: 25 [0.029995, 0.039991[: 16 [0.039991, 0.049986[: 34 [0.049986, 0.059981[: 31 [0.059981, 0.069977[: 34 [0.069977, 0.079972[: 22 [0.079972, 0.089968[: 18 [0.089968, 0.099963]: 6 Percentiles (10%): 4145 triangles in [0.000009, 0.000016] 4145 triangles in [0.000016, 0.000018] 4145 triangles in [0.000018, 0.000021] 4145 triangles in [0.000021, 0.000026] 4145 triangles in [0.000026, 0.000147] 4145 triangles in [0.000147, 0.000188] 4145 triangles in [0.000188, 0.000233] 4145 triangles in [0.000233, 0.000278] 4145 triangles in [0.000278, 0.000362] 4145 triangles in [0.000362, 0.089004] 7 triangles in [0.089004, 0.099963] Boundary: Number of boundary segments == 97 Boundary tags == ['wall', 'wave'] ------------------------------------------------ FitInterpolate: Building mesh FitInterpolate: Building quad tree Building smoothing matrix Geospatial data created from file: Benchmark_2_Bathymetry.pts Data will be loaded blockwise on demand Got 1 variables: ['elevation'] Reading 95892 points (in ~191 blocks) from file Benchmark_2_Bathymetry.pts. Each block consists of 500 data points Reading block 0 (points 0 to 500) out of 191 Processing Block 0 Reading block 19 (points 9500 to 10000) out of 191 Reading block 38 (points 19000 to 19500) out of 191 Reading block 57 (points 28500 to 29000) out of 191 Reading block 76 (points 38000 to 38500) out of 191 Reading block 95 (points 47500 to 48000) out of 191 Reading block 115 (points 57500 to 58000) out of 191 Reading block 134 (points 67000 to 67500) out of 191 Reading block 153 (points 76500 to 77000) out of 191 Reading block 172 (points 86000 to 86500) out of 191 That took 5849.25 seconds Tue Jul 03 07:08:32 2007 profile.dat 137880506 function calls (137516717 primitive calls) in 2645.695 CPU seconds Ordered by: cumulative time List reduced from 112 to 30 due to restriction <30> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.001 0.001 2645.695 2645.695 profile:0(domain.set_quantity('elevation', filename=project.bathymetry_filename, alpha=0.02, verbose=True)) 1 0.000 0.000 2645.694 2645.694 <string>:1(?) 1 0.000 0.000 2645.694 2645.694 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\doma in.py:303(set_quantity) 1 0.000 0.000 2645.694 2645.694 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\quan tity.py:171(set_values) 1 0.000 0.000 2645.548 2645.548 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\quan tity.py:727(set_values_from_file) 1 0.229 0.229 2642.903 2642.903 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:435(fit_ to_mesh) 1 0.000 0.000 2642.675 2642.675 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:481(_fit _to_mesh) 1 0.017 0.017 2447.031 2447.031 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:304(fit) 192 0.009 0.000 2426.180 12.636 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:395(buil d_fit_subset) 192 19.022 0.099 2426.164 12.636 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:216(_bui ld_matrix_AtA_Atz) 95160 3.655 0.000 2370.224 0.025 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\search_function s.py:12(search_tree_of_vertices) 95574 1075.162 0.011 2292.396 0.024 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\search_function s.py:52(_search_triangles_of_vertices) 22888302 352.049 0.000 474.539 0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\gene ral_mesh.py:285(get_vertex_coordinate) 45454319 386.493 0.000 386.493 0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:330(dot) 1 0.000 0.000 195.643 195.643 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\fit.py:50(__ini t__) 22726683 184.353 0.000 184.353 0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\gene ral_mesh.py:234(get_normal) 1 0.001 0.001 178.051 178.051 c:\work\inundation\anuga_core\source\anuga\fit_interpolate\general_fit_int erpolate.py:42(__init__) 1328995 53.903 0.000 143.153 0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\gene ral_mesh.py:381(get_triangles_and_vertices_per_node) 1 14.628 14.628 122.676 122.676 C:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\neig hbour_mesh.py:608(check_integrity) 22888495 118.443 0.000 122.515 0.000 c:\work\inundation\anuga_core\source\anuga\abstract_2d_finite_volumes\gene ral_mesh.py:264(get_vertex_coordinates) 2075222 102.748 0.000 102.748 0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:587(sum) 124371 4.698 0.000 76.386 0.001 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:11 7(anglediff) 248742 32.413 0.000 71.688 0.000 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:78 (angle) 95160 12.566 0.000 69.703 0.001 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:74(search) 338747/95160 28.135 0.000 49.270 0.001 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:95(search_ branch) 7824303 39.867 0.000 39.867 0.000 c:\work\inundation\anuga_core\source\anuga\utilities\numerical_tools.py:30 9(get_machine_precision) 1 0.019 0.019 31.592 31.592 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:382(build_qua dtree) 124274 2.870 0.000 31.319 0.000 C:\Python23\lib\site-packages\Numeric\Numeric.py:350(array_str) 2369/1 7.320 0.003 30.160 30.160 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:245(split) 1875470 24.024 0.000 28.852 0.000 c:\work\inundation\anuga_core\source\anuga\utilities\quad.py:127(contains) <pstats.Stats instance at 0x025AA058>
comment:6 Changed 10 years ago by ole
- Owner changed from duncan to ole
I'll have a stab at this since I have started the profiling and gotten my head around the problem.
comment:7 Changed 10 years ago by ole
- Status changed from new to assigned
comment:8 Changed 10 years ago by ole
- Owner changed from ole to duncan
- Status changed from assigned to new
Duncan and Ole discussed new algorithm based on
- Centroids stored in quad tree instead of vertices: fewer points, no triang repeat
- Search triangle neigbours rather than back tracking
- Deeper quad trees
This should speeed things up
comment:9 Changed 10 years ago by ole
- Cc sexton added
Here's a suggestion for an improved algorithm:
Store triangle indices in quad cells instead of storing points (vertices or centroids). Each triangle may belong to more than one quad cell reflecting the fact that triangles will be split across the rectilinear cell boundaries.
The search for which triangle that contains data point x will then become much simpler than before: For each data point x, locate the cell where x belongs and go through the list of triangles in that cell. No search for neighbours and no backtracking will be necessary. The size of the tree will also be smaller than currently.
The build becomes slightly, but not much, more complex. Here's a suggestion for an algorithm: (1) For each triangle vertex, find the appropriate quad cell. This the same as what we have now. (2) Then add to the cell ids of triangles that are using the found vertex. (3) After this has been done, go through all triangles in each cell and do the following: If all three vertices (and the centroid) belongs to the one cell all is well and good (the triangle is fully contained in one cell). If not, the triangle is split among cells, so we add the neighbours of the split triangle to the cell.
I think this will build the desired tree - in any case this is something that can be tested thoroughly since the build algorithm is independent on datapoints.
If we find a pathological example of triangles that won't be registered correctly, we can concentrate on fixing it in the build step rather than the search.
Please let me know what you think! Cheers Ole
comment:10 Changed 10 years ago by duncan
- Resolution set to fixed
- Status changed from new to closed
The fitting code has been speed up by a factor of 4.
There were 3 main speed ups;
- check if a point is in the extent of a triagle before calculating sigmas.
- optimising the max points per cell value.
- Calculating the triangle vert and norm info only once per cell and storing the info for latter use.
Here are some suggestions for future speed up;
- Use the point in polygon c code to check which triangle the point is in, the find a new optimum cell size.
- Use the mesh structure already created for domain.
- Add a flag along the lines of point_data_sorted and if this is true check if the point data is in the same triangle/cell as the last point.
- Do the sigma calculations in c.
comment:11 Changed 10 years ago by ole
- Resolution fixed deleted
- Status changed from closed to reopened
Stephen Roberts reckons we should pursue the idea about storing triangle indices rather than vertex coordinates. I think so too.
comment:12 Changed 10 years ago by duncan
- Priority changed from high to normal
Here's an algorithm suggested by Steve; for subcell in cell.children: subcell.triangles = []
for triangle in cell.triangles: for subcell in cell.children: if intersection(triangle, subcell): subcell.triangles.append(triangle)
The aim of the algorithm is to negate the need to backtrack up the search branch and hence improve the timings for fitting.
I did some timings of the Cairns demo;
Cairns timings, using the last triangle. Fitting the elevation data took 641.07 seconds search_one_cell_time 172.817859888 backtrack_time 70.6020908356 build_quadtree_time 123.701570034
So backtracking is not a major time component, in this example. I haven't found an example where it is a major component.
The quad trees are used in interpolation and fitting. When interpolating we sometimes look at 80 points on a 250,000 triangle mesh. An algorithm that speeds up fitting by reducing the cell search time and increasing the quad tree build time can slow interpolation. So when considering a change, fitting and interpolation need to be considered. My guess is the algorithm described above would take more time in building the quad tree than the current algorithm.
Since the main aim is to speed things up, here are some other options;
- Investigate replacing sparce.py. A lot of time can be spent in here and there could be a faster alternative, eg scipy.
- The last triangle where the last point was found is currently checked, before searching using the quad tree. Expand this to include the last triangles neighbours.
Iffy options
- for interpolating when there is a large number of triangles for each point
- don’t use quad tree. Sort x and y triangle centroid values find the nearest x and y to the point. Find the first centroid that has the nearest x and y. search that triangle and its neighbours.
- store quad tree info in the sww file. - if this is done lets store triangle id in each cell.
comment:13 Changed 10 years ago by duncan
Two positvies for the All triangle ids for each cell algorithm;
- The backtracking code needs to keep info about which cells are a cells 'brother'. This step could be removed.
- There's a bug (Ticket #234) since points in a hole in the mesh have no triangles. With the new algorithm we can quickly realise that a point is not in a triangle and say issue a warning.
comment:14 Changed 10 years ago by anonymous
Just did a test using fitting with and without a georef. They both took about the same time. Yay.
comment:15 Changed 10 years ago by anonymous
Checking the neighbours was only slightly faster than just checking the last triangle and it significantly slowed down non-gridded fitting. Therefore it will not be implemented.
TIMING RESULTS When fitting with 319 triangles
Check last tri and neighbours num_of_points is_gridded time 957 TRUE 0.477249861 6380 TRUE 1.246427059 42108 TRUE 5.546241999 957 FALSE 0.700853825 6380 FALSE 3.216141939 42108 FALSE 19.80917001 Check last tri num_of_points is_gridded time 957 TRUE 0.462251186 6380 TRUE 1.276449919 42108 TRUE 5.650943041 957 FALSE 0.523360014 6380 FALSE 2.057775974 42108 FALSE 12.04477811 no check last tri and neighbours num_of_points is_gridded time 957 TRUE 0.514168978 6380 TRUE 2.016870975 42108 TRUE 11.87840199 957 FALSE 0.517258167 6380 FALSE 2.044245958 42108 FALSE 11.79303098
comment:16 Changed 9 years ago by ole
- Cc rwilson added; nick removed
- Owner changed from duncan to ole
- Status changed from reopened to new
comment:17 Changed 8 years ago by nariman
- Owner changed from ole to steve
comment:18 Changed 7 years ago by hudson
- Resolution set to fixed
- Status changed from new to closed
Please retest with ANUGA 1.2 - there should be a ~30% speed increase in fitting from an improved quadtree algorithm. Also, please see the notes on the wiki for using caching or pickle/unpickle to backup the state on large runs.
the bathy file used in all is J:\inundation\data\western_australia\broome_tsunami_scenario_2006\anuga\topographies\broome_combined_elevation_unclipped1.txt and it has about 15,000,000 points