Opened 19 years ago
Closed 17 years ago
#187 closed defect (fixed)
Problem with 400,000+ triangles and Parallel ANUGA
| Reported by: | nick | Owned by: | ole |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | Functionality and features | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
Can someone please confirm that this is a problem by running the following on either cyclone or tornado:
"mpirun c0-3 python run_pt_hedland_urs.py" from this directory.
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070711_014308_run_final_3.6_onslow_nbartzis
This script runs Pt Hedland with about 465,000 triangles. The problem is that the script gets to
+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
and then either hangs indefinitely (2 weeks) or stops without raising a Python error, printing the following:
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit code.
This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return 0" or "exit(0)" in your C code before exiting the application.
PID 18209 failed on node n6 (192.168.255.247) due to signal 9.
-----------------------------------------------------------------------------
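For reference, the "signal 9" in the mpirun message can be decoded with Python's standard `signal` module. On Linux, signal 9 is SIGKILL, which a process cannot catch, and that would be consistent with no Python traceback being printed. The ticket does not establish what sent the signal; the kernel's out-of-memory killer is one common sender, but that is only an assumption here. A minimal sketch, where `describe_signal` is a hypothetical helper name:

```python
import signal


def describe_signal(signum):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number is not a signal known on this platform.
        return "unknown signal %d" % signum


if __name__ == "__main__":
    # 9 is the signal number reported by mpirun in the message above.
    print(describe_signal(9))
```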
I have run very similar code with fewer triangles (around 200,000) and it works fine. That run can be found here:
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070704_010800_run_basic_3.6_onslow_nbartzis
I have tried to run other scenarios, such as Onslow, in parallel with over 400,000 triangles, and they have failed too.
Please help.
Attachments (1)
Change History (4)
Changed 19 years ago by
| Attachment: | screen_output_0_4.txt added |
|---|
comment:1 Changed 19 years ago by
I ran this scenario. The node it was running on died after less than 12 hours.
I've attached the output from processor 0. Here's the end of the file:
+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
Here's the end of the output file from process 3. The other processes failed at this same point.
triangles 465557
process 3 receiving submesh from process 0
domain id 183317348152
Available boundary tags ['ghost', 'back']
domain id 183317348152
Find midpoint coordinates of entire boundary
Initialise file_function
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_file_function[-7514477565657141842]_{Result,Args,Admin}.z
Caching: Dependencies are ['/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww']
+-------------------------------------------------------------
| Wed Jul 11 11:56:45 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (906, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
Building interpolation matrix from source mesh (685 vertices, 1142 triangles)
FitInterpolate: Building mesh
This seems to be failing at the same point it failed for Nick.
Nick, can this run in series?
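One way to compare the parallel and serial runs would be to log each process's peak memory at the stages named in the output above (for example, before and after "Initialise file_function"). The sketch below uses only the Python standard library. `peak_rss_mb` and `memory_report` are hypothetical helper names, the `rank` argument stands in for whatever process id the parallel layer provides, and nothing here is part of ANUGA. It assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource


def peak_rss_mb():
    """Peak resident set size of this process in MB.

    Assumes Linux, where ru_maxrss is reported in kilobytes.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def memory_report(rank, stage):
    """Format a one-line report; 'rank' is a placeholder for the process id."""
    return "process %d, %s: peak RSS %.1f MB" % (rank, stage, peak_rss_mb())


if __name__ == "__main__":
    print(memory_report(0, "before large allocation"))
    big = [0.0] * 5_000_000  # stand-in for a memory-hungry step
    print(memory_report(0, "after large allocation"))
```

If the reported peak climbs toward the node's physical memory shortly before the failure, memory exhaustion becomes a plausible explanation; if it stays flat, the hang probably lies elsewhere.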
comment:2 Changed 19 years ago by
This run did work in series, and you can find the results here:
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070710_225645_run_final_3.6_onslow_nbartzis
It does look like it retrieved the information about the fit boundary from the cache, which is something to think about.
comment:3 Changed 17 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |

processor 0 output