Opened 17 years ago
Closed 15 years ago
#187 closed defect (fixed)
Problem with 400,000+ triangles and Parallel ANUGA
Reported by: | nick | Owned by: | ole |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Functionality and features | Version: | |
Severity: | normal | Keywords: | |
Cc: |
Description
Can someone please confirm that this is a problem by running the following on either cyclone or tornado?
"mpirun c0-3 python run_pt_hedland_urs.py" from this directory:
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070711_014308_run_final_3.6_onslow_nbartzis
This script runs Pt Hedland with 465,000 triangles. The problem is that the script gets to
+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
and then either hangs indefinitely (2 weeks), or stops without raising a Python error and prints the following:
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit code. This typically indicates that the process finished in error. If your process did not finish in error, be sure to include a "return 0" or "exit(0)" in your C code before exiting the application.
PID 18209 failed on node n6 (192.168.255.247) due to signal 9.
-----------------------------------------------------------------------------
I have run very similar code with fewer triangles (around 200,000) and it runs fine. That run can be found here:
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070704_010800_run_basic_3.6_onslow_nbartzis
I have tried to run other scenarios, such as Onslow, in parallel with over 400,000 triangles, and they have failed too.
Please help.
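The mpirun message above reports signal 9, which is SIGKILL on Linux. A process cannot catch SIGKILL, which would explain why no Python error is printed. One possible cause, and this is only an assumption that nothing in this ticket confirms, is a node running out of memory on the larger mesh. A minimal sketch for checking that: log each process's peak memory at a few stages of the run script and compare the 465,000 triangle run with the 200,000 triangle one. The helper names below are hypothetical and not part of ANUGA.

```python
import resource
import signal
import sys


def signal_name(signum):
    """Return the symbolic name for a signal number (9 is SIGKILL on Linux)."""
    return signal.Signals(signum).name


def peak_memory_mb():
    """Peak resident set size of the current process, in megabytes.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(stage, rank=0):
    """Print and return a one-line memory report.

    `rank` is the MPI process number; how it is obtained depends on the
    MPI binding in use, so it is passed in by the caller (assumption).
    """
    line = "process %d, %s: peak memory %.1f MB" % (rank, stage, peak_memory_mb())
    print(line)
    return line
```

Calling `log_memory("before file_function", rank)` and `log_memory("after file_function", rank)` around the boundary set-up would show whether memory grows sharply at the point where the output stops.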
Attachments (1)
Change History (4)
Changed 17 years ago by duncan
comment:1 Changed 17 years ago by duncan
I ran this scenario. The node it was running on died after less than 12 hours of running.
I've attached the output from processor 0. Here's the end of the file:
+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
Here's the end of the output file from process 3. The other processors failed at this point.
triangles 465557
process 3 receiving submesh from process 0
domain id 183317348152
Available boundary tags ['ghost', 'back']
domain id 183317348152
Find midpoint coordinates of entire boundary
Initialise file_function
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_file_function[-7514477565657141842]_{Result,Args,Admin}.z
Caching: Dependencies are ['/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww']
+-------------------------------------------------------------
| Wed Jul 11 11:56:45 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (906, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------
Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
References:
Lower left corner: [576824.153048, 7736310.061742]
Start time: 5000.000000
Building interpolation matrix from source mesh (685 vertices, 1142 triangles)
FitInterpolate: Building mesh
This seems to be failing at the same point it failed for Nick.
Nick, can this run in series?
comment:2 Changed 17 years ago by nick
This run did work in series, and you can find the results here:
Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070710_225645_run_final_3.6_onslow_nbartzis
It does look like it retrieved the information about the fit boundary from cache... something to think about.
comment:3 Changed 15 years ago by nariman
- Resolution set to fixed
- Status changed from new to closed
processor 0 output