Opened 17 years ago

Closed 15 years ago

#187 closed defect (fixed)

Problem with 400,000+ triangles and Parallel ANUGA

Reported by: nick Owned by: ole
Priority: high Milestone:
Component: Functionality and features Version:
Severity: normal Keywords:
Cc:

Description

Can someone please confirm that this is a problem by running the following on either cyclone or tornado:

"mpirun c0-3 python run_pt_hedland_urs.py" from this directory:


Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070711_014308_run_final_3.6_onslow_nbartzis

This script runs Pt Hedland with 465,000 triangles. The problem is that the script gets to

+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument:     '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason:       No cached result
+-------------------------------------------------------------

Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
  References:
    Lower left corner: [576824.153048, 7736310.061742]
    Start time:   5000.000000

and then either hangs forever (two weeks, in one case) or stops without raising a Python error and prints the following:

-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 18209 failed on node n6 (192.168.255.247) due to signal 9.
-----------------------------------------------------------------------------
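For reference, signal 9 is SIGKILL, which a process cannot catch or handle, so Python never gets a chance to print a traceback. That fits the "no python error" symptom. On Linux the kernel's out-of-memory killer is a common sender of SIGKILL, although nothing in this ticket confirms that was the cause here. The minimal sketch below (POSIX only; `describe_exit` is a hypothetical helper, not part of ANUGA or mpirun) shows how a signal-killed child surfaces in Python: `subprocess` reports it as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a readable description.

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    """
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (signal {-returncode})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Child process that sends itself SIGKILL, mimicking "due to signal 9".
    child = subprocess.run(
        [sys.executable, "-c",
         "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    )
    print(describe_exit(child.returncode))  # POSIX: killed by SIGKILL (signal 9)
```

Because SIGKILL bypasses every handler, wrapping the script in try/except would not produce any extra output; memory usage has to be observed from outside the process.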

I have run very similar code with fewer triangles (around 200,000), and it is fine. It can be found here:


Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070704_010800_run_basic_3.6_onslow_nbartzis

I have tried running other scenarios, such as Onslow, in parallel with over 400,000 triangles, and they have failed too.

Please help.

Attachments (1)

screen_output_0_4.txt (6.2 KB) - added by duncan 17 years ago.
processor 0 output


Change History (4)

Changed 17 years ago by duncan

processor 0 output

comment:1 Changed 17 years ago by duncan

I ran this scenario. The node it was running on died after less than 12 hours.

I've attached the output from processor 0. Here's the end of the file:

+-------------------------------------------------------------
| Wed Jul 11 11:57:11 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument:     '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (1523, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason:       No cached result
+-------------------------------------------------------------

Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
  References:
    Lower left corner: [576824.153048, 7736310.061742]
    Start time:   5000.000000

Here's the end of the output file from process 3. The other processes also failed at this point.

triangles 465557
process 3 receiving submesh from process 0
domain id 183317348152
Available boundary tags ['ghost', 'back']
domain id 183317348152
Find midpoint coordinates of entire boundary
Initialise file_function
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_file_function[-7514477565657141842]_{Result,Args,Admin}.z
Caching: Dependencies are ['/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww']
+-------------------------------------------------------------
| Wed Jul 11 11:56:45 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument:     '/d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (906, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason:       No cached result
+-------------------------------------------------------------

Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/pt_hedland_tsunami_scenario_2006/anuga/boundaries/urs/onslow_hedland_broome/1_10000/pt_hedland_3859_16052007.sww
  References:
    Lower left corner: [576824.153048, 7736310.061742]
    Start time:   5000.000000
Building interpolation matrix from source mesh (685 vertices, 1142 triangles)
FitInterpolate: Building mesh

This seems to be failing at the same point it failed for Nick.

Nick, can this run in series?

comment:2 Changed 17 years ago by nick

This run did work in series, and you can find the results here:


Gneiss\gem5nhi\inundation\data\western_australia\pt_hedland_tsunami_scenario_2006\anuga\outputs\20070710_225645_run_final_3.6_onslow_nbartzis

It does look like the serial run retrieved the boundary fit information from the cache, which is something to think about.
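The "Caching:" lines in the logs suggest that `_file_function` results are stored under a key derived from the call's arguments. If so, that would explain the observation: a serial run passing the same full set of boundary points as an earlier run hits the cache and skips building the interpolation matrix, whereas each parallel process passes its own set of points (1523 on process 0, 906 on process 3) and has to build it from scratch. The sketch below is a hypothetical, simplified memoiser illustrating that behaviour; it is not ANUGA's caching module, and `cached_call` and `fit_boundary_demo` are illustrative names.

```python
import hashlib
import pickle

# Hypothetical in-memory stand-in for an on-disk cache directory.
_cache = {}


def cache_key(func, args, kwargs):
    """Hash the function name and its pickled arguments.

    Any change in arguments (e.g. a different set of points) gives a new key.
    """
    payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def cached_call(func, *args, **kwargs):
    """Return (result, hit); hit is True when the expensive work was skipped."""
    key = cache_key(func, args, kwargs)
    if key in _cache:
        return _cache[key], True
    result = func(*args, **kwargs)
    _cache[key] = result
    return result, False


def fit_boundary_demo(points, time_thinning=1):
    # Hypothetical placeholder for an expensive fit (such as building an
    # interpolation matrix); here it just counts the points.
    return len(points)


if __name__ == "__main__":
    all_points = tuple((float(i), 0.0) for i in range(10))
    print(cached_call(fit_boundary_demo, all_points))      # (10, False): first serial run
    print(cached_call(fit_boundary_demo, all_points))      # (10, True): serial rerun hits cache
    print(cached_call(fit_boundary_demo, all_points[:4]))  # (4, False): one process's subset misses
```

Under this assumption, a successful serial run says little about whether the uncached fit itself completes, which is the step where the parallel runs stopped.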

comment:3 Changed 15 years ago by nariman

  • Resolution set to fixed
  • Status changed from new to closed