Opened 17 years ago

Closed 17 years ago

#141 closed defect (fixed)

Problem with parallel code in Field_boundary (Interpolation_function)

Reported by: nick Owned by: duncan
Priority: high Milestone:
Component: Functionality and features Version: 1.0
Severity: major Keywords:
Cc:

Description

I have run similar code in parallel on both tornado and cyclone, and both runs die at the same place.

J:\inundation\data\western_australia\dampier_tsunami_scenario_2006\anuga\outputs\20070419_161713_run

This is the tornado run on 8 CPUs, using "mpirun c40-47 python run_dampier.py". All CPUs fail at the same point.


starting to create boundary conditions
Setup initial conditions
Start Set quantity
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_fit_to_mesh[4973251167741820141]_{Result,Args,Admin}.z
+----------------------------------------------------------
| Thu Apr 19 16:22:11 2007. Caching statistics (retrieving)
+----------------------------------------------------------
| Function: _fit_to_mesh
| Arguments: (Array: (171482, 2), Array: (342366, 3), '/d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/topographies/dampier_combined_elevation.txt')
| Keyword Args: {'point_attributes': None, 'use_cache': True, 'attribute_name': None, 'verbose': True, 'max_read_lines': 500, 'acceptable_overshoot': 1.01, 'mesh_origin': None, 'alpha': 0.1, 'data_origin': None}
| CPU time: 66649.26 seconds
| Loading time: 0.1 seconds
| Time saved: 66649.16 seconds
|
| Caching dir: /d/cit/1/cit/unixhome/nbartzis/.python_cache/
| Result file: _fit_to_mesh[4973251167741820141]_Result (1371910 bytes)
| Args file: _fit_to_mesh[4973251167741820141]_Args (10960903 bytes)
| Admin file: _fit_to_mesh[4973251167741820141]_Admin (1620 bytes)
|
| No dependencies
+----------------------------------------------------------

Finished Set quantity
Subdivide mesh
Build submeshes
There are 323 ghost nodes and 630 ghost triangles on proc 0
There are 201 ghost nodes and 394 ghost triangles on proc 1
There are 179 ghost nodes and 358 ghost triangles on proc 2
There are 312 ghost nodes and 636 ghost triangles on proc 3
There are 141 ghost nodes and 271 ghost triangles on proc 4
There are 271 ghost nodes and 562 ghost triangles on proc 5
There are 269 ghost nodes and 546 ghost triangles on proc 6
There are 277 ghost nodes and 545 ghost triangles on proc 7
Distribute submeshes
process 0 sending submesh to process 1
process 0 sending submesh to process 2
process 0 sending submesh to process 3
process 0 sending submesh to process 4
process 0 sending submesh to process 5
process 0 sending submesh to process 6
process 0 sending submesh to process 7
domain id 183276205480
domain id 183276205480
Available boundary tags ghost?
domain id 183276205480
Find midpoint coordinates of entire boundary
Initialise file_function
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_file_function[-4478828065918996396]_{Result,Args,Admin}.z
+-------------------------------------------------------------
| Thu Apr 19 16:26:23 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (518, 2), 'time_thinning': 1, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+-------------------------------------------------------------

Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww

References:

Lower left corner: [419902.228863, 7683195.214508] Start time: 5000.000000


J:\inundation\data\western_australia\dampier_tsunami_scenario_2006\anuga\outputs\20070417_021606_run

This is the cyclone run on 8 CPUs, using "mpirun c20-27 python run_dampier.py". At the time, the code contained a print statement (since removed), and the last thing it wrote was "before interpolation function1". (I think there were two more print lines before the interpolation function, but don't quote me on that.) One odd thing is that 4 of the processes in this run actually get past this part of the code.


+-------------------------------------------------------
| Tue Apr 17 20:52:53 2007. Caching statistics (storing)
+-------------------------------------------------------
| Function: _fit_to_mesh
| Arguments: (Array: (171482, 2), Array: (342366, 3), '/d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/topographies/dampier_combined_elevation.txt')
| Keyword Args: {'point_attributes': None, 'use_cache': True, 'attribute_name': None, 'verbose': True, 'max_read_lines': 500, 'acceptable_overshoot': 1.01, 'mesh_origin': None, 'alpha': 0.1, 'data_origin': None}
| Reason: No cached result
| CPU time: 66649.26 seconds
| Loading time: 0.18 seconds (estimated)
|
| Caching dir: /d/cit/1/cit/unixhome/nbartzis/.python_cache/
| Result file: _fit_to_mesh[4973251167741820141]_Result (1371910 bytes)
| Args file: _fit_to_mesh[4973251167741820141]_Args (10960903 bytes)
| Admin file: _fit_to_mesh[4973251167741820141]_Admin (1620 bytes)
|
| No dependencies
+-------------------------------------------------------

Finished Set quantity
Subdivide mesh
Build submeshes
There are 323 ghost nodes and 630 ghost triangles on proc 0
There are 201 ghost nodes and 394 ghost triangles on proc 1
There are 179 ghost nodes and 358 ghost triangles on proc 2
There are 312 ghost nodes and 636 ghost triangles on proc 3
There are 141 ghost nodes and 271 ghost triangles on proc 4
There are 271 ghost nodes and 562 ghost triangles on proc 5
There are 269 ghost nodes and 546 ghost triangles on proc 6
There are 277 ghost nodes and 545 ghost triangles on proc 7
Distribute submeshes
process 0 sending submesh to process 1
process 0 sending submesh to process 2
process 0 sending submesh to process 3
process 0 sending submesh to process 4
process 0 sending submesh to process 5
process 0 sending submesh to process 6
process 0 sending submesh to process 7
domain id 183904840592
domain id 183904840592
Available boundary tags ghost?
domain id 183904840592
Find midpoint coordinates of entire boundary
Initialise file_function
Caching: looking for cached files /d/cit/1/cit/unixhome/nbartzis/.python_cache/_file_function[-4478816493114003663]_{Result,Args,Admin}.z
+-------------------------------------------------------------
| Tue Apr 17 20:58:43 2007. Evaluating function _file_function
+-------------------------------------------------------------
| Argument: '/d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww'
| Keyword Args: {'quantities': ['stage', 'xmomentum', 'ymomentum'], 'interpolation_points': Array: (518, 2), 'time_thinning': 12, 'domain_starttime': 0, 'verbose': True}
| Reason: No cached result
+------------------------------------------------------------

Reading /d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww
File_function data obtained from: /d/xrd/gem/5/nhi/inundation/data/western_australia/dampier_tsunami_scenario_2006/anuga/boundaries/o_5000_35000.sww

References:

Lower left corner: [419902.228863, 7683195.214508] Start time: 5000.000000

before interpolation function1


I will attach the script files to this ticket.

Attachments (4)

project.py (10.1 KB) - added by nick 17 years ago.
cyclone project file
run_dampier.py (7.8 KB) - added by nick 17 years ago.
cyclone run file
project.2.py (10.1 KB) - added by nick 17 years ago.
Actually cyclone project file (other one is actually the tornado file)
run_dampier.2.py (7.8 KB) - added by nick 17 years ago.
Actually cyclone run file (other one is actually the tornado file)


Change History (6)

Changed 17 years ago by nick

cyclone project file

Changed 17 years ago by nick

cyclone run file

Changed 17 years ago by nick

Actually cyclone project file (other one is actually the tornado file)

Changed 17 years ago by nick

Actually cyclone run file (other one is actually the tornado file)

comment:1 Changed 17 years ago by nick

  • Component changed from Appearance and visualisation to Functionality and features
  • Owner changed from ole to duncan
  • Priority changed from normal to high
  • Severity changed from normal to major

comment:2 Changed 17 years ago by nick

  • Resolution set to fixed
  • Status changed from new to closed

It seems that this problem was caused by mpirun having issues, for the following reasons:

1) When Duncan ran similar code, it worked on tornado.
2) Nick ran "lamwipe -v" (only because "lamhalt -vd" failed), then "lamboot -v .machine_tornado", and then "lamclean -v". After that, the "mpirun c40-47 python run_dampier" command worked and ANUGA successfully passed the field_boundary code.

Lessons Learnt:

Minimise killing parallel jobs. If you do kill one, run "lamclean". If that doesn't run, try "lamhalt", and if that doesn't work, run "lamwipe -v". Then start LAM again with "lamboot -v .machine" and do a clean with "lamclean -v" (this final clean might not be necessary).
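
The escalation described in the lessons above could be scripted roughly as follows. This is only a sketch under assumptions, not something taken from the ANUGA code or the attached scripts: the names reset_lam and run_command are hypothetical, a zero exit status is assumed to mean the LAM command succeeded, and the command flags are copied from this ticket rather than checked against the LAM documentation.

```python
import subprocess


def run_command(cmd):
    """Run a command; return True if it exited with status 0 (assumed success)."""
    try:
        return subprocess.call(cmd) == 0
    except OSError:
        # Command missing or not executable: treat it as a failure.
        return False


def reset_lam(machine_file, runner=run_command):
    """Try lamclean, then lamhalt, then lamwipe, then reboot LAM.

    machine_file is the boot schema passed to lamboot (e.g. ".machine").
    runner is injectable so the sequence can be checked without LAM installed.
    Returns the list of commands that were attempted, in order.
    """
    attempted = []

    def attempt(cmd):
        attempted.append(cmd)
        return runner(cmd)

    # Mildest option first: clean up leftover processes.
    if attempt(["lamclean", "-v"]):
        return attempted

    # lamclean did not run, so shut LAM down; wipe only if the halt fails.
    if not attempt(["lamhalt", "-vd"]):
        attempt(["lamwipe", "-v"])

    # Start LAM again and do a final clean (which might not be necessary).
    attempt(["lamboot", "-v", machine_file])
    attempt(["lamclean", "-v"])
    return attempted
```

For example, reset_lam(".machine_tornado") would be called on tornado before rerunning the mpirun command. Passing a fake runner makes it possible to see which commands would be issued without touching the cluster.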
