Opened 18 years ago
Last modified 14 years ago
#272 new enhancement
Write blocking csv2pts script
| Reported by: | ole | Owned by: | nariman |
|---|---|---|---|
| Priority: | low | Milestone: | |
| Component: | Efficiency and optimisation | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
Ted Rigby has shown that ANUGA performs poorly when converting 18 million points in a csv file to the pts netcdf format. Currently, we do this by creating a Geospatial data object from the csv file and then exporting it to the pts format. The problem is that all data is stored in memory, which causes thrashing for large datasets. This is holding Ted back from running a validation against the Macquarie Rivulet catchment.
One solution would be to make use of blocking so that the csv file isn't read until the export code runs - and then only in blocks.
Another, and perhaps simpler, approach would be to write a conversion script, csv2pts, which does the conversion without ever having to store all data in memory.
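A minimal sketch of what such a csv2pts script could look like, reading the csv in fixed-size blocks so peak memory is bounded by the block size rather than the file size. This is illustrative only: the function names are hypothetical, and since ANUGA's real pts writer uses NetCDF (not available in the Python stdlib), a struct-packed binary file stands in for the pts output here.

```python
import csv
import struct
from itertools import islice

def read_blocks(csv_path, block_size=10000):
    """Yield lists of (x, y, value) tuples, at most block_size rows each.

    Only one block is ever held in memory, regardless of file size.
    Assumes a header row followed by x,y,value columns.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        while True:
            block = list(islice(reader, block_size))
            if not block:
                break
            yield [(float(x), float(y), float(v)) for x, y, v in block]

def csv2pts(csv_path, pts_path, block_size=10000):
    """Convert csv_path to pts_path one block at a time.

    Stand-in output format: each point packed as three little-endian
    doubles. A real implementation would append to a NetCDF pts file
    via an unlimited dimension instead.
    """
    n = 0
    with open(pts_path, "wb") as out:
        for block in read_blocks(csv_path, block_size):
            for x, y, v in block:
                out.write(struct.pack("<3d", x, y, v))
            n += len(block)
    return n  # number of points written
```

The key property is that `read_blocks` is a generator: the full dataset never exists as one in-memory object, so an 18-million-point file costs no more memory than the chosen block size.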
Change History (7)
comment:1 Changed 18 years ago by
comment:2 Changed 18 years ago by
| Owner: | changed from duncan to ole |
|---|---|
comment:3 Changed 18 years ago by
| Status: | new → assigned |
|---|---|
comment:4 Changed 18 years ago by
Consider changing export_points_file/_write_pts_file in geospatial so if a csv/txt file is loaded using blocking calling export_points_file writes it using blocking.
comment:5 Changed 18 years ago by
| Owner: | changed from ole to duncan |
|---|---|
| Priority: | high → normal |
| Status: | assigned → new |
comment:6 Changed 16 years ago by
| Owner: | changed from duncan to nariman |
|---|---|
comment:7 Changed 14 years ago by
| Priority: | normal → low |
|---|---|

Btw - Ted asked why a hardwired blocking value of 1e30 is passed into the reader within one of the methods in Geospatial data.