source: inundation/wiki/least_squares_redesign.txt @ 2225

Last change on this file since 2225 was 2183, checked in by duncan, 19 years ago

thoughts

File size: 7.1 KB
RevLine 
[1899]1Project Plan
2
3By Duncan Gray
4
5INTRODUCTION
6
7 Software, called least squares, has been developed to implement a
[1947]8penalised least-squares fit and associated interpolations. It is
9currently in least_squares.py.  Steven Roberts outlined the least
10squares algorithm used. The code is currently not meeting user
11requirements with regards to speed and memory use.  Additional
12functionality is also needed.
[1899]13
[1947]14
[1899]15SCOPE
16
[1947]17This is looking at redesigning the least_squares module, excluding
18interpolate_function and pts2rectangle.  (Other modules will
19be modified, based on these changes.)
20
21
[1899]22LIFE-CYCLE
23
24Use a staged delivery life-cycle (see rapid development), with the
25proviso that additional functionality to supporting objects, such as
26build_quadtree in quad.py can be worked on after requirements have
27been determined.  Also, the current least squares may be tinkered with
[1915]28to check the feasibility of concepts and general maintanance.  The
29main thrust will be refactoring though.
[1899]30
31
32______________________________________________________________________
33Least Squares Refactoring Software Requirements Specification
34
35By Duncan Gray
36
37
38INTRODUCTION
39
[1948]40This document specifies the requirements of the redesign of least squares.
[1899]41
42DEFINITIONS
43
[1915]44 
[1899]45REQUIREMENT SECTION
46
47Introduction
48
49Assume that all the requirements of the current least squares code will
50stay the same, unless stated.  Note implementation and names used are
51not regarded as requirements. 
52
53
[1947]54Requirements - Fitting
[1899]55
[1947]56When in verbose mode display how many points are in the
57mesh and how many are outside its boundary. Also display the max # of
58points/ triangle and the minimum number of points.  If this operation
59takes a lot of time, let it be optional.
[1915]60
[1947]61Raise an exception if no points are inside the mesh.
[1915]62
[1918]63Fitting 5 million points to a one million triangles mesh will not cause a
[1915]64memory error.
[1899]65
[2048]66Remove least_squares commandline.  This means it will be called using
67python files, which should increase accountability.
[1947]68
69Points do not have to be supplied all at once.  They can be supplied
70as blocks. (This will improve memory use)
71
72Requirements - Interpolation
73
74Return information on which points were outside the mesh (in such a
75way that it can not be interpreted as real data).  If attribute
76information for points outside the mesh is returned, that can only be
77a float, let the user set the default value that is returned for
[1948]78points outside the mesh. if the value can be set to NAN, then set it
[2048]79to NAN, if no default is given.
[1947]80
[2048]81
[1915]82Reduce the time spent building matrix A. (not a testable requirement)
[1899]83
[2048]84Points do not have to be supplied all at once.  They can be supplied
85as blocks. (This will improve memory use)
86
[1947]87A target to aim for
88Algorithmic complexity of build equations should be linear in number
89of data points and logorithmic in the number of vertices.
[1899]90
[1947]91Constraint
92Memory consumption of least squares equations must be indepenent on
93the number of datapoints. (Except for the memory used to store the points.)
[1899]94___________________________________________________________________________
95Least Squares System Design Specification
96
[1915]97By Duncan Gray - lots of good ideas by Ole
[1899]98
99
100INTRODUCTION
101
102This specifies design choices for the least squares code.
103
104DESIGN IDEAS, to meet requirements
105*To save memory;
106When interpolating, don't build AtA, B and D. (This has been implemented)
[1947]107When fitting, don't build A. - Work with AtA and Atz. Build Atz
[1948]108directly, rather than through At*z - use code similar to AtA.
[1899]109
110
111* To save memory
112
[1915]113This is for the layer interpolation and fitting with IO: Use blocking
114when interpolating/fitting a large number of points (how large?  Can the
[1947]115program check the available memory and work out a good block size?
116Maybe have it user specified.  Do have the option for user specified)
[1915]117
[1947]118The code has to be refactored so fitting can handle blocked point
[1915]119info.  Need a building phase and a solve phase. 
120
[2048]121*Returning information on points outside the mesh.
122Do not find that these points are outside the mesh by searching the
123binary tree.  Do it by using polygon.  (data_manager.sww2dem does not
124use the expanded search, since it needs points outside the mesh, and
125setting precrop to True means these points are not returned.  Using a
126precrop of true and expand search true grinds the system to a halt.)
[2067]127___________________
128Here’s the snippet (from data_manager.py) that will identify points outside the mesh, in case you can use it for the interpolate methodology:
129 
130    P = interp.mesh.get_boundary_polygon()
131    outside_indices = outside_polygon(grid_points, P, closed=True)
132    for i in outside_indices:
133        grid_values[i] = NODATA_value       
134Cheers, O
135__________________
[1915]136
[1899]137*To reduce the time spent building matrix A.
[1918]138
[1899]139Change the expansion algorithm, when determining what triangle a point
[1918]140 is in, when none of the vertices are in the cell the point is in.
141
142Options (still thinking about this one)
143*  It should look in the cell of the point and then all adjacent
[1947]144cells. If it isn't there then ... expand more.
145- Building the structure to keep this info could be prohibitive.
[1899]146
[1947]147*A feature that could work for the current implementation is to keep a
148 list of triangles that have been checked, so each incorrect triangle
149 isn't checked three times, if all three verts are in the cell being checked.
[1918]150
[1915]151Another speed up option is to write the bottlenecks in C. Do it if necessary.
[1899]152
[1947]153* To give warnings.
154Use warnings module?  Check this out.
[1915]155
[1899]156DESIGN IDEAS, refactoring
157
158Remove georeferencing from build_interpolation_matrix_A. Change the
159point co-ords to conform to the mesh co-ords early in the code.
160
[1915]161Currently there is one class, called Interpolation, that holds the
162matrix data and methods for fitting and interpolation.  Break this
163into two classes, one for fittting, the other for interpolation. Have
164processes common to both classes in functions.     
[1899]165
[1915]166Have least squares as two stand alone packages.  One that has the guts
[1947]167of LS (e.g. the two new classes).  One that has the IO API's
168(functions that load and write files and do the blocking).  The guts
169package will be dependent of the general mesh package.  The IO API
170will rely on an IO module.
[1915]171
172HOW WILL IMPLEMENTATION OCCUR?
173Code a new least squares module in a seperate directory.  Heavily cut
174and paste from the old least squares. Implement memory and time tests
[1947]175on the old least squares (and new LS) before doing any changes. 
176
177For the speed up, discuss/ get go ahead from Ole before implementing
178any major changes, since there is no obvious way forward.
[2183]179
180EXTRA IDEAS
181- Reading in multiple sww files.  If data is broken into multiple sww
182files, eg 0 - 30 sec 30 - 60 sec, how do we take this into account?
183In the old version it's in Interpolate_sww, when
184x, y, volumes, time, quantity = self.read_sww(...) occurs
185
186The time vector is incresed and the quantity matrix is increased.
187
188If the sww files are being split up to keep them below 2 GB in size
189the quantity matrix could be huge.
190
191How about a 'near enough' method? where you give a list of points, it
192determines which triangles the points are in and returns the time
193series vert info for those points.       
Note: See TracBrowser for help on using the repository browser.