source: inundation/wiki/least_squares_redesign.txt @ 2067

Last change on this file since 2067 was 2067, checked in by duncan, 18 years ago

added code example

File size: 6.5 KB
Line 
1Project Plan
2
3By Duncan Gray
4
5INTRODUCTION
6
7 Software, called least squares, has been developed to implement a
8penalised least-squares fit and associated interpolations. It is
9currently in least_squares.py.  Steven Roberts outlined the least
10squares algorithm used. The code is currently not meeting user
11requirements with regards to speed and memory use.  Additional
12functionality is also needed.
13
14
15SCOPE
16
17This is looking at redesigning the least_squares module, excluding
18interpolate_function and pts2rectangle.  (Other modules will
19be modified, based on these changes.)
20
21
22LIFE-CYCLE
23
24Use a staged delivery life-cycle (see rapid development), with the
25proviso that additional functionality to supporting objects, such as
26build_quadtree in quad.py can be worked on after requirements have
27been determined.  Also, the current least squares may be tinkered with
28to check the feasibility of concepts and general maintanance.  The
29main thrust will be refactoring though.
30
31
32______________________________________________________________________
33Least Squares Refactoring Software Requirements Specification
34
35By Duncan Gray
36
37
38INTRODUCTION
39
40This document specifies the requirements of the redesign of least squares.
41
42DEFINITIONS
43
44 
45REQUIREMENT SECTION
46
47Introduction
48
49Assume that all the requirements of the current least squares code will
50stay the same, unless stated.  Note implementation and names used are
51not regarded as requirements. 
52
53
54Requirements - Fitting
55
56When in verbose mode display how many points are in the
57mesh and how many are outside its boundary. Also display the max # of
58points/ triangle and the minimum number of points.  If this operation
59takes a lot of time, let it be optional.
60
61Raise an exception if no points are inside the mesh.
62
63Fitting 5 million points to a one million triangles mesh will not cause a
64memory error.
65
66Remove least_squares commandline.  This means it will be called using
67python files, which should increase accountability.
68
69Points do not have to be supplied all at once.  They can be supplied
70as blocks. (This will improve memory use)
71
72Requirements - Interpolation
73
74Return information on which points were outside the mesh (in such a
75way that it can not be interpreted as real data).  If attribute
76information for points outside the mesh is returned, that can only be
77a float, let the user set the default value that is returned for
78points outside the mesh. if the value can be set to NAN, then set it
79to NAN, if no default is given.
80
81
82Reduce the time spent building matrix A. (not a testable requirement)
83
84Points do not have to be supplied all at once.  They can be supplied
85as blocks. (This will improve memory use)
86
87A target to aim for
88Algorithmic complexity of build equations should be linear in number
89of data points and logorithmic in the number of vertices.
90
91Constraint
92Memory consumption of least squares equations must be indepenent on
93the number of datapoints. (Except for the memory used to store the points.)
94___________________________________________________________________________
95Least Squares System Design Specification
96
97By Duncan Gray - lots of good ideas by Ole
98
99
100INTRODUCTION
101
102This specifies design choices for the least squares code.
103
104DESIGN IDEAS, to meet requirements
105*To save memory;
106When interpolating, don't build AtA, B and D. (This has been implemented)
107When fitting, don't build A. - Work with AtA and Atz. Build Atz
108directly, rather than through At*z - use code similar to AtA.
109
110
111* To save memory
112
113This is for the layer interpolation and fitting with IO: Use blocking
114when interpolating/fitting a large number of points (how large?  Can the
115program check the available memory and work out a good block size?
116Maybe have it user specified.  Do have the option for user specified)
117
118The code has to be refactored so fitting can handle blocked point
119info.  Need a building phase and a solve phase. 
120
121*Returning information on points outside the mesh.
122Do not find that these points are outside the mesh by searching the
123binary tree.  Do it by using polygon.  (data_manager.sww2dem does not
124use the expanded search, since it needs points outside the mesh, and
125setting precrop to True means these points are not returned.  Using a
126precrop of true and expand search true grinds the system to a halt.)
127___________________
128Here’s the snippet (from data_manager.py) that will identify points outside the mesh, in case you can use it for the interpolate methodology:
129 
130    P = interp.mesh.get_boundary_polygon()
131    outside_indices = outside_polygon(grid_points, P, closed=True)
132    for i in outside_indices:
133        grid_values[i] = NODATA_value       
134Cheers, O
135__________________
136
137*To reduce the time spent building matrix A.
138
139Change the expansion algorithm, when determining what triangle a point
140 is in, when none of the vertices are in the cell the point is in.
141
142Options (still thinking about this one)
143*  It should look in the cell of the point and then all adjacent
144cells. If it isn't there then ... expand more.
145- Building the structure to keep this info could be prohibitive.
146
147*A feature that could work for the current implementation is to keep a
148 list of triangles that have been checked, so each incorrect triangle
149 isn't checked three times, if all three verts are in the cell being checked.
150
151Another speed up option is to write the bottlenecks in C. Do it if necessary.
152
153* To give warnings.
154Use warnings module?  Check this out.
155
156DESIGN IDEAS, refactoring
157
158Remove georeferencing from build_interpolation_matrix_A. Change the
159point co-ords to conform to the mesh co-ords early in the code.
160
161Currently there is one class, called Interpolation, that holds the
162matrix data and methods for fitting and interpolation.  Break this
163into two classes, one for fittting, the other for interpolation. Have
164processes common to both classes in functions.     
165
166Have least squares as two stand alone packages.  One that has the guts
167of LS (e.g. the two new classes).  One that has the IO API's
168(functions that load and write files and do the blocking).  The guts
169package will be dependent of the general mesh package.  The IO API
170will rely on an IO module.
171
172HOW WILL IMPLEMENTATION OCCUR?
173Code a new least squares module in a seperate directory.  Heavily cut
174and paste from the old least squares. Implement memory and time tests
175on the old least squares (and new LS) before doing any changes. 
176
177For the speed up, discuss/ get go ahead from Ole before implementing
178any major changes, since there is no obvious way forward.
Note: See TracBrowser for help on using the repository browser.