source: inundation/wiki/least_squares_redesign.txt @ 2022

Last change on this file since 2022 was 1948, checked in by duncan, 19 years ago

feedback from Ole

File size: 5.7 KB
Line 
1Project Plan
2
3By Duncan Gray
4
5INTRODUCTION
6
7 Software, called least squares, has been developed to implement a
8penalised least-squares fit and associated interpolations. It is
9currently in least_squares.py.  Steven Roberts outlined the least
10squares algorithm used. The code is currently not meeting user
11requirements with regards to speed and memory use.  Additional
12functionality is also needed.
13
14
15SCOPE
16
17This is looking at redesigning the least_squares module, excluding
18interpolate_function and pts2rectangle.  (Other modules will
19be modified, based on these changes.)
20
21
22LIFE-CYCLE
23
24Use a staged delivery life-cycle (see rapid development), with the
25proviso that additional functionality to supporting objects, such as
26build_quadtree in quad.py can be worked on after requirements have
27been determined.  Also, the current least squares may be tinkered with
28to check the feasibility of concepts and general maintanance.  The
29main thrust will be refactoring though.
30
31
32______________________________________________________________________
33Least Squares Refactoring Software Requirements Specification
34
35By Duncan Gray
36
37
38INTRODUCTION
39
40This document specifies the requirements of the redesign of least squares.
41
42DEFINITIONS
43
44 
45REQUIREMENT SECTION
46
47Introduction
48
49Assume that all the requirements of the current least squares code will
50stay the same, unless stated.  Note implementation and names used are
51not regarded as requirements. 
52
53
54Requirements - Fitting
55
56When in verbose mode display how many points are in the
57mesh and how many are outside its boundary. Also display the max # of
58points/ triangle and the minimum number of points.  If this operation
59takes a lot of time, let it be optional.
60
61Raise an exception if no points are inside the mesh.
62
63Fitting 5 million points to a one million triangles mesh will not cause a
64memory error.
65
66Remove least_squares commandline interface.
67
68Points do not have to be supplied all at once.  They can be supplied
69as blocks. (This will improve memory use)
70
71Requirements - Interpolation
72
73Return information on which points were outside the mesh (in such a
74way that it can not be interpreted as real data).  If attribute
75information for points outside the mesh is returned, that can only be
76a float, let the user set the default value that is returned for
77points outside the mesh. if the value can be set to NAN, then set it
78to NAN, if no default is given.
79
80Reduce the time spent building matrix A. (not a testable requirement)
81
82Points do not have to be supplied all at once.  They can be supplied
83as blocks. (This will improve memory use)
84
85A target to aim for
86Algorithmic complexity of build equations should be linear in number
87of data points and logorithmic in the number of vertices.
88
89Constraint
90Memory consumption of least squares equations must be indepenent on
91the number of datapoints. (Except for the memory used to store the points.)
92___________________________________________________________________________
93Least Squares System Design Specification
94
95By Duncan Gray - lots of good ideas by Ole
96
97
98INTRODUCTION
99
100This specifies design choices for the least squares code.
101
102DESIGN IDEAS, to meet requirements
103*To save memory;
104When interpolating, don't build AtA, B and D. (This has been implemented)
105When fitting, don't build A. - Work with AtA and Atz. Build Atz
106directly, rather than through At*z - use code similar to AtA.
107
108
109* To save memory
110
111This is for the layer interpolation and fitting with IO: Use blocking
112when interpolating/fitting a large number of points (how large?  Can the
113program check the available memory and work out a good block size?
114Maybe have it user specified.  Do have the option for user specified)
115
116The code has to be refactored so fitting can handle blocked point
117info.  Need a building phase and a solve phase. 
118
119
120*To reduce the time spent building matrix A.
121
122Change the expansion algorithm, when determining what triangle a point
123 is in, when none of the vertices are in the cell the point is in.
124
125Options (still thinking about this one)
126*  It should look in the cell of the point and then all adjacent
127cells. If it isn't there then ... expand more.
128- Building the structure to keep this info could be prohibitive.
129
130*A feature that could work for the current implementation is to keep a
131 list of triangles that have been checked, so each incorrect triangle
132 isn't checked three times, if all three verts are in the cell being checked.
133
134Another speed up option is to write the bottlenecks in C. Do it if necessary.
135
136* To give warnings.
137Use warnings module?  Check this out.
138
139DESIGN IDEAS, refactoring
140
141Remove georeferencing from build_interpolation_matrix_A. Change the
142point co-ords to conform to the mesh co-ords early in the code.
143
144Currently there is one class, called Interpolation, that holds the
145matrix data and methods for fitting and interpolation.  Break this
146into two classes, one for fittting, the other for interpolation. Have
147processes common to both classes in functions.     
148
149Have least squares as two stand alone packages.  One that has the guts
150of LS (e.g. the two new classes).  One that has the IO API's
151(functions that load and write files and do the blocking).  The guts
152package will be dependent of the general mesh package.  The IO API
153will rely on an IO module.
154
155HOW WILL IMPLEMENTATION OCCUR?
156Code a new least squares module in a seperate directory.  Heavily cut
157and paste from the old least squares. Implement memory and time tests
158on the old least squares (and new LS) before doing any changes. 
159
160For the speed up, discuss/ get go ahead from Ole before implementing
161any major changes, since there is no obvious way forward.
Note: See TracBrowser for help on using the repository browser.