source: inundation/wiki/least_squares_redesign.txt @ 1947

Last change on this file since 1947 was 1947, checked in by duncan, 18 years ago

more changes

File size: 6.3 KB
Line 
1Project Plan
2
3By Duncan Gray
4
5INTRODUCTION
6
7 Software, called least squares, has been developed to implement a
8penalised least-squares fit and associated interpolations. It is
9currently in least_squares.py.  Steven Roberts outlined the least
10squares algorithm used. The code is currently not meeting user
11requirements with regards to speed and memory use.  Additional
12functionality is also needed.
13
14
15SCOPE
16
17This is looking at redesigning the least_squares module, excluding
18interpolate_function and pts2rectangle.  (Other modules will
19be modified, based on these changes.)
20
21
22LIFE-CYCLE
23
24Use a staged delivery life-cycle (see rapid development), with the
25proviso that additional functionality to supporting objects, such as
26build_quadtree in quad.py can be worked on after requirements have
27been determined.  Also, the current least squares may be tinkered with
28to check the feasibility of concepts and general maintanance.  The
29main thrust will be refactoring though.
30
31
32______________________________________________________________________
33Least Squares Refactoring Software Requirements Specification
34
35By Duncan Gray
36
37
38INTRODUCTION
39
40This document specifies the requirements of the refactoring of least squares.
41
42DEFINITIONS
43
44 
45REQUIREMENT SECTION
46
47Introduction
48
49Assume that all the requirements of the current least squares code will
50stay the same, unless stated.  Note implementation and names used are
51not regarded as requirements. 
52
53
54Requirements - Fitting
55
56When in verbose mode display how many points are in the
57mesh and how many are outside its boundary. Also display the max # of
58points/ triangle and the minimum number of points.  If this operation
59takes a lot of time, let it be optional.
60
61Raise an exception if no points are inside the mesh.
62
63Fitting 5 million points to a one million triangles mesh will not cause a
64memory error.
65
66Remove least_squares commandline interface.
67
68Points do not have to be supplied all at once.  They can be supplied
69as blocks. (This will improve memory use)
70
71Requirements - Interpolation
72
73Return information on which points were outside the mesh (in such a
74way that it can not be interpreted as real data).  If attribute
75information for points outside the mesh is returned, that can only be
76a float, let the user set the default value that is returned for
77points outside the mesh.
78
79Reduce the time spent building matrix A. (not a testable requirement)
80
81
82Points do not have to be supplied all at once.  They can be supplied
83as blocks. (This will improve memory use)
84
85A target to aim for
86Algorithmic complexity of build equations should be linear in number
87of data points and logorithmic in the number of vertices.
88
89Constraint
90Memory consumption of least squares equations must be indepenent on
91the number of datapoints. (Except for the memory used to store the points.)
92
93How should interpolate_sww respond to a ? 
94Warning and replace with given value.
95
96General Error Handling Guidelines
97
98Assume 4 ways of using the code (for Inperpolate class)
991) command line interface
1002) API function - these have parameters of input  data file names,
101output file names, and carry out a process.
1023) External methods - These are the calls that the api function uses
1034) Internal methods.
104
105Way 1 is used by general users, so they should not return
106something that looks like a code crash from bad input.
107An option for way 2 is to raise an exception, which the user
108handles. Should the code display a warning, based on the error?
109
110Note, if a method is used by a gui, or by scripts change the requirements 
111Discus with Error Handling Ole
112
113
114___________________________________________________________________________
115Least Squares System Design Specification
116
117By Duncan Gray - lots of good ideas by Ole
118
119
120INTRODUCTION
121
122This specifies design choices for the least squares code.
123
124DESIGN IDEAS, to meet requirements
125*To save memory;
126When interpolating, don't build AtA, B and D. (This has been implemented)
127When fitting, don't build A. - Work with AtA and Atz. Build Atz
128directly, rather than through At*z - use code similar to AtA
129
130
131
132
133* To save memory
134
135This is for the layer interpolation and fitting with IO: Use blocking
136when interpolating/fitting a large number of points (how large?  Can the
137program check the available memory and work out a good block size?
138Maybe have it user specified.  Do have the option for user specified)
139
140The code has to be refactored so fitting can handle blocked point
141info.  Need a building phase and a solve phase. 
142
143
144*To reduce the time spent building matrix A.
145
146Change the expansion algorithm, when determining what triangle a point
147 is in, when none of the vertices are in the cell the point is in.
148
149Options (still thinking about this one)
150*  It should look in the cell of the point and then all adjacent
151cells. If it isn't there then ... expand more.
152- Building the structure to keep this info could be prohibitive.
153
154*A feature that could work for the current implementation is to keep a
155 list of triangles that have been checked, so each incorrect triangle
156 isn't checked three times, if all three verts are in the cell being checked.
157
158Another speed up option is to write the bottlenecks in C. Do it if necessary.
159
160* To give warnings.
161Use warnings module?  Check this out.
162
163DESIGN IDEAS, refactoring
164
165Remove georeferencing from build_interpolation_matrix_A. Change the
166point co-ords to conform to the mesh co-ords early in the code.
167
168Currently there is one class, called Interpolation, that holds the
169matrix data and methods for fitting and interpolation.  Break this
170into two classes, one for fittting, the other for interpolation. Have
171processes common to both classes in functions.     
172
173Have least squares as two stand alone packages.  One that has the guts
174of LS (e.g. the two new classes).  One that has the IO API's
175(functions that load and write files and do the blocking).  The guts
176package will be dependent of the general mesh package.  The IO API
177will rely on an IO module.
178
179HOW WILL IMPLEMENTATION OCCUR?
180Code a new least squares module in a seperate directory.  Heavily cut
181and paste from the old least squares. Implement memory and time tests
182on the old least squares (and new LS) before doing any changes. 
183
184For the speed up, discuss/ get go ahead from Ole before implementing
185any major changes, since there is no obvious way forward.
Note: See TracBrowser for help on using the repository browser.