1 | Project Plan |
---|
2 | |
---|
3 | By Duncan Gray |
---|
4 | |
---|
5 | INTRODUCTION |
---|
6 | |
---|
7 | Software, called least squares, has been developed to implement a |
---|
8 | penalised least-squares fit and associated interpolations. It is |
---|
9 | currently in least_squares.py. Steven Roberts outlined the least |
---|
10 | squares algorithm used. The code is currently not meeting user |
---|
11 | requirements with regards to speed and memory use. Additional |
---|
12 | functionality is also needed. |
---|
13 | |
---|
14 | |
---|
15 | SCOPE |
---|
16 | |
---|
17 | This is looking at redesigning the least_squares module, excluding |
---|
18 | interpolate_function and pts2rectangle. (Other modules will |
---|
19 | be modified, based on these changes.) |
---|
20 | |
---|
21 | |
---|
22 | LIFE-CYCLE |
---|
23 | |
---|
24 | Use a staged delivery life-cycle (see rapid development), with the |
---|
25 | proviso that additional functionality to supporting objects, such as |
---|
26 | build_quadtree in quad.py can be worked on after requirements have |
---|
27 | been determined. Also, the current least squares may be tinkered with |
---|
28 | to check the feasibility of concepts and general maintanance. The |
---|
29 | main thrust will be refactoring though. |
---|
30 | |
---|
31 | |
---|
32 | ______________________________________________________________________ |
---|
33 | Least Squares Refactoring Software Requirements Specification |
---|
34 | |
---|
35 | By Duncan Gray |
---|
36 | |
---|
37 | |
---|
38 | INTRODUCTION |
---|
39 | |
---|
40 | This document specifies the requirements of the redesign of least squares. |
---|
41 | |
---|
42 | DEFINITIONS |
---|
43 | |
---|
44 | |
---|
45 | REQUIREMENT SECTION |
---|
46 | |
---|
47 | Introduction |
---|
48 | |
---|
49 | Assume that all the requirements of the current least squares code will |
---|
50 | stay the same, unless stated. Note implementation and names used are |
---|
51 | not regarded as requirements. |
---|
52 | |
---|
53 | |
---|
54 | Requirements - Fitting |
---|
55 | |
---|
56 | When in verbose mode display how many points are in the |
---|
57 | mesh and how many are outside its boundary. Also display the max # of |
---|
58 | points/ triangle and the minimum number of points. If this operation |
---|
59 | takes a lot of time, let it be optional. |
---|
60 | |
---|
61 | Raise an exception if no points are inside the mesh. |
---|
62 | |
---|
63 | Fitting 5 million points to a one million triangles mesh will not cause a |
---|
64 | memory error. |
---|
65 | |
---|
66 | Remove least_squares commandline. This means it will be called using |
---|
67 | python files, which should increase accountability. |
---|
68 | |
---|
69 | Points do not have to be supplied all at once. They can be supplied |
---|
70 | as blocks. (This will improve memory use) |
---|
71 | |
---|
72 | Requirements - Interpolation |
---|
73 | |
---|
74 | Return information on which points were outside the mesh (in such a |
---|
75 | way that it can not be interpreted as real data). If attribute |
---|
76 | information for points outside the mesh is returned, that can only be |
---|
77 | a float, let the user set the default value that is returned for |
---|
78 | points outside the mesh. if the value can be set to NAN, then set it |
---|
79 | to NAN, if no default is given. |
---|
80 | |
---|
81 | |
---|
82 | Reduce the time spent building matrix A. (not a testable requirement) |
---|
83 | |
---|
84 | Points do not have to be supplied all at once. They can be supplied |
---|
85 | as blocks. (This will improve memory use) |
---|
86 | |
---|
87 | A target to aim for |
---|
88 | Algorithmic complexity of build equations should be linear in number |
---|
89 | of data points and logorithmic in the number of vertices. |
---|
90 | |
---|
91 | Constraint |
---|
92 | Memory consumption of least squares equations must be indepenent on |
---|
93 | the number of datapoints. (Except for the memory used to store the points.) |
---|
94 | ___________________________________________________________________________ |
---|
95 | Least Squares System Design Specification |
---|
96 | |
---|
97 | By Duncan Gray - lots of good ideas by Ole |
---|
98 | |
---|
99 | |
---|
100 | INTRODUCTION |
---|
101 | |
---|
102 | This specifies design choices for the least squares code. |
---|
103 | |
---|
104 | DESIGN IDEAS, to meet requirements |
---|
105 | *To save memory; |
---|
106 | When interpolating, don't build AtA, B and D. (This has been implemented) |
---|
107 | When fitting, don't build A. - Work with AtA and Atz. Build Atz |
---|
108 | directly, rather than through At*z - use code similar to AtA. |
---|
109 | |
---|
110 | |
---|
111 | * To save memory |
---|
112 | |
---|
113 | This is for the layer interpolation and fitting with IO: Use blocking |
---|
114 | when interpolating/fitting a large number of points (how large? Can the |
---|
115 | program check the available memory and work out a good block size? |
---|
116 | Maybe have it user specified. Do have the option for user specified) |
---|
117 | |
---|
118 | The code has to be refactored so fitting can handle blocked point |
---|
119 | info. Need a building phase and a solve phase. |
---|
120 | |
---|
121 | *Returning information on points outside the mesh. |
---|
122 | Do not find that these points are outside the mesh by searching the |
---|
123 | binary tree. Do it by using polygon. (data_manager.sww2dem does not |
---|
124 | use the expanded search, since it needs points outside the mesh, and |
---|
125 | setting precrop to True means these points are not returned. Using a |
---|
126 | precrop of true and expand search true grinds the system to a halt.) |
---|
127 | ___________________ |
---|
128 | Heres the snippet (from data_manager.py) that will identify points outside the mesh, in case you can use it for the interpolate methodology: |
---|
129 | |
---|
130 | P = interp.mesh.get_boundary_polygon() |
---|
131 | outside_indices = outside_polygon(grid_points, P, closed=True) |
---|
132 | for i in outside_indices: |
---|
133 | grid_values[i] = NODATA_value |
---|
134 | Cheers, O |
---|
135 | __________________ |
---|
136 | |
---|
137 | *To reduce the time spent building matrix A. |
---|
138 | |
---|
139 | Change the expansion algorithm, when determining what triangle a point |
---|
140 | is in, when none of the vertices are in the cell the point is in. |
---|
141 | |
---|
142 | Options (still thinking about this one) |
---|
143 | * It should look in the cell of the point and then all adjacent |
---|
144 | cells. If it isn't there then ... expand more. |
---|
145 | - Building the structure to keep this info could be prohibitive. |
---|
146 | |
---|
147 | *A feature that could work for the current implementation is to keep a |
---|
148 | list of triangles that have been checked, so each incorrect triangle |
---|
149 | isn't checked three times, if all three verts are in the cell being checked. |
---|
150 | |
---|
151 | Another speed up option is to write the bottlenecks in C. Do it if necessary. |
---|
152 | |
---|
153 | * To give warnings. |
---|
154 | Use warnings module? Check this out. |
---|
155 | |
---|
156 | DESIGN IDEAS, refactoring |
---|
157 | |
---|
158 | Remove georeferencing from build_interpolation_matrix_A. Change the |
---|
159 | point co-ords to conform to the mesh co-ords early in the code. |
---|
160 | |
---|
161 | Currently there is one class, called Interpolation, that holds the |
---|
162 | matrix data and methods for fitting and interpolation. Break this |
---|
163 | into two classes, one for fittting, the other for interpolation. Have |
---|
164 | processes common to both classes in functions. |
---|
165 | |
---|
166 | Have least squares as two stand alone packages. One that has the guts |
---|
167 | of LS (e.g. the two new classes). One that has the IO API's |
---|
168 | (functions that load and write files and do the blocking). The guts |
---|
169 | package will be dependent of the general mesh package. The IO API |
---|
170 | will rely on an IO module. |
---|
171 | |
---|
172 | HOW WILL IMPLEMENTATION OCCUR? |
---|
173 | Code a new least squares module in a seperate directory. Heavily cut |
---|
174 | and paste from the old least squares. Implement memory and time tests |
---|
175 | on the old least squares (and new LS) before doing any changes. |
---|
176 | |
---|
177 | For the speed up, discuss/ get go ahead from Ole before implementing |
---|
178 | any major changes, since there is no obvious way forward. |
---|