[1899] | 1 | Project Plan |
---|
| 2 | |
---|
| 3 | By Duncan Gray |
---|
| 4 | |
---|
| 5 | INTRODUCTION |
---|
| 6 | |
---|
| 7 | Software, called least squares, has been developed to implement a |
---|
[1947] | 8 | penalised least-squares fit and associated interpolations. It is |
---|
| 9 | currently in least_squares.py. Steven Roberts outlined the least |
---|
| 10 | squares algorithm used. The code is currently not meeting user |
---|
| 11 | requirements with regards to speed and memory use. Additional |
---|
| 12 | functionality is also needed. |
---|
[1899] | 13 | |
---|
[1947] | 14 | |
---|
[1899] | 15 | SCOPE |
---|
| 16 | |
---|
[1947] | 17 | This is looking at redesigning the least_squares module, excluding |
---|
| 18 | interpolate_function and pts2rectangle. (Other modules will |
---|
| 19 | be modified, based on these changes.) |
---|
| 20 | |
---|
| 21 | |
---|
[1899] | 22 | LIFE-CYCLE |
---|
| 23 | |
---|
| 24 | Use a staged delivery life-cycle (see rapid development), with the |
---|
| 25 | proviso that additional functionality to supporting objects, such as |
---|
| 26 | build_quadtree in quad.py can be worked on after requirements have |
---|
| 27 | been determined. Also, the current least squares may be tinkered with |
---|
[1915] | 28 | to check the feasibility of concepts and general maintanance. The |
---|
| 29 | main thrust will be refactoring though. |
---|
[1899] | 30 | |
---|
| 31 | |
---|
| 32 | ______________________________________________________________________ |
---|
| 33 | Least Squares Refactoring Software Requirements Specification |
---|
| 34 | |
---|
| 35 | By Duncan Gray |
---|
| 36 | |
---|
| 37 | |
---|
| 38 | INTRODUCTION |
---|
| 39 | |
---|
[1948] | 40 | This document specifies the requirements of the redesign of least squares. |
---|
[1899] | 41 | |
---|
| 42 | DEFINITIONS |
---|
| 43 | |
---|
[1915] | 44 | |
---|
[1899] | 45 | REQUIREMENT SECTION |
---|
| 46 | |
---|
| 47 | Introduction |
---|
| 48 | |
---|
| 49 | Assume that all the requirements of the current least squares code will |
---|
| 50 | stay the same, unless stated. Note implementation and names used are |
---|
| 51 | not regarded as requirements. |
---|
| 52 | |
---|
| 53 | |
---|
[1947] | 54 | Requirements - Fitting |
---|
[1899] | 55 | |
---|
[1947] | 56 | When in verbose mode display how many points are in the |
---|
| 57 | mesh and how many are outside its boundary. Also display the max # of |
---|
| 58 | points/ triangle and the minimum number of points. If this operation |
---|
| 59 | takes a lot of time, let it be optional. |
---|
[1915] | 60 | |
---|
[1947] | 61 | Raise an exception if no points are inside the mesh. |
---|
[1915] | 62 | |
---|
[1918] | 63 | Fitting 5 million points to a one million triangles mesh will not cause a |
---|
[1915] | 64 | memory error. |
---|
[1899] | 65 | |
---|
[2048] | 66 | Remove least_squares commandline. This means it will be called using |
---|
| 67 | python files, which should increase accountability. |
---|
[1947] | 68 | |
---|
| 69 | Points do not have to be supplied all at once. They can be supplied |
---|
| 70 | as blocks. (This will improve memory use) |
---|
| 71 | |
---|
| 72 | Requirements - Interpolation |
---|
| 73 | |
---|
| 74 | Return information on which points were outside the mesh (in such a |
---|
| 75 | way that it can not be interpreted as real data). If attribute |
---|
| 76 | information for points outside the mesh is returned, that can only be |
---|
| 77 | a float, let the user set the default value that is returned for |
---|
[1948] | 78 | points outside the mesh. if the value can be set to NAN, then set it |
---|
[2048] | 79 | to NAN, if no default is given. |
---|
[1947] | 80 | |
---|
[2048] | 81 | |
---|
[1915] | 82 | Reduce the time spent building matrix A. (not a testable requirement) |
---|
[1899] | 83 | |
---|
[2048] | 84 | Points do not have to be supplied all at once. They can be supplied |
---|
| 85 | as blocks. (This will improve memory use) |
---|
| 86 | |
---|
[1947] | 87 | A target to aim for |
---|
| 88 | Algorithmic complexity of build equations should be linear in number |
---|
| 89 | of data points and logorithmic in the number of vertices. |
---|
[1899] | 90 | |
---|
[1947] | 91 | Constraint |
---|
| 92 | Memory consumption of least squares equations must be indepenent on |
---|
| 93 | the number of datapoints. (Except for the memory used to store the points.) |
---|
[1899] | 94 | ___________________________________________________________________________ |
---|
| 95 | Least Squares System Design Specification |
---|
| 96 | |
---|
[1915] | 97 | By Duncan Gray - lots of good ideas by Ole |
---|
[1899] | 98 | |
---|
| 99 | |
---|
| 100 | INTRODUCTION |
---|
| 101 | |
---|
| 102 | This specifies design choices for the least squares code. |
---|
| 103 | |
---|
| 104 | DESIGN IDEAS, to meet requirements |
---|
| 105 | *To save memory; |
---|
| 106 | When interpolating, don't build AtA, B and D. (This has been implemented) |
---|
[1947] | 107 | When fitting, don't build A. - Work with AtA and Atz. Build Atz |
---|
[1948] | 108 | directly, rather than through At*z - use code similar to AtA. |
---|
[1899] | 109 | |
---|
| 110 | |
---|
| 111 | * To save memory |
---|
| 112 | |
---|
[1915] | 113 | This is for the layer interpolation and fitting with IO: Use blocking |
---|
| 114 | when interpolating/fitting a large number of points (how large? Can the |
---|
[1947] | 115 | program check the available memory and work out a good block size? |
---|
| 116 | Maybe have it user specified. Do have the option for user specified) |
---|
[1915] | 117 | |
---|
[1947] | 118 | The code has to be refactored so fitting can handle blocked point |
---|
[1915] | 119 | info. Need a building phase and a solve phase. |
---|
| 120 | |
---|
[2048] | 121 | *Returning information on points outside the mesh. |
---|
| 122 | Do not find that these points are outside the mesh by searching the |
---|
| 123 | binary tree. Do it by using polygon. (data_manager.sww2dem does not |
---|
| 124 | use the expanded search, since it needs points outside the mesh, and |
---|
| 125 | setting precrop to True means these points are not returned. Using a |
---|
| 126 | precrop of true and expand search true grinds the system to a halt.) |
---|
[2067] | 127 | ___________________ |
---|
| 128 | Heres the snippet (from data_manager.py) that will identify points outside the mesh, in case you can use it for the interpolate methodology: |
---|
| 129 | |
---|
| 130 | P = interp.mesh.get_boundary_polygon() |
---|
| 131 | outside_indices = outside_polygon(grid_points, P, closed=True) |
---|
| 132 | for i in outside_indices: |
---|
| 133 | grid_values[i] = NODATA_value |
---|
| 134 | Cheers, O |
---|
| 135 | __________________ |
---|
[1915] | 136 | |
---|
[1899] | 137 | *To reduce the time spent building matrix A. |
---|
[1918] | 138 | |
---|
[1899] | 139 | Change the expansion algorithm, when determining what triangle a point |
---|
[1918] | 140 | is in, when none of the vertices are in the cell the point is in. |
---|
| 141 | |
---|
| 142 | Options (still thinking about this one) |
---|
| 143 | * It should look in the cell of the point and then all adjacent |
---|
[1947] | 144 | cells. If it isn't there then ... expand more. |
---|
| 145 | - Building the structure to keep this info could be prohibitive. |
---|
[1899] | 146 | |
---|
[1947] | 147 | *A feature that could work for the current implementation is to keep a |
---|
| 148 | list of triangles that have been checked, so each incorrect triangle |
---|
| 149 | isn't checked three times, if all three verts are in the cell being checked. |
---|
[1918] | 150 | |
---|
[1915] | 151 | Another speed up option is to write the bottlenecks in C. Do it if necessary. |
---|
[1899] | 152 | |
---|
[1947] | 153 | * To give warnings. |
---|
| 154 | Use warnings module? Check this out. |
---|
[1915] | 155 | |
---|
[1899] | 156 | DESIGN IDEAS, refactoring |
---|
| 157 | |
---|
| 158 | Remove georeferencing from build_interpolation_matrix_A. Change the |
---|
| 159 | point co-ords to conform to the mesh co-ords early in the code. |
---|
| 160 | |
---|
[1915] | 161 | Currently there is one class, called Interpolation, that holds the |
---|
| 162 | matrix data and methods for fitting and interpolation. Break this |
---|
| 163 | into two classes, one for fittting, the other for interpolation. Have |
---|
| 164 | processes common to both classes in functions. |
---|
[1899] | 165 | |
---|
[1915] | 166 | Have least squares as two stand alone packages. One that has the guts |
---|
[1947] | 167 | of LS (e.g. the two new classes). One that has the IO API's |
---|
| 168 | (functions that load and write files and do the blocking). The guts |
---|
| 169 | package will be dependent of the general mesh package. The IO API |
---|
| 170 | will rely on an IO module. |
---|
[1915] | 171 | |
---|
| 172 | HOW WILL IMPLEMENTATION OCCUR? |
---|
| 173 | Code a new least squares module in a seperate directory. Heavily cut |
---|
| 174 | and paste from the old least squares. Implement memory and time tests |
---|
[1947] | 175 | on the old least squares (and new LS) before doing any changes. |
---|
| 176 | |
---|
| 177 | For the speed up, discuss/ get go ahead from Ole before implementing |
---|
| 178 | any major changes, since there is no obvious way forward. |
---|