Project Plan By Duncan Gray INTRODUCTION Software, called least squares, has been developed to implement a penalised least-squares fit and associated interpolations. It is currently in least_squares.py. Steven Roberts outlined the least squares algorithm used. The code is currently not meeting user requirements with regards to speed and memory use. Additional functionality is also needed. SCOPE This is looking at redesigning the least_squares module, excluding interpolate_function and pts2rectangle. (Other modules will be modified, based on these changes.) LIFE-CYCLE Use a staged delivery life-cycle (see rapid development), with the proviso that additional functionality to supporting objects, such as build_quadtree in quad.py can be worked on after requirements have been determined. Also, the current least squares may be tinkered with to check the feasibility of concepts and general maintanance. The main thrust will be refactoring though. ______________________________________________________________________ Least Squares Refactoring Software Requirements Specification By Duncan Gray INTRODUCTION This document specifies the requirements of the redesign of least squares. DEFINITIONS REQUIREMENT SECTION Introduction Assume that all the requirements of the current least squares code will stay the same, unless stated. Note implementation and names used are not regarded as requirements. Requirements - Fitting When in verbose mode display how many points are in the mesh and how many are outside its boundary. Also display the max # of points/ triangle and the minimum number of points. If this operation takes a lot of time, let it be optional. Raise an exception if no points are inside the mesh. Fitting 5 million points to a one million triangles mesh will not cause a memory error. Remove least_squares commandline. This means it will be called using python files, which should increase accountability. Points do not have to be supplied all at once. They can be supplied as blocks. (This will improve memory use) Requirements - Interpolation Return information on which points were outside the mesh (in such a way that it can not be interpreted as real data). If attribute information for points outside the mesh is returned, that can only be a float, let the user set the default value that is returned for points outside the mesh. if the value can be set to NAN, then set it to NAN, if no default is given. Reduce the time spent building matrix A. (not a testable requirement) Points do not have to be supplied all at once. They can be supplied as blocks. (This will improve memory use) A target to aim for Algorithmic complexity of build equations should be linear in number of data points and logorithmic in the number of vertices. Constraint Memory consumption of least squares equations must be indepenent on the number of datapoints. (Except for the memory used to store the points.) ___________________________________________________________________________ Least Squares System Design Specification By Duncan Gray - lots of good ideas by Ole INTRODUCTION This specifies design choices for the least squares code. DESIGN IDEAS, to meet requirements *To save memory; When interpolating, don't build AtA, B and D. (This has been implemented) When fitting, don't build A. - Work with AtA and Atz. Build Atz directly, rather than through At*z - use code similar to AtA. * To save memory This is for the layer interpolation and fitting with IO: Use blocking when interpolating/fitting a large number of points (how large? Can the program check the available memory and work out a good block size? Maybe have it user specified. Do have the option for user specified) The code has to be refactored so fitting can handle blocked point info. Need a building phase and a solve phase. *Returning information on points outside the mesh. Do not find that these points are outside the mesh by searching the binary tree. Do it by using polygon. (data_manager.sww2dem does not use the expanded search, since it needs points outside the mesh, and setting precrop to True means these points are not returned. Using a precrop of true and expand search true grinds the system to a halt.) ___________________ Here’s the snippet (from data_manager.py) that will identify points outside the mesh, in case you can use it for the interpolate methodology: P = interp.mesh.get_boundary_polygon() outside_indices = outside_polygon(grid_points, P, closed=True) for i in outside_indices: grid_values[i] = NODATA_value Cheers, O __________________ *To reduce the time spent building matrix A. Change the expansion algorithm, when determining what triangle a point is in, when none of the vertices are in the cell the point is in. Options (still thinking about this one) * It should look in the cell of the point and then all adjacent cells. If it isn't there then ... expand more. - Building the structure to keep this info could be prohibitive. *A feature that could work for the current implementation is to keep a list of triangles that have been checked, so each incorrect triangle isn't checked three times, if all three verts are in the cell being checked. Another speed up option is to write the bottlenecks in C. Do it if necessary. * To give warnings. Use warnings module? Check this out. DESIGN IDEAS, refactoring Remove georeferencing from build_interpolation_matrix_A. Change the point co-ords to conform to the mesh co-ords early in the code. Currently there is one class, called Interpolation, that holds the matrix data and methods for fitting and interpolation. Break this into two classes, one for fittting, the other for interpolation. Have processes common to both classes in functions. Have least squares as two stand alone packages. One that has the guts of LS (e.g. the two new classes). One that has the IO API's (functions that load and write files and do the blocking). The guts package will be dependent of the general mesh package. The IO API will rely on an IO module. HOW WILL IMPLEMENTATION OCCUR? Code a new least squares module in a seperate directory. Heavily cut and paste from the old least squares. Implement memory and time tests on the old least squares (and new LS) before doing any changes. For the speed up, discuss/ get go ahead from Ole before implementing any major changes, since there is no obvious way forward.