Skip to content

Instantly share code, notes, and snippets.

@GnsP
Last active August 29, 2015 14:17
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save GnsP/72dce0ac96394f54797c to your computer and use it in GitHub Desktop.
Save GnsP/72dce0ac96394f54797c to your computer and use it in GitHub Desktop.
GSoC'15 Proposal for Boost.uBLAS Algorithms

##GSoC'15 Proposal for Boost.uBLAS Algorithms

###Personal Details

  • Name : Ganesh Prasad Sahoo
  • University : National Institute of Technology, Rourkela, India
  • Course/Major : Computer Science and Engineering
  • Degree Program : B.Tech.
  • Email : sir.gnsp@gmail.com

###Availability I'd like to spend 40 hours/week on GSoC from first of May to last of July and 20 hours/week during August. I would like to start as soon as possible and work through the summer. Intended start and end dates are discussed in detail in the timeline section. I may be unavailable during 10th - 20th of June due to the possibility that I may have to go home and spend more time with the family.

###Background Information Please summarize your educational background (degrees earned, courses taken, etc.).
Please summarize your programming background (OSS projects, internships, jobs, etc.).

I am currently a student of Computer Science and Engineering at National Institute of Technology, Rourkela, India. Here is a list of relevant courses that I have taken.

  • Mathematics - I (ODEs and Laplace) during Autumn 2012
  • Mathematics - II (Linear Algebra and Vector Calculus) during Spring 2013
  • Data Structures and Algorithms during Spring 2013
  • Mathematics - III (Numerical Analysis) during Autumn 2013
  • Numerical Methods Lab during Autumn 2013
  • Elementary Number Theory during Autumn 2013
  • Discrete Mathematics during Autumn 2013
  • Principle of Programming Languages during Spring 2014
  • Algorithm Analysis and Design during Spring 2015
  • Software Analysis and Design during Spring 2015

For my interest in Library development in C++ I am indebted to three specific books :

  1. Large Scale C++ Software Design by John Lakos
  2. Effective STL by Scott Meyers
  3. Modern C++ Design by Andrei Alexandrescu

For programming background, please visit my github profile.

Please tell us a little about your programming interests.

My field of interest is Language Design and Compiler Construction. I have interest in library development and template metaprogramming too. C++ is the principal language that I use for my projects. I also have a few years of good experience with Python.

Please tell us why you are interested in contributing to the Boost C++ Libraries.

I use Boost C++ Libraries, I am acquainted with its source code and the library has been a constant source of inspiration for me. Also, as I am interested in library development in C++, Boost C++ Libraries is one of the best open source projects to contribute to.

What is your interest in the project you are proposing?

I have studied Linear Algebra and C++. I am confident that I can do something useful with my studies of Linear Algebra and C++ by contributing Decomposition and Solver algorithms to Boost.uBLAS . It's also a great chance to learn new things and techniques about C++.

Have you done any previous work in this area before or on similar projects?

Yes, I have implemented decomposition and solver algorithms in C++ as academic assignments for the Numerical Methods Lab during Autumn 2013. However none of those implementations are in any way close to the Boost standards, but I think they do contribute to my ability of understanding and implementing the algorithms. The most relevant and similar work would be the Gaussian Elimination Algorithms I've implemented for the competency test for this project. I have tried to make my implementation of the algorithm as close to Boost.uBLAS standards as it can be.

Also, I have gone through the source code of libraries like eigen and Viennacl.

What are your plans beyond this Summer of Code time frame for your proposed work?.

There are a lot of algorithms that are yet to be implemented on uBLAS and we can implement and test probably only 3 sets of algorithms during the 3 months of GSoC'15 . After GSoC time frame I would concentrate more on the maintenance of the code and integration of user feedback. If I get more time free from my academic obligations I may try and implement a few more algorithms for uBLAS.

Please rate, from 0 to 5 (0 being no experience, 5 being expert), your knowledge of the following languages, technologies, or tools:

  • C++ 98/03: 3, I am more acquainted with the 03 standard. I have practically no experience with older compilers and the 98 standard.

  • C++ 11/14: 4, I use it and I understand the recent features and compiler internals.

  • C++ Standard Library: 4, I am quite experienced with the STL. But it's so vast that I can not be too confident about my expertise.

  • Boost C++ Libraries: 2.5, I believe that there are very few people who have mastered everything in Boost. I have learned/used only the following libraries :

    >Accumulators, Algorithm, Any, Array, Bimap, Concepts, Context, Foreach, Fusion, Graph,  
    >MPL, Phoenix, Property Tree, Range, Spirit, Static Assert, uBLAS  
    
  • Git: 3, I use it and I have been using it for quite a long time, yet doing some advanced stuff requires reading the documentation.

What software development environments are you most familiar with (Visual Studio, Eclipse, KDevelop, etc.)?

I am more of an old school guy who finds the commandline more productive. I normally test my codes on g++-4.6.4 and g++-4.9.2. I am acquainted with gdb and gprof. My editor of choice is Vim. I have tried Eclipse and Code::Blocks before, but I have very less experience with Visual Studio.

What software documentation tool are you most familiar with (Doxygen, DocBook, Quickbook, etc.)?

I use Doxygen. I have read the documentation of Quickbook, but I have no practical experience with it.

###Project Proposal

####What I am Proposing to do

Boost.uBLAS is one of the most used linear algebra libraries written in C++ which provides performance comparable to libraries written in C and Fortran while preserving the type flexibility of C++. Though uBLAS provides pretty much of everything that is needed to write programs that use linear algebra, it lacks the decomposition algorithms except for LU factorization.

I propose to implement, test and integrate in Boost.uBLAS the following algorithms :

  1. Householder Reflection
  2. QR factorization
  3. Eigen factorization and diagonalization

The implementations will include overloaded functions for every foreseeable and possible cases of computation and solver functions. My main goal would be performance optimization and numerical stability of the implementations. The reason for proposing these three algorithms is that they are interrelated and implementing them as per the sequence described in the timeline section would make the implementation faster and the whole development process effectively efficient.

Also, as I have implemented the gaussian elimination and its solvers for the competency test and the implementation is in a rather good shape, I intend to perfect it for integration with Boost.uBLAS during the community bonding period of GSoC'15 and try to get it accepted into the library. Though Gaussian Elimination is rather trivial and comparatively slow, it's good to have it in uBLAS.

####How would I do it

The Boost idea page clearly states that

For the solvers project:

  • study the main solving and decomposition algorithms,
  • study how to integrate an algorithm in uBLAS
  • implement the algorithm
  • write extensive test for not only testing the correctness of the algorithms but also the speed and the numerical stability. The last point is presumably one of the most important

I intend to concentrate fully on the three listed algorithms for my GSoC project and test my implementations as extensively as I can. As for the competency test I have tested my implementation against scipy.linalg and I intend to do the same for the other algorithms too. However, I am new to the Boost.test library and want to gain some valuable practical experience on it.

The above mentioned algorithms are standard algorithms for any linear algebra library, therefore I will briefly outline my plans for GSoC'15 rather than going into details of implementation and testing strategies.

The previous section makes it clear that I intend to concentrate on the QR algorithms during this summer. There are two dominant ways of QR factorization of matrices, the first one involving Householder Reflection and the second one involving Gram-Schmidth algorithm. The Householder Reflection is one of the most used matrix algorithms. Hence, I intend to implement the Householder Reflection algorithm first and keep it as an individual module. Then I intend to implement the Gram-Schmidt algorithm in a similar manner. Now, after having implemented and tested the two above mentioned algorithms, I intend to implement the QR factorization using both. The implementation of the factorization will provide the user with choices to use Householder or Gram-Schmidt. If possible within the time frame, I will try to resolve the faster of the two algorithms for factorization during compile time for fixed sized matrices and special matrices (I admit that this part of the proposal is not a clear formed idea right now, but it will be clearly visualizable during the implementation and testing, therefore I keep it as an additional optimization goal and I'll try to achieve this goal only if the time line of the project permits).

The next goal is to implement the Eigen factorization. There are multiple algorithms to find Eigen values and Eigen vectors of a given matrix and each of them has their pros and cons. But as before arriving at this phase of development I would have implemented Householder Reflection and the QR factoriztaion, I plan to implement the Eigen solver using the QR algorithm.

#####Testing

Testing of the algorithms will be done in an hierarchical manner. At the base level unit testing goes parallel with the implementation. Then the individual functions and modules are to be tested for proper integration with existing code, confirmation with the standards and cases of singularities. I intend to test the stability of the algorithms by testing them against existing implementations. As I have mentioned earlier, I would like to test my implementations against scipy.linalg for stability. As for the performance comparison with other libraries and all other smaller details of the project, those will be decided after discussing with the mentor, other contributers and testers.

###Timeline and Milestones

I propose the following development cycle:

  • study the algorithm and current works in the field in details
  • figure out optimization possibilities and experiment with them
  • create an initial implementation for Boost.uBLAS and do unit tests
  • refine until tests pass and performance criteria are met
  • write the documentations and move to the next algorithm

The following is the proposed timeline of the project

  • Present to 25 May :
    • Study more about recent developments regarding the proposed algorithms and work more on the Gaussian Elimination.
    • Figure out detailed implementation strategies, optimization opportunities, write implementations in smaller scale and experiment with proposed algorithms
    • Keep everything ready to start coding

#####Householder Reflection

  • 25 May : Finalize the design decisions for Householder Reflection algorithm and start coding
  • 2 June : Deadline for completion of initial implementation of Householder Reflection algorithms.
  • 3 June to 10 June : Testing, modifications, performance analysis, numerical stability analysis of the implementation.

#####QR Factorization

  • 11 June : Finalize the design decisions for Gram-Schmidt algorithm and start implementing
  • 18 June : Deadline for completion of initial implementation of Gram-Schmidt algorithm
  • 19 June - 26 June : Testing, modifications, performance analysis, numerical stability analysis of the implementation.
  • 3 July : Deadline for completion of initial implementation of QR factorization algorithms (both using Householder and Gram-Schmidt)
  • 4 July - 11 July : Testing, modifications, performance analysis, numerical stability analysis of the implementation.
  • 11 July : Submit a review request for work done so far
  • 12 July - 19 July : Modifications and enhancements based on the reviews

#####Eigen Factorization and Diagonalization

  • 20 July - 22 July : Finalize the design decisions for the Eigen Factorization Algorithms
  • 1 August : Deadline for initial implementation of Eigen Factorization
  • 2 August - 10 August : Testing, modifications, performance analysis, numerical stability analysis of the implementation.
  • 11 August : Submit a review request for work done so far
  • 12 August - 20 August : Modifications and enhancements based on the reviews, writing documentation
  • 21 August - 24 August : Check and modify documentations if needed

###Competency Test

I would like to prove my competency by submission of the Gaussian Elimination test described on the Boost idea page as well as with projects done previously.

Gaussian Elimination : I have tried to make the code as close to Boost.uBLAS standards as I could. It also includes solvers and overloads of the algorithm for different cases. It has been tested against scipy.linalg and the test results clearly demonstrate the performance and numerical stability of different versions of the algorithm. (like, the Gaussian Elimination without any pivoting is highly unstable that passes only some trivial cases, where as the partial pivoting is relatively slow but passes all 600 test matrices generated)

ER-Markup Translator to Graphviz dot : This one is to demonstrate my enthusiasm about programming and my experience with C++. The ER Markup language is a weekend invention of mine. The translator to Graphviz dot syntax is about 800 lines of C++ code. It can be said to be a small handwritten recursive descent parser with a code generator.

Project I : This project demonstrates my ability to handle more than 1000 lines of code written in C++ (amounting about 4000 lines of code without comments). Project I is an irregularly run experiment by me about how a highly expressive programming language can be developed. It's currently in an experimental stage where I implement various aspects of the language I imagine and try to test and assert their feasibility. (Most of the code may seem rather unstructured due to lack of a readme and comments, but they perfectly make sense and work)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment