I worked with the Boost C++ Libraries to implement data_frame (similar to R's data.frame and the Python pandas DataFrame) in C++ using Boost.Variant.
data_frame is a matrix-like structure whose columns can hold data of different types. I first implemented the df_column class, which represents a single column of the data_frame, using a variant over a fixed set of data types. Then I implemented the data_frame structure on top of df_column. It took me almost 2 months to implement these classes. In the last month of the coding period, I wrote tests using the Boost.Test unit test framework, wrote documentation using Doxygen, and extended the functionality of the data_frame class.
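To make the column-over-variant idea concrete, here is a minimal sketch. It is not the project's actual code: it uses C++17 `std::variant` as a stand-in for Boost.Variant so that it compiles without Boost, and the member names (`holds`, `as`, `add_column`, `ncol`, `nrow`) and the choice of `int`, `double` and `std::string` as the fixed set of types are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Assumption: the fixed set of element types a column may hold.
// std::variant stands in for boost::variant in this sketch.
using column_storage = std::variant<std::vector<int>,
                                    std::vector<double>,
                                    std::vector<std::string>>;

// A single column: exactly one of the vector types above is active.
class df_column {
public:
    template <typename T>
    explicit df_column(std::vector<T> values) : data_(std::move(values)) {}

    // Number of elements, whatever the underlying element type is.
    std::size_t size() const {
        return std::visit([](const auto& v) { return v.size(); }, data_);
    }

    // True if the column currently stores elements of type T.
    template <typename T>
    bool holds() const {
        return std::holds_alternative<std::vector<T>>(data_);
    }

    // Typed access; throws std::bad_variant_access on a type mismatch.
    template <typename T>
    const std::vector<T>& as() const {
        return std::get<std::vector<T>>(data_);
    }

private:
    column_storage data_;
};

// A frame: named columns of possibly different types but equal length.
class data_frame {
public:
    void add_column(std::string name, df_column column) {
        // Precondition: every column has the same number of rows.
        assert(columns_.empty() || column.size() == nrow());
        names_.push_back(std::move(name));
        columns_.push_back(std::move(column));
    }

    std::size_t ncol() const { return columns_.size(); }
    std::size_t nrow() const {
        return columns_.empty() ? 0 : columns_.front().size();
    }

    // Bounds-checked access to a column and its name.
    const df_column& column(std::size_t i) const { return columns_.at(i); }
    const std::string& name(std::size_t i) const { return names_.at(i); }

private:
    std::vector<std::string> names_;
    std::vector<df_column> columns_;
};
```

Storing one variant per column, rather than one per cell, keeps each column homogeneous and contiguous, which matches how R and pandas treat columns.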
Links:
- Binary Operations (+/-/*)
- Access and Modification Operations
- Generating Statistical Summaries
- Column subsetting on the basis of slice, range or indirect_array of column indices
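The operations listed above can be sketched with visitors over a variant-based column. The following is an illustration under assumptions, not the library's API: `std::variant` again stands in for Boost.Variant, and the names `column`, `add`, `summary`, `summarize` and `select` are hypothetical. Only selection by an explicit list of column indices is shown, as a rough analogue of the indirect_array case.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Assumption: a column is a variant over a fixed set of vector types.
using column = std::variant<std::vector<int>,
                            std::vector<double>,
                            std::vector<std::string>>;

// Binary operation: element-wise addition of two numeric columns
// that have the same element type and the same length.
column add(const column& lhs, const column& rhs) {
    return std::visit([](const auto& a, const auto& b) -> column {
        using A = std::decay_t<decltype(a)>;
        using B = std::decay_t<decltype(b)>;
        if constexpr (std::is_same_v<A, B> &&
                      std::is_arithmetic_v<typename A::value_type>) {
            if (a.size() != b.size())
                throw std::invalid_argument("column length mismatch");
            A out(a.size());
            for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
            return out;
        } else {
            throw std::invalid_argument("columns must be numeric and of the same type");
        }
    }, lhs, rhs);
}

// Statistical summary of a numeric column.
struct summary {
    double min;
    double max;
    double mean;
};

summary summarize(const column& c) {
    return std::visit([](const auto& v) -> summary {
        using T = typename std::decay_t<decltype(v)>::value_type;
        if constexpr (std::is_arithmetic_v<T>) {
            if (v.empty()) throw std::invalid_argument("empty column");
            const auto [lo, hi] = std::minmax_element(v.begin(), v.end());
            const double sum = std::accumulate(v.begin(), v.end(), 0.0);
            return {static_cast<double>(*lo), static_cast<double>(*hi),
                    sum / static_cast<double>(v.size())};
        } else {
            throw std::invalid_argument("summary needs a numeric column");
        }
    }, c);
}

// Column subsetting: copy the columns at the given indices, in that order.
// std::vector::at throws std::out_of_range for an invalid index.
std::vector<column> select(const std::vector<column>& cols,
                           const std::vector<std::size_t>& indices) {
    std::vector<column> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) out.push_back(cols.at(i));
    return out;
}
```

Type mismatches surface as runtime exceptions here because the active type of a variant is only known at run time; the visitor is where that check naturally lives.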
My GSoC experience was amazing. I got to learn a lot of new things, thanks to my mentor and the Boost Community. It felt great to work on such an interesting project and to implement the functionality from scratch.
My mentor, David Bellot, gave valuable input whenever I was stuck during the project. My interactions with him were very productive and important for the success and completion of the project. I thank him for all his help.
The biggest challenge of the project was storing data of different types together in a data_frame, given the statically typed nature of C++. I experimented with Boost.Variant, variadic templates, and storing data as strings to solve this problem. In the end, I settled on the Boost.Variant version as the most suitable solution.
The GitHub link contains the directory with the project's source code, along with tests, documentation, and the procedure for using the library and running the tests.
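One way to see the trade-off between two of these approaches is the small comparison below. It is a sketch under assumptions rather than the code I experimented with: `std::variant` stands in for Boost.Variant, and `add_as_strings`, `cell`, `is_numeric` and `to_double` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// String storage: every value is text, so the original type is lost and
// each numeric use has to re-parse it (std::stod throws on bad input).
double add_as_strings(const std::string& a, const std::string& b) {
    return std::stod(a) + std::stod(b);
}

// Variant storage: the value carries its type with it.
using cell = std::variant<int, double, std::string>;

// The type can be queried without parsing anything.
bool is_numeric(const cell& c) {
    return std::holds_alternative<int>(c) || std::holds_alternative<double>(c);
}

// Numeric cells convert directly; string cells are rejected explicitly.
double to_double(const cell& c) {
    return std::visit([](const auto& v) -> double {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_arithmetic_v<T>) {
            return static_cast<double>(v);
        } else {
            throw std::invalid_argument("non-numeric cell");
        }
    }, c);
}
```

With strings, a value such as "42" cannot be told apart from a label that happens to look like a number, whereas the variant records which type was stored.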