@RArora28
Last active November 3, 2017 17:58

I worked with the Boost C++ Libraries to implement data_frame (similar to R's data.frame and the Python pandas library) in C++ using Boost.Variant.

data_frame:

data_frame is a matrix-like structure whose columns can hold data of different types. I first implemented the df_column class, which represents a single column of the data_frame, as a variant over a fixed set of data types. I then built the data_frame structure on top of df_column. Implementing these classes took almost two months. In the last month of the coding period, I wrote tests with the Boost.Test unit test framework, wrote documentation with Doxygen, and extended the functionality of the data_frame class.
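To make the column-of-variants idea concrete, here is a minimal sketch of how such a layout could look. It is not the project's code: it uses C++17 `std::variant` as a stand-in for Boost.Variant so that it compiles without Boost, and the names `column_data`, `column_sketch` and `frame_sketch` are hypothetical. The real df_column and data_frame classes may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Assumption: a column stores one vector whose element type is picked at
// run time from a fixed set of alternatives (int, double, string here).
using column_data = std::variant<std::vector<int>,
                                 std::vector<double>,
                                 std::vector<std::string>>;

// Hypothetical stand-in for df_column.
struct column_sketch {
    std::string name;
    column_data data;

    // Number of rows, whichever alternative is currently active.
    std::size_t size() const {
        return std::visit([](const auto& v) { return v.size(); }, data);
    }
};

// Hypothetical stand-in for data_frame: a list of equally long columns.
class frame_sketch {
public:
    void add_column(std::string name, column_data data) {
        column_sketch col{std::move(name), std::move(data)};
        // Every column must have the same number of rows as the first one.
        assert(columns_.empty() || col.size() == columns_.front().size());
        columns_.push_back(std::move(col));
    }

    std::size_t ncol() const { return columns_.size(); }
    std::size_t nrow() const {
        return columns_.empty() ? 0 : columns_.front().size();
    }
    const column_sketch& column(std::size_t i) const { return columns_.at(i); }

private:
    std::vector<column_sketch> columns_;
};
```

Each column is homogeneous internally, so only the choice of element type is deferred to run time. That keeps per-element access cheap while still letting one frame mix integer, floating-point and string columns.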

Links:

Features Supported:

  • Binary Operations (+/-/*)
  • Access and Modification Operations
  • Generating Statistical Summaries
  • Column subsetting on the basis of slice, range or indirect_array of column indices
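As an illustration of the first and third features, the sketch below shows one way element-wise addition and a numeric summary could be written against variant-backed columns. It is an assumption-laden example rather than the library's API: `std::variant` again stands in for Boost.Variant, and `column_data`, `add_columns`, `summary` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Assumption: same fixed set of column element types as a variant.
using column_data = std::variant<std::vector<int>,
                                 std::vector<double>,
                                 std::vector<std::string>>;

// Element-wise addition of two columns. In this sketch only two int columns
// or two double columns may be added; anything else throws.
column_data add_columns(const column_data& lhs, const column_data& rhs) {
    return std::visit(
        [](const auto& a, const auto& b) -> column_data {
            using A = std::decay_t<decltype(a)>;
            using B = std::decay_t<decltype(b)>;
            if constexpr (std::is_same_v<A, B> &&
                          !std::is_same_v<A, std::vector<std::string>>) {
                if (a.size() != b.size())
                    throw std::invalid_argument("column lengths differ");
                A out(a.size());
                std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                               [](auto x, auto y) { return x + y; });
                return out;
            } else {
                throw std::invalid_argument(
                    "columns are not both int or both double");
            }
        },
        lhs, rhs);
}

// Hypothetical summary record for a numeric column.
struct summary {
    double min;
    double max;
    double mean;
};

// Minimum, maximum and mean of a non-empty numeric column.
summary summarize(const column_data& col) {
    return std::visit(
        [](const auto& v) -> summary {
            using V = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<V, std::vector<std::string>>) {
                throw std::invalid_argument("no numeric summary for strings");
            } else {
                if (v.empty()) throw std::invalid_argument("empty column");
                auto [lo, hi] = std::minmax_element(v.begin(), v.end());
                double sum = std::accumulate(v.begin(), v.end(), 0.0);
                return {static_cast<double>(*lo), static_cast<double>(*hi),
                        sum / static_cast<double>(v.size())};
            }
        },
        col);
}
```

The visitor resolves the element type once per column, not once per element, so the inner loops run on plain typed vectors. Type mismatches surface as exceptions at run time, which is the price of choosing column types dynamically.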

My Experience:

My GSoC experience was amazing. I learned a lot of new things, thanks to my mentor and the Boost community. It felt great to work on such an interesting project and to implement the functionality from scratch.

My Mentor:

My mentor, David Bellot, gave valuable input whenever I was stuck during the project. My interactions with him were very productive and were important to the success and completion of the project. I thank him for all his help.

Challenges Faced:

The biggest challenge of the project was storing data of different types together in a data_frame, given that C++ is statically typed. To solve this problem I experimented with Boost.Variant, with variadic templates, and with storing all data as strings. In the end I settled on the Boost.Variant version as the most suitable solution.

Directory Structure

The GitHub link contains the source code directory of the project, along with the tests, the documentation, and the procedure for using the library and running the tests.
