Skip to content

Instantly share code, notes, and snippets.

@erenon
Created March 14, 2015 17:05
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save erenon/70e36f6a7d7725cf8767 to your computer and use it in GitHub Desktop.
Save erenon/70e36f6a7d7725cf8767 to your computer and use it in GitHub Desktop.
Boost Enhanced Vector and Deque GSoC 2015 Proposal

GSoC 2015 Enhanced vector and deque containers Proposal

Personal Details

  • Name: Benedek Thaler
  • University: Budapest University of Technology and Economics
  • Course/Major: Computer Engineer, Applied Informatics
  • Degree Program: M.Sc.
  • Email: thalerbenedek -#- gmail.com
  • Homepage: erenon.hu
  • Availability: I'd like to spend 20 hours/week on GSoC until the end of June, 40 hours after then. I want to start as soon as possible and work through the summer. Until end of June work, after then one or two weeks of vacation affects my availability.

Background Information

Please summarize your educational background (degrees earned, courses taken, etc.). Please summarize your programming background (OSS projects, internships, jobs, etc.).

Regarding my educational and professional background, please refer to my CV online. In short, I'm a Computer Engineer M.Sc student and library developer at Morgan Stanley.

Please tell us a little about your programming interests.

In the last few years, I developed C++ libraries. Currently, the focus of my interest is the area of low latency applications. Please see my other open projects on my GitHub page

Please tell us why you are interested in contributing to the Boost C++ Libraries.

I'm using Boost regularly in my projects and it's a constant source of inspiration. I like my projects open, because I believe this is the way of how software gets done. As I can't make my professional works open because of corporate regulations, I'd like to grab the opportunity of GSoC '15 to give back and contribute to the efforts of Boost.

What is your interest in the project you are proposing?

In the world of high frequency trading, being faster means a lot. To increase throughput and reduce latency, a lot of techniques were developed lately, e.g: zero copy, various kernel bypass technologies, RDMA. However, while providing very fast reading because of the underlying contiguous memory, std::vector makes random insertion and erasing a challenge. I think the proposed data structures help this, especially the RDMA usecase.

Have you done any previous work in this area before or on similar projects?

I created several container-like objects before, however, probably nothing so refined as a STL container. Regarding the library aspect, I made several well behaving, documented, stand alone components.

What are your plans beyond this Summer of Code time frame for your proposed work?.

Given time, I'd like to integrate user feedback, if any. Also, when concepts (or rather, concept lite) become available, necessary type constraints should be added.

Please rate, from 0 to 5 (0 being no experience, 5 being expert), your knowledge of the following languages, technologies, or tools:

The following are rated as a skills of a student:

  • C++ 98/03: 2, It's been a time I worked with the '98 standard.
  • C++ 11/14: 4, I'm not afraid of using TMP, constexpr, noexcept or the other C++11 features but I have little knowledge of compiler internals (like LTO).
  • C++ Standard Library: 4, I embraced the ways of Effective STL, but the Standard Library is simply too large for one get to the expert level in a few years.
  • Boost C++ Libraries: 3, I worked with the following libraries: Algorithm, Asio, Atomic, Concept Check, Context, Coroutine, Filesystem, Foreach, Graph, Lexical Cast, Lockfree, Log, MPL, Preprocessor, Property Tree, Serialization, Smart Ptr, Static Assert, Test, Thread, Type Erasure, Variant. The others I have little or no experience with.
  • Git: 3, I use it daily and I am satisfied. However, performing subtle tricks require reading the documentation.

What software development environments are you most familiar with (Visual Studio, Eclipse, KDevelop, etc.)?

I use Eclipse CDT, version Kepler on Linux. My compilers of choice are GCC and Clang. I have a little experience with Visual Studio.

What software documentation tool are you most familiar with (Doxygen, DocBook, Quickbook, etc.)?

The GSoC'14 project Boost.Pipeline documentation makes use of Doxygen and Quickbook. I have the required understanding of both tools.

Project Proposal

Quouted from the Boost SoC page:

For certain applications the standard containers std::vector and std::deque are not too easy to work or may induce unnecessary overhead. For example, std::vector lacks an O(1) time push_front and std::deque allows you no way of controlling the size of the underlying segments. This limits their usability in certain contexts, especially if you want to have optimal performance.

I propose to write a full-fledged and enhanced C++11 versions of std::vector and std::deque with extra functionality that gives the programmer more control over buffers, segments and iteration. My focus is on the devector because of its popularity.

In case of devector, the main idea would be to have the allocated chunk filled from middle, while maintaining contiguous memory. This allows a cheap push_front, pop_front, push_back and pop_back operations. On reallocation, the ratio of [begin, store_middle) : [store_middle, end) could be taken into account when moving the previous items.

The deque (or segmented_devector) should be similar to std::queue, but with a dynamically specified segment size. To support batch operations (e.g: when using RDMA), a similar method should be provided:

iterator segment_end(iterator begin)

It makes possible to process a whole segment at once. It's not clear yet, how random insertion should be handled: Either by allocating a new segment, or moving the shorter of the [begin, pos) and [pos, end) ranges. The former is probably faster to execute but results in increased fragmentation, the latter is maybe slower but maintains a more compact memory. It's also possible to mix the two.

Proposed Milestones and Schedule

I propose the following development cycle: figure out the interface, create an initial implementation and tests, refine until tests pass and performance criteria is met. As learned in the last year, user guide is best to be done at the end of the development cycle, to avoid unnecessary work and missing of important details caused by too much familiarity with the code base.

I propose the following ratios:

  • 1/6 planning and experimenting with possible interface solutions
  • 1/3 implementing devector, creating tests (unit and performance)
  • 1/3 implementing deque, creating tests (unit and performance)
  • 1/6 writing documentation

Targets of milestones (dates indicate start of week):

  • 25 May: Study various vector implementations (libstdc++, libc++, Boost.Container) Find out how boost::vector trades the strong exception guarantees to performance. Decide on the exception guarantees.
  • 1 June: Create initial devector class template with basic tests
  • 8 June: Measure the performance and memory consumption of the implementation. Compare it to the other vectors. To measure: push_back, push_front, insert, pop_front, pop_back. Make sure the destructors of the removed elements are called and exception guarantees are met.
  • 15 June: Refine the implementation, change exception guarantees if necessary.
  • 22 June: Request feedback, implement convenience functions and other nice to have features. Spare time.
  • 6 July: Study various deque implementations, sketch the possible allocation strategies.
  • 13 July: Create initial deque class template with basic tests
  • 20 July: Measure the performance of each allocation strategies. If the internals turn out to be similar to boost::container::deque, consider reusing code if possible.
  • 27 July: Request feedback, implement convenience functions and other nice to have features.
  • 3 August: Request feedback, refine the implementation as necessary.
  • 10 August: Create documentation
  • 17 August: Finish documentation

It's desirable to ship a complete library of less features than creating an incomplete product. Because of the popularity of vector, implemenetation and documentation of devector has priority over deque.

Programming Competency

According to the rules described in the Boost SoC page, instead of the prescribed competency test, I'd like to prove my abilities by the following two projects:

  • Boost.Pipeline : GSoC'14 project
  • C++omppi : A slightly older, less relevant C++ application, which feeds interconnected plaintext files into a database.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment