- Name: Benedek Thaler
- University: Budapest University of Technology and Economics
- Course/Major: Computer Engineer, Applied Informatics
- Degree Program: M.Sc.
- Email: thalerbenedek -#- gmail.com
- Homepage: erenon.hu
- Availability: I'd like to spend 20 hours/week on GSoC until the end of June, 40 hours after then. I want to start as soon as possible and work through the summer. Until end of June work, after then one or two weeks of vacation affects my availability.
Please summarize your educational background (degrees earned, courses taken, etc.). Please summarize your programming background (OSS projects, internships, jobs, etc.).
Regarding my educational and professional background, please refer to my CV online. In short, I'm a Computer Engineer M.Sc student and library developer at Morgan Stanley.
Please tell us a little about your programming interests.
In the last few years, I developed C++ libraries. Currently, the focus of my interest is the area of low latency applications. Please see my other open projects on my GitHub page
Please tell us why you are interested in contributing to the Boost C++ Libraries.
I'm using Boost regularly in my projects and it's a constant source of inspiration. I like my projects open, because I believe this is the way of how software gets done. As I can't make my professional works open because of corporate regulations, I'd like to grab the opportunity of GSoC '15 to give back and contribute to the efforts of Boost.
What is your interest in the project you are proposing?
In the world of high frequency trading, being faster means a lot. To increase throughput
and reduce latency, a lot of techniques were developed lately, e.g: zero copy, various
kernel bypass technologies, RDMA. However, while providing very fast reading because
of the underlying contiguous memory, std::vector
makes random insertion and erasing a challenge.
I think the proposed data structures help this, especially the RDMA usecase.
Have you done any previous work in this area before or on similar projects?
I created several container-like objects before, however, probably nothing so refined as a STL container. Regarding the library aspect, I made several well behaving, documented, stand alone components.
What are your plans beyond this Summer of Code time frame for your proposed work?.
Given time, I'd like to integrate user feedback, if any. Also, when concepts (or rather, concept lite) become available, necessary type constraints should be added.
Please rate, from 0 to 5 (0 being no experience, 5 being expert), your knowledge of the following languages, technologies, or tools:
The following are rated as a skills of a student:
- C++ 98/03: 2, It's been a time I worked with the '98 standard.
- C++ 11/14: 4, I'm not afraid of using
TMP
,constexpr
,noexcept
or the otherC++11
features but I have little knowledge of compiler internals (like LTO). - C++ Standard Library: 4, I embraced the ways of Effective STL, but the Standard Library is simply too large for one get to the expert level in a few years.
- Boost C++ Libraries: 3, I worked with the following libraries: Algorithm, Asio, Atomic, Concept Check, Context, Coroutine, Filesystem, Foreach, Graph, Lexical Cast, Lockfree, Log, MPL, Preprocessor, Property Tree, Serialization, Smart Ptr, Static Assert, Test, Thread, Type Erasure, Variant. The others I have little or no experience with.
- Git: 3, I use it daily and I am satisfied. However, performing subtle tricks require reading the documentation.
What software development environments are you most familiar with (Visual Studio, Eclipse, KDevelop, etc.)?
I use Eclipse CDT, version Kepler on Linux. My compilers of choice are GCC and Clang. I have a little experience with Visual Studio.
What software documentation tool are you most familiar with (Doxygen, DocBook, Quickbook, etc.)?
The GSoC'14 project Boost.Pipeline documentation makes use of Doxygen and Quickbook. I have the required understanding of both tools.
Quouted from the Boost SoC page:
For certain applications the standard containers std::vector and std::deque are not too easy to work or may induce unnecessary overhead. For example, std::vector lacks an O(1) time push_front and std::deque allows you no way of controlling the size of the underlying segments. This limits their usability in certain contexts, especially if you want to have optimal performance.
I propose to write a full-fledged and enhanced C++11 versions of std::vector
and std::deque
with extra functionality that gives the programmer more control over buffers, segments and iteration.
My focus is on the devector
because of its popularity.
In case of devector
, the main idea would be to have the allocated chunk filled from middle,
while maintaining contiguous memory. This allows a cheap push_front, pop_front, push_back and pop_back
operations. On reallocation, the ratio of [begin, store_middle) : [store_middle, end)
could be taken into
account when moving the previous items.
The deque
(or segmented_devector
) should be similar to std::queue
, but with a dynamically specified
segment size. To support batch operations (e.g: when using RDMA), a similar method should be provided:
iterator segment_end(iterator begin)
It makes possible to process a whole segment at once. It's not clear yet, how random insertion should be
handled: Either by allocating a new segment, or moving the shorter of the [begin, pos)
and [pos, end)
ranges.
The former is probably faster to execute but results in increased fragmentation, the latter is maybe slower but
maintains a more compact memory. It's also possible to mix the two.
I propose the following development cycle: figure out the interface, create an initial implementation and tests, refine until tests pass and performance criteria is met. As learned in the last year, user guide is best to be done at the end of the development cycle, to avoid unnecessary work and missing of important details caused by too much familiarity with the code base.
I propose the following ratios:
- 1/6 planning and experimenting with possible interface solutions
- 1/3 implementing
devector
, creating tests (unit and performance) - 1/3 implementing
deque
, creating tests (unit and performance) - 1/6 writing documentation
Targets of milestones (dates indicate start of week):
- 25 May: Study various
vector
implementations (libstdc++, libc++, Boost.Container) Find out howboost::vector
trades the strong exception guarantees to performance. Decide on the exception guarantees. - 1 June: Create initial
devector
class template with basic tests - 8 June: Measure the performance and memory consumption of the implementation.
Compare it to the other vectors. To measure:
push_back
,push_front
,insert
,pop_front
,pop_back
. Make sure the destructors of the removed elements are called and exception guarantees are met. - 15 June: Refine the implementation, change exception guarantees if necessary.
- 22 June: Request feedback, implement convenience functions and other nice to have features. Spare time.
- 6 July: Study various
deque
implementations, sketch the possible allocation strategies. - 13 July: Create initial
deque
class template with basic tests - 20 July: Measure the performance of each allocation strategies.
If the internals turn out to be similar to
boost::container::deque
, consider reusing code if possible. - 27 July: Request feedback, implement convenience functions and other nice to have features.
- 3 August: Request feedback, refine the implementation as necessary.
- 10 August: Create documentation
- 17 August: Finish documentation
It's desirable to ship a complete library of less features than creating an incomplete product.
Because of the popularity of vector
, implemenetation and documentation of devector
has priority
over deque
.
According to the rules described in the Boost SoC page, instead of the prescribed competency test, I'd like to prove my abilities by the following two projects:
- Boost.Pipeline : GSoC'14 project
- C++omppi : A slightly older, less relevant C++ application, which feeds interconnected plaintext files into a database.