erenon/BoostPipelineProp.md

GSoC 2014 Boost.Pipeline Proposal

Personal Details


Name: Benedek Thaler
University: Budapest University of Technology and Economics
Course/Major: Computer Engineer, Applied Informatics
Degree Program: M.Sc.
Email: thalerbenedek -#- gmail.com
Homepage: erenon.hu
Availability: I'd like to spend 20 hours/week on GSoC until the end of June and 40 hours/week after that.
I want to start as soon as possible and work through the summer. Until the end of June, work limits my
availability; after that, one or two weeks of vacation will affect it.
Supplement: Boost.Pipeline prototype, a working example of a simple pipeline
to demonstrate coding skills

Background Information

Please summarize your educational background (degrees earned, courses taken, etc.).
Please summarize your programming background (OSS projects, internships, jobs, etc.).
Regarding my educational and professional background, please refer to my CV online.
In short, I'm a Computer Engineer M.Sc student and library developer at Morgan Stanley.
Please tell us a little about your programming interests.
Please tell us why you are interested in contributing to the Boost C++ Libraries.
As a library developer, I face concurrency issues on a daily basis. I find implementing
parallel algorithms an interesting challenge, but maintaining concurrent or distributed
enterprise applications is sometimes a burden. I think the n3534 proposal is a
good opportunity to rethink our problems and find solutions by looking at them from a different
point of view.
I'm using Boost regularly in my projects and it's also a constant source of inspiration.
I feel most comfortable when my project is open source. As I can't open my professional work
because of corporate regulations, I'd like to grab the chance of GSoC '14
to give back and contribute to the efforts of Boost.
Regarding the library aspect, I have contributed to multiple proprietary projects which I can't show.
What is your interest in the project you are proposing?
Aside from the aforementioned opportunity to employ concurrency in an unusual way (in C++),
I welcome the chance to create a library I'm proud of, a piece of work which properly
demonstrates my programming skills.
Have you done any previous work in this area before or on similar projects?
The Boost.Pipeline idea grabbed my attention because it resembles the message passing
of Erlang or QNX (and of course, it's exactly like the Unix pipeline).
These systems are extremely scalable and suffer much less from the common concurrency issues
(like race conditions). I've already experimented with Erlang solutions, which might come in handy.
What are your plans beyond this Summer of Code time frame for your proposed work?
I would definitely like to continue Boost.Pipeline as a hobby project if it makes it into the finals;
however, if the project receives considerable attention, I'm willing to regard it as
something more serious.
Please rate, from 0 to 5 (0 being no experience, 5 being expert), your knowledge of the following languages, technologies, or tools:
I'm a university student and colleague of Bjarne Stroustrup at the same time -- it's hard to rate my C++ skills honestly.
In the following I rate myself as a student. However, instead of checking raw numbers, why don't you take a look
at my Boost.Pipeline prototype to get a clear view?

C++: 4, I'm not afraid of using TMP, constexpr or the other C++11 features but I have little knowledge of compiler internals (like LTO).
C++ Standard Library: 4, I've embraced the ways of Effective STL, but the Standard Library is simply too large for one to get to the expert level in a few years.
Boost C++ Libraries: 3, I worked with the following libraries: Algorithm, Asio, Atomic, Concept Check, Filesystem, Foreach, Graph, Lexical Cast, Lockfree, Log, MPL, Preprocessor, Property Map, Smart Ptr, Test.
The others I have little or no experience with.
Subversion: 2, I would rather use Git (which I'd rate a 4; I have solid knowledge of it).

What software development environments are you most familiar with (Visual Studio, Eclipse, KDevelop, etc.)?
I'm using Eclipse CDT, version Kepler on Linux. My compilers of choice are GCC and Clang.
I have a little experience with Visual Studio.
What software documentation tool are you most familiar with (Doxygen, DocBook, Quickbook, etc.)?
The Boost.Pipeline prototype makes use of Doxygen and Quickbook.
The generated output is available online (nothing serious, it's a proof of concept only).
I have the required understanding of both tools.
Project Proposal

The motivation and intended interface are already outlined in n3534. Instead of pasting them here,
I would rather outline the differences (also considering the implementation of Google):

I'd like to take advantage of the new C++11 features, e.g. functional, thread or atomic.
I'd like to make it work on ranges rather than containers.
I'd like to use less or no dynamic memory to construct pipelines, and use value semantics instead
to ensure clear representation of ownership.
I'd like the queue type to be deduced from the transformation function's argument,
and to use boost::lockfree::spsc_queue as the default if the transformation is queue agnostic.
I'd like to make the parallel interface compatible with Boost.Thread executors.
I'd like to tune the library and prove its worth with performance tests. Such tests will measure
the runtime difference of an appropriate task executed with and without the pipeline. These tests
also enable tuning by looking at the generated code.
I'd like to create excellent documentation.
I'd like to comply with the Boost coding standards.
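
To make the range and value-semantics goals above more concrete, here is a minimal sketch. It is not the n3534 interface nor the prototype's code: the names `Composed`, `Stage`, `stage`, `run_pipeline` and `demo` are hypothetical, and queues and threads are left out entirely. It only shows that stages can be composed with `|` and stored by value, so building the pipeline itself needs no dynamic memory, and that the result can run over any iterator range.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: applies `first`, then `second`. Both callables are
// stored by value, so composing stages allocates nothing on the heap.
template <typename F, typename G>
struct Composed {
    F first;
    G second;

    template <typename T>
    auto operator()(T&& value) const
        -> decltype(second(first(std::forward<T>(value)))) {
        return second(first(std::forward<T>(value)));
    }
};

// Thin wrapper so that operator| only applies to pipeline stages.
template <typename F>
struct Stage {
    F fn;
};

template <typename F>
Stage<F> stage(F fn) {
    return Stage<F>{std::move(fn)};
}

// Composition returns a new stage by value; ownership is always explicit.
template <typename F, typename G>
Stage<Composed<F, G>> operator|(Stage<F> lhs, Stage<G> rhs) {
    return Stage<Composed<F, G>>{
        Composed<F, G>{std::move(lhs.fn), std::move(rhs.fn)}};
}

// Runs the composed stage over an iterator range, not a specific container.
template <typename InputIt, typename OutputIt, typename F>
OutputIt run_pipeline(InputIt first, InputIt last, OutputIt out,
                      const Stage<F>& s) {
    for (; first != last; ++first) {
        *out++ = s.fn(*first);
    }
    return out;
}

// Example: a plain array as input range, a vector as output.
inline std::vector<std::string> demo() {
    const int input[] = {1, 2, 3};
    auto pipeline = stage([](int x) { return x * 2; })
                  | stage([](int x) { return x + 1; })
                  | stage([](int x) { return std::to_string(x); });

    std::vector<std::string> output;
    run_pipeline(std::begin(input), std::end(input),
                 std::back_inserter(output), pipeline);
    assert(output.size() == 3);
    return output;
}
```

Calling `demo()` yields the strings "3", "5" and "7". The type of `pipeline` encodes the whole chain, which is what allows it to live on the stack; the price is that the type grows with every added stage.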

Because of the short time frame I intend to focus on creating authentic C++11 code instead of
making this as portable as possible. C++98 compatibility is not a goal of this GSoC project.
Proposed Milestones and Schedule

Considering the advice of Fred Brooks and taking the already present interface proposal into account,
I propose the following ratios:

1/6 planning and experimenting with possible interface solutions
1/3 coding, creating unit tests and integration tests
1/3 writing documentation, and creating performance measurements
1/6 ensuring MSVC compatibility

The last three tasks should be scheduled in parallel. Milestones are considered met if the target
concept has a working implementation, passing tests, suitable documentation and performance indicators.
A performance indicator is a test which shows how the pipeline implementation compares to a regular one
doing the same transformations.
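
A performance indicator as described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: `measure`, `baseline`, `staged`, `Indicator` and `compare_runs` are hypothetical names, and `staged` is only a stand-in for a real pipeline (two separate stages with an intermediate buffer). A serious benchmark would also repeat the runs and guard against the optimizer removing the work.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper: wall-clock time of a single call to `task`.
template <typename Task>
std::chrono::nanoseconds measure(Task&& task) {
    const auto start = std::chrono::steady_clock::now();
    task();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
}

// Regular implementation: both transformations fused into one loop.
inline std::int64_t baseline(const std::vector<int>& input) {
    std::int64_t sum = 0;
    for (int x : input) sum += static_cast<std::int64_t>(x) * 2 + 1;
    return sum;
}

// Stand-in for the pipeline under test: the same transformations as two
// separate stages connected by an intermediate buffer.
inline std::int64_t staged(const std::vector<int>& input) {
    std::vector<std::int64_t> doubled;
    doubled.reserve(input.size());
    for (int x : input) doubled.push_back(static_cast<std::int64_t>(x) * 2);

    std::int64_t sum = 0;
    for (std::int64_t x : doubled) sum += x + 1;
    return sum;
}

struct Indicator {
    std::chrono::nanoseconds baseline_time;
    std::chrono::nanoseconds staged_time;
};

// Times both variants on the same input and checks that they agree.
inline Indicator compare_runs(const std::vector<int>& input) {
    std::int64_t a = 0;
    std::int64_t b = 0;
    Indicator result;
    result.baseline_time = measure([&] { a = baseline(input); });
    result.staged_time = measure([&] { b = staged(input); });
    assert(a == b);  // the comparison is meaningless if the results differ
    return result;
}
```

The ratio of `staged_time` to `baseline_time` is the number such a test would report; the absolute values depend on the machine and are not meaningful on their own.
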
Targets of milestones (dates indicate start of week):

Before 21 April: Develop prototype, experiment with different solutions
21 April: Formalize plans and high level integration tests (or examples)
28 April: Create prototypes to validate plans
5 May: Finish planning, polish details based on prototype results
12 May: Create working pipeline using 1-1 transformations. This and the following two items may overlap.
It's hard to get one right without thinking about the others.
19 May: Create pipeline of 1-N transformations.
26 May: Create pipeline of N-M transformations.
2 June: Initial attempt at parallelization: each segment runs in a separate thread.
9 June: Finish parallelization and fix possible queue issues.
16 June: Because of final exams, a delay is expected here. Spare time.
23 June: Midterm evaluations. Catch up with Boost.Thread executors.
30 June: Continue thread pool work.
7 July: (Around here, vacation is expected)
14 July: Finish simple parallel execution, also ensure MSVC compatibility so far
21 July: Spare time. If there is nothing else to do, start implementing parallel pipelines.
28 July: Continue work on parallel pipelines.
4 August: Polish API documentation, clean up tests, create user guide.
11 August: Finish documentation and user guide.
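
The three kinds of transformations named in the 12 May to 26 May milestones can be told apart by their signatures. The sketch below assumes a plain `std::deque` as a stand-in for the queue type; `Queue`, `square`, `repeat`, `sum_pairs` and `run_stages` are hypothetical names chosen for this example, not the final interface.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Stand-in queue type, for this sketch only.
template <typename T>
using Queue = std::deque<T>;

// 1-1: every input produces exactly one output.
inline int square(int x) { return x * x; }

// 1-N: one input produces zero or more outputs, pushed downstream.
inline void repeat(int x, Queue<int>& out) {
    for (int i = 0; i < x; ++i) out.push_back(x);
}

// N-M: the stage reads the upstream queue at its own pace. Here it sums
// adjacent pairs and forwards a trailing unpaired element unchanged.
inline void sum_pairs(Queue<int>& in, Queue<int>& out) {
    while (in.size() >= 2) {
        const int a = in.front();
        in.pop_front();
        const int b = in.front();
        in.pop_front();
        out.push_back(a + b);
    }
    if (!in.empty()) {
        out.push_back(in.front());
        in.pop_front();
    }
}

// Chains the three stages sequentially: square, then repeat, then sum_pairs.
inline std::vector<int> run_stages(const std::vector<int>& input) {
    Queue<int> repeated;
    for (int x : input) repeat(square(x), repeated);

    Queue<int> summed;
    sum_pairs(repeated, summed);
    assert(repeated.empty());  // the N-M stage drained its input

    return std::vector<int>(summed.begin(), summed.end());
}
```

For the input {1, 2} the squares are 1 and 4, `repeat` produces 1, 4, 4, 4, 4, and `sum_pairs` turns that into 5, 8, 4. The three cases overlap in the schedule because a single deduction mechanism has to recognize all three signatures.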

It's preferable to ship a complete library with fewer features than an incomplete product.
If the proposed schedule can't be met, parallel pipelines should be dropped first
(but not the parallel execution of segments).
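
The parallel execution of segments, which is the part that stays in any case, can be sketched as follows: each segment runs in its own thread and the segments are connected by queues. This is an illustration under stated assumptions, not the planned implementation. `Channel` is a hypothetical mutex-based blocking queue used here only to keep the example free of dependencies (the proposal intends boost::lockfree::spsc_queue as the default), and `run_two_segments` is likewise a made-up name.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue connecting two segments.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(T& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        value = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Two segments, one thread each; the calling thread collects the output.
inline std::vector<int> run_two_segments(const std::vector<int>& input) {
    Channel<int> a_to_b;
    Channel<int> b_to_out;

    // Segment A doubles every input.
    std::thread segment_a([&] {
        for (int x : input) a_to_b.push(x * 2);
        a_to_b.close();
    });

    // Segment B adds one to whatever segment A produced.
    std::thread segment_b([&] {
        int x = 0;
        while (a_to_b.pop(x)) b_to_out.push(x + 1);
        b_to_out.close();
    });

    std::vector<int> output;
    int x = 0;
    while (b_to_out.pop(x)) output.push_back(x);

    segment_a.join();
    segment_b.join();
    assert(output.size() == input.size());
    return output;
}
```

Because each queue has exactly one producer and one consumer, the output order matches the input order. A parallel pipeline, by contrast, would run several instances of the same segment at once, which is the feature that would be dropped first.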