"Improving numerical routines in Scala Breeze" GSoC 2014 proposal.

Abstract

Breeze is a great numerical processing library. However, it lacks some
high-level functions that you can find in other libraries like
SciPy. A second issue is that Breeze lacks documentation, which raises the
entry barrier for new contributors. My proposal is to revamp the
documentation and to introduce interpolation and integration facilities.

My main principle will be to lower the entry barrier as much as possible and
to get people excited about Breeze so that it gains a lot of new
contributors. In my opinion, reaching critical mass is the most important
thing at this moment.

Improving documentation

At the beginning, I'm going to improve the documentation of existing Breeze
modules and then revamp the tutorial. This is a chance for me to get into the
internals of Breeze. I'm going to spend about 3-4 weeks on this step.

I believe that good documentation consists of three components:

- A step-by-step tutorial that shows the key concepts (such as vectors and
  matrices) and how Breeze "feels"; it should be quick and easy.
- Topical guides for those who have read the tutorial. In Breeze there is a
  natural one-to-one relation between modules and topics, so these are
  equivalent to high-level module documentation.
- A low-level, deep-dive reference included at the end of each topical guide.

If time permits, I will investigate whether code snippets in the
documentation can be treated as doctests. With doctests, we would no longer
have to worry about out-of-date snippets.

The main Breeze page

The main page, that is, the first page users see, should be as short as
possible -- everything that can be moved to other pages (e.g. installation
and contributors) will be moved out. People don't have time to read minor
details; first they need to find out whether Breeze is what they are looking
for and to get excited about the project.

The first paragraph should contain the most important information:

- what Breeze is;
- why you should care about it;
- why it's worth the effort to learn it;
- what license Breeze uses;
- what the latest release is;
- what the other components are (breeze-viz, breeze-learn, breeze-process).

The second paragraph will be a set of links. Everything should be easy to
reach, ideally in two steps -- go to the home page and find the appropriate
link. There won't be too many links (installation and contribution; the
tutorial and full documentation; the bug tracker and source code, both of
which link to GitHub).

The next paragraph should show the power of Breeze and get people excited, so
a really simple way to play with Breeze is a must:

    $ sbt
    set libraryDependencies ++= Seq("org.scalanlp" % "breeze_2.10" % "0.7-SNAPSHOT")
    set libraryDependencies ++= Seq("org.scalanlp" % "breeze-natives_2.10" % "0.7-SNAPSHOT")
    set resolvers ++= Seq("Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/")
    set resolvers ++= Seq("Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/")
    set scalaVersion := "2.10.3"
    console

Then we will show the most amazing things you can do in Breeze with very
little code. This part will consist mostly of code examples, with no in-depth
description. The first impression is very important. At the end, there will
be a link to the tutorial.

Switching to GitHub Pages and Jekyll

The current documentation consists of wiki pages on GitHub. This causes two
problems. First, the wiki and the code live in separate repositories, so
there are two different workflows for working on documentation and on code.
There is also no association between documentation and code, so you don't
know which version of the code a given version of the documentation
describes.

The second problem is that people would get more excited if Breeze had its
own website instead of GitHub wiki pages. There is already an initial effort
in this direction for Breeze and Epic (see the ScalaNLP webpage). My goal
would be to enhance it and to move all documentation to that site.

GitHub Pages is a good choice because it's free and integrated with Jekyll.
That means the HTML documentation pages will be generated directly from
Markdown files. The technology is also very easy to use.

Improving numerical routines

After revamping the existing documentation, I will focus on the main part of
this proposal -- introducing interpolation and integration modules.

My goal is not to implement every possible facility. Instead, for each family
of algorithms I will implement only one algorithm and design an interface
that all algorithms from that family must fulfill. Good documentation will
lower the entry barrier. I find this better in the long term because a lower
barrier will attract more contributors, which matters more at this moment
than completeness. The new contributors can then implement the other
algorithms.
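
To make the "one interface per family" idea concrete, here is a minimal
sketch in plain Scala. The names `UnivariateInterpolator` and
`LinearInterpolator` are hypothetical and are not part of Breeze; a real
module would probably be built on Breeze vectors rather than arrays.

```scala
// Hypothetical sketch: none of these names come from Breeze itself.

/** Contract that every univariate interpolator in the family would fulfill. */
trait UnivariateInterpolator {
  /** Value of the interpolant at x. */
  def apply(x: Double): Double
}

/** Piecewise-linear interpolation through the nodes (xs(i), ys(i)). */
class LinearInterpolator(xs: Array[Double], ys: Array[Double])
    extends UnivariateInterpolator {
  require(xs.length == ys.length, "xs and ys must have the same length")
  require(xs.length >= 2, "at least two nodes are needed")
  require(xs.sliding(2).forall(p => p(0) < p(1)), "xs must be strictly increasing")

  def apply(x: Double): Double = {
    require(x >= xs.head && x <= xs.last, "x lies outside the interpolation range")
    // First node strictly greater than x; if there is none, x equals the
    // last node and the last segment is used.
    val upper = xs.indexWhere(_ > x) match {
      case -1 => xs.length - 1
      case i  => i
    }
    val lower = upper - 1
    // Relative position of x inside the segment [xs(lower), xs(upper)].
    val t = (x - xs(lower)) / (xs(upper) - xs(lower))
    ys(lower) + t * (ys(upper) - ys(lower))
  }
}
```

Under this assumed design, a spline interpolator added later would extend the
same trait, so code written against `UnivariateInterpolator` would not need to
change.
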
I'm going to write documentation and/or tests before the implementation so
that other people can see what the API will look like and can comment on and
discuss it.

The risk in this project is low: the code can be merged into the master
branch after each family of algorithms is implemented. I find an iterative
approach very suitable for this project.

I'm currently working on implementing linear interpolation, so you can get a
"feel" for what I'd like to do this summer.

Brief plan

I will start by implementing interpolation. Univariate linear interpolation
is already in progress. Then I will implement 1-d splines of degree less than
or equal to 3. After that, I will move on to multivariate interpolation and
implement both n-d linear interpolation and 2-d splines (with degree <= 3).
If time permits, I will also implement other interpolators, such as
barycentric and Krogh interpolators.

For the rest of the time I will focus on integration. Again, I will start
with single integrals: I will implement the trapezoidal and Simpson's rules
with equidistant nodes. Then I will move on to n-d integrals and implement
the Monte Carlo method.
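
The three integration methods named above can be sketched in a few lines of
plain Scala. This is only an illustration under assumed names
(`IntegrationSketch`, `trapezoid`, `simpson`, `monteCarlo`); it is not
existing Breeze API, and the final interface may look quite different.

```scala
import scala.util.Random

// Hypothetical sketch in plain Scala; the names are illustrative, not Breeze API.
object IntegrationSketch {

  /** Composite trapezoidal rule on [a, b] with n equal subintervals. */
  def trapezoid(f: Double => Double, a: Double, b: Double, n: Int): Double = {
    require(n >= 1, "need at least one subinterval")
    val h = (b - a) / n
    val interior = (1 until n).map(i => f(a + i * h)).sum
    h * (0.5 * (f(a) + f(b)) + interior)
  }

  /** Composite Simpson's rule on [a, b]; n must be even. */
  def simpson(f: Double => Double, a: Double, b: Double, n: Int): Double = {
    require(n >= 2 && n % 2 == 0, "n must be a positive even number")
    val h = (b - a) / n
    val odd  = (1 until n by 2).map(i => f(a + i * h)).sum
    val even = (2 until n by 2).map(i => f(a + i * h)).sum
    h / 3 * (f(a) + 4 * odd + 2 * even + f(b))
  }

  /** Plain Monte Carlo estimate of the integral of f over the box
    * [lower(0), upper(0)] x ... x [lower(d-1), upper(d-1)]. */
  def monteCarlo(f: Array[Double] => Double,
                 lower: Array[Double],
                 upper: Array[Double],
                 samples: Int,
                 rng: Random): Double = {
    require(lower.length == upper.length, "bounds must have the same dimension")
    require(samples >= 1, "need at least one sample")
    val volume = lower.indices.map(i => upper(i) - lower(i)).product
    var total = 0.0
    for (_ <- 0 until samples) {
      // Draw one point uniformly from the box.
      val point = Array.tabulate(lower.length) { i =>
        lower(i) + rng.nextDouble() * (upper(i) - lower(i))
      }
      total += f(point)
    }
    // Mean value of f over the samples, scaled by the volume of the box.
    volume * total / samples
  }
}
```

Simpson's rule is exact for polynomials up to degree 3, so
`IntegrationSketch.simpson(x => x * x, 0.0, 3.0, 4)` should agree with the
exact value 9 up to rounding error, while the Monte Carlo estimate only
converges at a rate proportional to one over the square root of the number of
samples.
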

If time permits, I will also work on enhancing existing modules such as
signal processing, optimization and statistical functions.

About me

My name is Christopher Mędrela and I'm a student at the University of Science
and Technology in Kraków (Poland). My time zone is UTC+01:00. My email
address is chris.medrela+gsoc2014 at gmail.com. I have
[a GitHub account](https://github.com/chrismedrela).

I have been contributing to open source projects since 2011, mainly to
Django, for which I've written a lot of patches. Last year I participated in
GSoC and successfully revamped the Django check framework (proposal, merge).

I'm fluent in Python, so I can easily understand SciPy. I'm interested in
other languages too. I first encountered Scala about a year ago. Before
switching to Python, I coded in Java, so Scala is not a completely new
language for me. I'm familiar enough with Scala to manage this project, as
the branch where I'm implementing linear interpolation shows.

I can use the tools necessary for this project. During the last GSoC I
mastered git. This proposal is written in Markdown, so I have already got
started with it. I know the basics of sbt; otherwise I couldn't have written
the pull request.

During the last GSoC it turned out that my English is good enough to talk in
real time, although it's not fluent.

I'd like to shift the GSoC dates internally, starting on 21 April (one month
earlier) and finishing after 12 weeks. Google said: "We don't police what you
deliver to your org and when, simply that you meet the milestones of the
program as laid out." Since we will do everything earlier, the deadlines are
not a problem. David Hall doesn't object to this either. The reason for the
shift is that I'd like to do an internship in late summer, and this is the
only way I can avoid a clash with GSoC.

During GSoC, I'm not going to have a job, holidays or any other
time-consuming activity except for classes at university. I'm not going to
apply for any internship that would collide with GSoC. I'd like to reserve
one week to prepare for exams (23 - 27 June), so I will be able to work on
GSoC for eleven weeks.