@domitry
Last active August 29, 2015 14:00
GSoC 2014 proposal version 3

SciRuby D3 project

Introduction

We have come up with an approach for the SciRuby D3 GSoC project. In short, we decided that generating JavaScript from Ruby is not the way forward for interactive visualisations. The clue is already in the D3 name, i.e., Data-Driven Documents ;)

What we propose is to create the visualisations as pure JavaScript components. Components are driven by data in the form of JSON, and they are also tied together by data. For this we can use Vega; see

https://github.com/trifacta/vega/wiki/Tutorial

where the first bar chart is composed of scales, axes and other components. We see Vega as a componentization language; plotrb uses it to create basic visualisations.
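To make the component idea concrete, the following sketch builds a Vega-style bar-chart specification as a plain Ruby hash and serialises it to JSON. It is modelled loosely on the bar chart from the Vega tutorial linked above. The property names (for example the `data.x` field syntax) follow our reading of the Vega 1.x tutorial and should be checked against the Vega documentation. `bar_chart_spec` is a hypothetical helper, not part of plotrb.

```ruby
require "json"

# Hypothetical helper: a Vega-style bar chart composed of independent
# components (data, scales, axes, marks). Property names follow our reading
# of the Vega 1.x tutorial and are assumptions, not a verified schema.
def bar_chart_spec(values, width: 400, height: 200)
  {
    "width" => width,
    "height" => height,
    # Data component: the records that drive the chart.
    "data" => [{ "name" => "table", "values" => values }],
    # Scale components map data values onto pixel ranges.
    "scales" => [
      { "name" => "x", "type" => "ordinal", "range" => "width",
        "domain" => { "data" => "table", "field" => "data.x" } },
      { "name" => "y", "range" => "height", "nice" => true,
        "domain" => { "data" => "table", "field" => "data.y" } }
    ],
    # Axis components refer to scales by name only.
    "axes" => [{ "type" => "x", "scale" => "x" }, { "type" => "y", "scale" => "y" }],
    # Mark components refer to the data set and the scales by name.
    "marks" => [{
      "type" => "rect",
      "from" => { "data" => "table" },
      "properties" => {
        "enter" => {
          "x" => { "scale" => "x", "field" => "data.x" },
          "width" => { "scale" => "x", "band" => true, "offset" => -1 },
          "y" => { "scale" => "y", "field" => "data.y" },
          "y2" => { "scale" => "y", "value" => 0 }
        }
      }
    }]
  }
end

spec = bar_chart_spec([{ "x" => "A", "y" => 28 }, { "x" => "B", "y" => 55 }])
puts JSON.pretty_generate(spec)
```

Each component refers to the others only by name (`table`, `x`, `y`). Composing a visualisation therefore reduces to emitting data, which is what makes Vega attractive as a componentization language.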

Vega is great in that it generates a wide range of common visualisations starting from data.

When it comes to new kinds of interactivity, the situation gets a bit more complicated. The most exciting visualisations are all written in D3/JS; see http://d3js.org/ and, for example,

http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html?_r=0

One original idea was to generate JavaScript, as is done by the Python library Bokeh:

http://bokeh.pydata.org/docs/gallery/elements.html

If you look at the Python code, you can see a straight mapping from Python to JavaScript; this is similar to what some R packages do. There is, however, no real abstraction of filter and data-fetching functions. Also, the wiring between components has to be hard-coded for every visualisation (say, clicking on one pane updates the other). It is possible to pass (anonymous) JavaScript functions to components. These functions may be able to filter and even fetch data from other data sources and 'rewire' user actions.

The more we think about it, however, the more ridiculous a Ruby-to-JavaScript mapping becomes, especially when you realise how easy it is to write D3/JS.

For flexibility it is much nicer to program directly in Javascript.

So, how can we help SciRuby users?

We think we should create/use visualisations in D3/JS and make it as easy as possible to combine them using Vega-style components. For static data, the plotrb/Vega approach is a rather good choice (Zuhao's SciRuby D3 GSoC project achieved exactly this last year).

The next step in interactivity is fetching 'live' data from other sources. We propose to allow the visualisations to fetch data over a REST interface using JSON. In fact, Pjotr has already created an example that fetches data for a scatter plot from such a source:

https://github.com/pjotrp/course3/blob/master/example.html

You can even update the back end (REST PUT) from the browser; for example:

https://github.com/pjotrp/browsertest/blob/master/couchdb/update_record.html

This means we can create visualisations which fetch and filter data on the fly from any REST server. So, setting up SciRuby as a back-end REST server becomes the main interface to D3 visualisations. The browser-side data fetcher and filter are merely components of the new D3/JS library.
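The fetch-then-filter component could look roughly like the sketch below. `RestDataSource` is a hypothetical name, and the fetcher is injectable so the sketch runs without a server. The default fetcher simply uses Ruby's `Net::HTTP.get`. In the proposal, the equivalent component would live on the browser side in D3/JS, so this shows only the logic, not the planned implementation.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical data-source component: fetch a JSON array of records from a
# REST URL, then keep only the rows that pass a filter before plotting.
class RestDataSource
  # Default fetcher: a plain HTTP GET returning the response body as a String.
  DEFAULT_FETCHER = ->(url) { Net::HTTP.get(URI(url)) }

  def initialize(url, fetcher: DEFAULT_FETCHER, &filter)
    @url = url
    @fetcher = fetcher
    @filter = filter || ->(_row) { true } # no filter given: keep every row
  end

  # Fetch on demand ("on the fly") and apply the filter to each record.
  def rows
    JSON.parse(@fetcher.call(@url)).select { |row| @filter.call(row) }
  end
end

# Usage with a stubbed fetcher standing in for a REST server
# (the URL is a placeholder and is never contacted here).
stub = ->(_url) { '[{"x":1,"y":2},{"x":5,"y":3},{"x":9,"y":1}]' }
source = RestDataSource.new("http://localhost:4567/scatter", fetcher: stub) { |row| row["x"] > 2 }
p source.rows # only the records with x > 2
```

Keeping fetching and filtering as separate, swappable pieces matches the idea that both are merely components. The same filter works whatever server supplies the JSON.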

When it comes to user interactivity - click something, do something - that is now completely solvable on the D3/JS side. All that is driven by data, and data is passed in through the REST interface. Even updates from the client to the server are possible.

A real advantage is that we can reuse existing visualisations without problems, and when we create interesting visualisations they can immediately go into the main D3/JS repositories. We believe that will accelerate access to cool stuff.

Obviously the REST interfaces can be delivered by other languages and servers, which should make the D3/JS visualisations useful to others. As bioinformaticians, we live in a world where people use R, Python, Perl, etc., so it makes sense to create visualisations that can be shared.

Compared to current interfaces with standard D3 modules, we are going to add the REST interface as an optional component. We are going to expand on Vega-style composition to add new functionality. We are going to create new visualisations to abstract and test components from Ruby.

Once these RESTful visualisations exist in D3/JS, it becomes crucial to make them really easy to use from (Sci)Ruby. Working backwards from the visualisations, we will find out how to abstract D3/JS components, how to compose visualisations using Vega-style JSON (and perhaps contribute this to Vega), and how to create a Ruby module and/or extend plotrb.

Features

What types of plots do we need?

We have decided to create plots that fall into several groups. This is a tentative plan, and other ideas are welcome. Whatever visualisations we opt for in this GSoC, there should be a direct customer: someone who is going to use and test them as they are created. At the moment, we are planning to create the three groups of plots below:

  1. Plots for general usage

General plots such as scatter, box, bar, pie, area and line plots.

  2. Plots for Bioinformatics

We have come up with the two bioinformatics charts below. Two examples are too few to be compiled into one library, so we will decide on more plots during the first half of the GSoC term.

  3. 3D Plots

We already have a 3D plotting library using WebGL (https://github.com/domitry/elegans). I will write a Ruby wrapper for this library that can generate the following plots: Surface, Wireframe, Line, Particles and Scatter.

For each plot listed above, we will add the following interactive features:

  1. General interactive features such as tool-tips, pan and wheel-zoom.

    (example: http://bokeh.pydata.org/docs/gallery/image.html)

    In this Bokeh example, you have to click buttons before you can wheel-zoom or pan. We will implement these features more naturally.

  2. Dynamically connecting panes and update one from the other.

    (example: http://www.biostat.wisc.edu/~kbroman/D3/lod_by_time/)

    We would like to generalise this idea, e.g. have a scatter plot in one pane and a box plot in the other, and update one from the other. We also want to add 'sliders' to set plotting parameters.

  3. Show and zoom data tied to a horizontal 'geography' in a sliding window. In biology this is known as a genome browser.

    (example: http://jbrowse.org/code/JBrowse-1.11.3/?loc=ctgA%3A9901..32096&tracks=DNA%2CTranscript%2Cvolvox_microarray_bw_density%2Cvolvox_microarray_bw_xyplot%2Cvolvox-sorted-vcf%2Cvolvox-sorted_bam_coverage%2Cvolvox-sorted_bam&data=sample_data%2Fjson%2Fvolvox&highlight=&tracklist=0 )

    where visualisations (read components) are linked to a chromosome. We do not want to write a full genome browser though. We want to create a simple representation of something that is linked to a horizontal bar by using D3 components.

  4. Data importing with Ajax

    (example: http://www.biostat.wisc.edu/~kbroman/D3/lod_by_time/)

    Each plot will be able to import data from a REST server.
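The sliding window in item 3 boils down to two operations: selecting the features that overlap the visible window, and recomputing the window when the user pans or zooms. Below is a minimal Ruby sketch that assumes features are stored as half-open [start, stop) intervals on one chromosome. The `Feature` struct and the helper names are hypothetical and not part of any existing library. A D3/JS component would apply the same logic in the browser.

```ruby
# Hypothetical feature track: half-open [start, stop) intervals on one chromosome.
Feature = Struct.new(:name, :start, :stop)

# Features overlapping the visible window [win_start, win_stop).
# Two half-open intervals overlap when each starts before the other ends.
def features_in_window(features, win_start, win_stop)
  features.select { |f| f.start < win_stop && win_start < f.stop }
end

# Zoom around the window centre; factor > 1 zooms in, factor < 1 zooms out.
def zoom(win_start, win_stop, factor)
  centre = (win_start + win_stop) / 2.0
  half = (win_stop - win_start) / (2.0 * factor)
  [(centre - half).round, (centre + half).round]
end

# Slide the window by delta bases, keeping its width.
def pan(win_start, win_stop, delta)
  [win_start + delta, win_stop + delta]
end

track = [
  Feature.new("geneA", 100, 500),
  Feature.new("geneB", 900, 1_200),
  Feature.new("geneC", 2_000, 2_600)
]
p features_in_window(track, 400, 1_000).map(&:name)
p zoom(400, 1_000, 2)
p pan(400, 1_000, 300)
```

Only the features returned by `features_in_window` need to be fetched or drawn, which keeps the view cheap however long the chromosome is.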

How can we handle interactivity?

Consider interactive plots that consist of multiple panes connected to each other.

(See examples of Bokeh: https://www.youtube.com/watch?v=kPknmEwQ3Rc)

We are planning to create interactive plots by sharing data among panes. See the figure below:


To generate the scatter plot, we need the 'X' and 'Y' columns. The box plot, on the other hand, needs the 'max', 'min' and 'mean' columns. The two plots share a common index, which is what lets us make them interactive. For example, when you hover over a point on the scatter plot, a small tool-tip showing its index number appears; when you click the point, the box plot is updated according to the values in the 'max', 'min' and 'mean' columns.
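The shared-index idea can be sketched with a column-oriented table. Each pane reads its own columns, and the row number is the common index. The values and helper names below are made up for illustration and are not part of any existing library. In the browser, the same lookups would run in the D3/JS panes.

```ruby
# Hypothetical shared table (column-oriented; values made up for illustration).
# The scatter pane reads 'X'/'Y' and the box pane reads 'max'/'min'/'mean';
# both address rows through the same index.
TABLE = {
  "index" => [0, 1, 2],
  "X"     => [1.0, 2.5, 4.0],
  "Y"     => [3.2, 1.8, 2.9],
  "max"   => [9.0, 7.5, 8.1],
  "min"   => [1.0, 0.5, 2.2],
  "mean"  => [4.8, 3.9, 5.0]
}.freeze

# Points for the scatter pane: one [x, y] pair per index.
def scatter_points(table)
  table["X"].zip(table["Y"])
end

# Hover: the tool-tip shows only the index number of the point.
def tooltip_for(table, row)
  "index: #{table['index'][row]}"
end

# Click: the box plot is redrawn from the same row of max/min/mean.
def box_plot_values(table, row)
  %w[max min mean].to_h { |col| [col, table[col].fetch(row)] }
end

p scatter_points(TABLE)
puts tooltip_for(TABLE, 1)
p box_plot_values(TABLE, 1)
```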

On the client side (JS in the browser), the data is handled as a JSON object that is embedded in the HTML file or loaded dynamically with Ajax. On the Ruby side (e.g. in IRuby), users can load and process data through an instance of SciRuby::Dataframe. We can also push the object to the REST server, and the database on the server will save it as a table. See also the mock code (https://gist.github.com/domitry/0c2bd2997bc94b4e6f2a).
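As a rough idea of what the Ruby side has to produce, here is a deliberately tiny, hypothetical stand-in for SciRuby::Dataframe. It keeps named, equal-length columns and serialises them to column-oriented JSON. `MiniDataFrame` is an illustrative name only; the real API is still to be designed (see the mock code linked above).

```ruby
require "json"

# Hypothetical, minimal stand-in for SciRuby::Dataframe: named columns of
# equal length, serialisable to the column-oriented JSON the browser consumes.
class MiniDataFrame
  attr_reader :columns

  def initialize(columns)
    lengths = columns.values.map(&:size).uniq
    raise ArgumentError, "columns must have equal length" if lengths.size > 1

    @columns = columns.transform_keys(&:to_s)
  end

  def nrows
    @columns.empty? ? 0 : @columns.values.first.size
  end

  # Row-oriented view, e.g. for a REST client expecting an array of records.
  def to_records
    (0...nrows).map { |i| @columns.to_h { |name, values| [name, values[i]] } }
  end

  # Column-oriented JSON: { "name": [values...], ... }
  def to_json(*args)
    @columns.to_json(*args)
  end
end

df = MiniDataFrame.new(x: [1, 2, 3], y: [0.5, 1.5, 0.25])
puts df.to_json
p df.to_records.first
```

Storing columns rather than rows mirrors how each pane pulls out only the columns it needs. `to_records` covers consumers that prefer one object per row.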

Structure

Client side JavaScript

We are planning to create a JS library as the back end of the Ruby library. The library will be split into two parts, 'Plot-Base' and its plug-in libraries. See the figure below:


Plot-Base will parse the JSON object, whether loaded with Ajax or embedded in a static HTML file. After parsing the JSON, it splits the given table data into columns and passes them to each plug-in. If multiple panes share a common index, Plot-Base will tell each library which panes to update and how to update them. See the figure below:

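Plot-Base has two responsibilities: handing each plug-in only its columns, and telling the other panes to update when a row is selected. Both can be sketched as follows. The sketch is in Ruby for consistency with the other examples, although Plot-Base itself is planned as a JS library. `PlotBase`, `add_pane`, `data_for` and `propagate` are hypothetical names, not a planned API.

```ruby
# Hypothetical sketch of Plot-Base: slice the shared table per pane and
# propagate a selection in one pane to all the others.
class PlotBase
  Pane = Struct.new(:name, :columns, :on_select)

  def initialize(table)
    @table = table
    @panes = []
  end

  # Register a pane (a plug-in instance) and the columns it needs.
  def add_pane(name, columns, &on_select)
    @panes << Pane.new(name, columns, on_select)
    self
  end

  # Give a pane only its own columns.
  def data_for(name)
    pane = @panes.find { |candidate| candidate.name == name } or raise KeyError, "unknown pane #{name}"
    pane.columns.to_h { |col| [col, @table.fetch(col)] }
  end

  # Push the selected row to every other pane as { column => value };
  # returns the names of the panes that were told to update.
  def propagate(source, row)
    targets = @panes.reject { |pane| pane.name == source }
    targets.each do |pane|
      pane.on_select&.call(data_for(pane.name).transform_values { |values| values[row] })
    end
    targets.map(&:name)
  end
end

table = { "X" => [1, 2], "Y" => [3, 4], "max" => [9, 8], "min" => [1, 2], "mean" => [5, 4] }
updates = []
base = PlotBase.new(table)
  .add_pane("scatter", %w[X Y])
  .add_pane("box", %w[max min mean]) { |values| updates << values }
p base.propagate("scatter", 1)
p updates
```

Because panes only declare column names, adding a new plug-in does not require changing the others. Plot-Base alone knows which panes share the index.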

I will implement 5-10 plots for each plug-in library. We are going to implement plot libraries for general usage and for biology, and will also provide low-level APIs for 2D plotting (e.g. rect, circle, dot, ...), compiled into one library called 2D-Base. These 2D APIs will make it easy to implement new plot libraries, e.g. for math or physics. We will also provide JS interfaces for Vega (http://trifacta.github.io/vega/) and 3D plots.
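With a 2D-Base of low-level primitives, a new plot type is just a composition of those primitives. The sketch below is again in Ruby rather than JS, and its function names are hypothetical. It renders `circle` and `rect` as SVG element strings and builds a scatter plot from circles, using a linear scale in the style of D3's. It assumes the data are not constant along either axis; otherwise the scale divides by zero.

```ruby
# Hypothetical 2D-Base primitives: each returns one SVG element as a string.
def circle(cx, cy, r, fill: "steelblue")
  %(<circle cx="#{cx}" cy="#{cy}" r="#{r}" fill="#{fill}"/>)
end

def rect(x, y, width, height, fill: "steelblue")
  %(<rect x="#{x}" y="#{y}" width="#{width}" height="#{height}" fill="#{fill}"/>)
end

# Linear scale mapping the domain [d0, d1] onto the range [r0, r1].
def linear_scale(d0, d1, r0, r1)
  ->(v) { r0 + (v - d0) * (r1 - r0).to_f / (d1 - d0) }
end

# A scatter plot built only from the circle primitive.
def scatter_svg(xs, ys, width: 200, height: 100)
  sx = linear_scale(xs.min, xs.max, 0, width)
  sy = linear_scale(ys.min, ys.max, height, 0) # SVG's y axis points down
  dots = xs.zip(ys).map { |x, y| circle(sx.(x).round(1), sy.(y).round(1), 3) }
  %(<svg width="#{width}" height="#{height}">#{dots.join}</svg>)
end

puts scatter_svg([1, 2, 3], [10, 30, 20])
puts rect(0, 0, 10, 5)
```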

Client side Ruby

I will implement several gems as front ends to the JS libraries. At the moment we are planning to provide the three types of gems below:

  1. The base gem for plotting (the wrapper of JS Plot-Base)

    The base gem is tentatively named 'SciPlot'. This gem handles various data sources, such as SciRuby::Dataframe or Array, and generates the JSON object that will be passed to the JS libraries. SciPlot also has output functions, such as rendering in IRuby or generating a static HTML file.

  2. Ruby gem for each JS plotting library (e.g. BioPlot, GeneralPlot, 3D plot)

    We will also implement wrapper gems for each JS plotting library. Each of these gems has its own module name and a class for each type of plot its JS library can generate. They will generate JSON objects and handle output through the SciPlot APIs.

  3. SciRuby::Dataframe

    We will implement a simple version of SciRuby::Dataframe for the Ruby wrappers above and for the REST server. We will provide it as an individual gem, similar to pandas for Python, but we can probably implement only a small number of APIs during the GSoC term. This will become a problem in the future.
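The division of labour between SciPlot and the wrapper gems might look like the following sketch. A plot object turns plain arrays into the JSON spec for Plot-Base and can also emit a static HTML page that embeds it. `SciPlotSketch`, the spec layout, the `plot-base.js` script path and the JS entry point `PlotBase.render` are all placeholders, not a decided format.

```ruby
require "json"

# Hypothetical sketch of the proposed SciPlot front end. A wrapper gem
# (e.g. a BioPlot class) would build one of these and reuse its output methods.
module SciPlotSketch
  class Plot
    def initialize(type, **columns)
      @type = type
      @columns = columns.transform_keys(&:to_s)
    end

    # JSON-ready spec: which pane to draw and the column data it uses.
    def to_spec
      { "panes" => [{ "type" => @type, "columns" => @columns.keys }],
        "data" => @columns }
    end

    # Static HTML page embedding the spec; script path and the JS entry
    # point PlotBase.render are placeholders.
    def to_html(script_src: "plot-base.js")
      <<~HTML
        <!DOCTYPE html>
        <html>
        <head><script src="#{script_src}"></script></head>
        <body>
        <div id="plot"></div>
        <script>PlotBase.render("#plot", #{JSON.generate(to_spec)});</script>
        </body>
        </html>
      HTML
    end
  end
end

plot = SciPlotSketch::Plot.new("scatter", x: [1, 2, 3], y: [4, 1, 3])
puts plot.to_html # or File.write("scatter.html", plot.to_html)
```

Embedding JSON directly in a `<script>` tag like this is only safe when the data cannot contain `</script>`; a real implementation would need to escape it.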

Server side

SciRuby-Server consists of four parts: the server, the REST API, a web interface and the database. I will implement it using Sinatra and a NoSQL database. I will also use Thin as a test server in a local environment. We can deploy it on a web server (such as httpd or nginx), publish it on a closed network, or use it in a local environment (on Thin).

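The REST API and database layers can be sketched without any gems as a Rack-compatible app: an object whose `call(env)` returns `[status, headers, body]`. This is the interface that Sinatra applications implement and that Rack servers such as Thin run. A Hash stands in for the NoSQL database, and the `/tables/<name>` route layout is an assumption. In Sinatra, the same routes would be written as `get '/tables/:name'` and `put '/tables/:name'` blocks.

```ruby
require "json"
require "stringio"

# Hypothetical, dependency-free sketch of the REST layer as a Rack-style app.
# A Hash stands in for the NoSQL database; each table is stored as a
# column-oriented JSON document.
class TableStore
  def initialize
    @tables = {}
  end

  def call(env)
    name = env["PATH_INFO"].to_s.delete_prefix("/tables/")
    return respond(404, { "error" => "not found" }) if name.empty? || name.include?("/")

    case env["REQUEST_METHOD"]
    when "GET"
      @tables.key?(name) ? respond(200, @tables[name]) : respond(404, { "error" => "no table #{name}" })
    when "PUT"
      @tables[name] = JSON.parse(env["rack.input"].read)
      respond(200, { "stored" => name })
    else
      respond(405, { "error" => "method not allowed" })
    end
  end

  private

  def respond(status, payload)
    [status, { "content-type" => "application/json" }, [JSON.generate(payload)]]
  end
end

# Usage: push a table (as a browser or Ruby client would), then read it back.
app = TableStore.new
app.call({ "REQUEST_METHOD" => "PUT", "PATH_INFO" => "/tables/scatter",
           "rack.input" => StringIO.new('{"x":[1,2],"y":[3,4]}') })
status, _headers, body = app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/tables/scatter" })
puts status, body.join
```

Because the app is just a callable, it can be exercised directly in tests. Mounting it on Thin would typically be done from a `config.ru` containing `run TableStore.new`.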

Timeline

  • Milestone 1: implementation of interactive plots in JavaScript (2-3 weeks)
    • Decide which plots to include in our project, and implement them.
    • Implement plots for general use and for biology.
    • Add interactive features to the created plots.
  • Milestone 2: compiling the JS plots into one library and deciding the format of the JSON object for plotting (1 week)
    • Identify common components in the JS plots created in Milestone 1.
    • Implement Plot-Base and compile the JS plots into 2 or 3 individual libraries.
  • Milestone 3: implementation of Ruby wrappers and SciRuby::Dataframe (2-3 weeks)
    • Implement Ruby wrappers for each JS library created in Milestone 2.
    • Implement SciRuby::Dataframe.
  • Milestone 4: implementation of the REST server (1-2 weeks)
    • Implement REST APIs for the database.
    • Implement an experimental web interface for the REST APIs.

Conclusion

The final deliverables of GSoC 2014 will be:

  1. A number of interactive RESTful visualisations in D3/JS
  2. A number of RESTful 3D visualisations in WebGL (https://github.com/domitry/elegans)
  3. A clear JSON interface for composing and driving visualisations
  4. A Ruby library for generating the required JSON
  5. A SciRuby RESTful server
  6. A NoSQL RESTful server as an intermediate

In addition we will toy with ideas and abstractions and maybe come up with an approach that can be copied by others. Mentors for this project are Pjotr Prins, John Woods and Prof. Karl Broman.
