@lindsayrutter
Last active August 29, 2017 08:02
bigPint - Google Summer of Code 2017

Team

Lindsay Rutter - Student software developer
Dianne Cook - Mentor
Roxane Legaie - Mentor

The project

The main goal of the project was to develop interactive visualization methods for large multivariate datasets. Specifically, we aimed to develop three main plotting types:

  • Scatterplot matrices
  • Parallel coordinate plots
  • Replicate line plots

We also aimed to test our plotting methods on RNA-sequencing datasets. A fuller description of the original goals of our project can be found on our project page on the R Project website for Google Summer of Code 2017.

Code

Code from the summer project used:

  • R
  • JavaScript
  • Plotly
  • Shiny

Commits

The summer project produced 74 commits, all located in a GitHub repository called bigPint. The commits occurred between June 16, 2017 and August 29, 2017. The last commit for the summer project has the message "Last GSOC commit". The authors plan to continue committing to this repository after Google Summer of Code ends.

Work completed

We completed many of the deliverables specified at the beginning of the summer. In total, we packaged 10 functions into the bigPint repository, which is still under development. The repository is structured like an R package, and these 10 functions can be found within the R directory:

  1. repLinePCP - Creates a replicate line plot using all combinations of samples between treatment groups. The user can upload metrics of interest (such as p-values or log fold changes) and order observations by these metrics, superimposing the values for a given observation from the data. This is all linked to a parallel coordinate plot.
  2. scatMatFCPCP - Creates a scatterplot matrix where observations that are above a user-defined fold change are displayed. The user can select observations of interest, and see these overlaid as parallel coordinate plots superimposed onto side-by-side boxplots.
  3. scatMatHexPCP - Creates a scatterplot matrix where observations are grouped into hexagons. The user can select a hexagon of interest, and see the observations within it overlaid as parallel coordinate plots superimposed onto side-by-side boxplots.
  4. scatMatOrthPCP - Creates a scatterplot matrix where observations that are above a user-defined orthogonal distance are displayed. The user can select observations of interest, and see these overlaid as parallel coordinate plots superimposed onto side-by-side boxplots.
  5. scatMatPIPCP - Creates a scatterplot matrix where observations that are above a user-defined prediction interval are displayed. The user can select observations of interest, and see these overlaid as parallel coordinate plots superimposed onto side-by-side boxplots.
  6. selDelIntPCP - Creates a parallel coordinate plot where the user can box select an area, and any observations that fall within that box for any of the horizontal-axis integers are deleted.
  7. selDelIntShadePCP - Creates a parallel coordinate plot where the user can box select an area, and any observations that fall within that box for any of the horizontal-axis integers are deleted. A shaded box is superimposed where the user selected.
  8. selDelPCP - Creates a parallel coordinate plot where the user can box select an area, and any observations that fall within that box are deleted.
  9. selDelShadePCP - Creates a parallel coordinate plot where the user can box select an area, and any observations that fall within that box are deleted. A shaded box is superimposed where the user selected.
  10. selPCP - Creates a parallel coordinate plot where the user can box select an area, and any observations that fall within that box are highlighted.
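
The box-select behaviour described for selDelIntPCP and selPCP can be sketched roughly as follows. This is a minimal illustration and not the code in the bigPint repository: the function names (hitsBoxAtAxes, deleteSelected, highlightSelected), the observation layout, and the box fields (xMin, xMax, yMin, yMax) are all hypothetical, and we assume each observation's values are drawn at horizontal-axis positions 1, 2, 3, and so on.

```javascript
// Hypothetical data layout: each observation is { id, values }, where
// values[i] is drawn at horizontal-axis position i + 1.

// True if the observation has a value inside the box at any integer
// axis position that the box covers.
function hitsBoxAtAxes(obs, box) {
  return obs.values.some((y, i) => {
    const x = i + 1;
    return x >= box.xMin && x <= box.xMax && y >= box.yMin && y <= box.yMax;
  });
}

// Deletion variant: keep only the observations the box did not hit.
function deleteSelected(observations, box) {
  return observations.filter((obs) => !hitsBoxAtAxes(obs, box));
}

// Highlight variant: return the ids of the observations the box hit.
function highlightSelected(observations, box) {
  return observations.filter((obs) => hitsBoxAtAxes(obs, box)).map((obs) => obs.id);
}

// Made-up example values, for illustration only.
const observations = [
  { id: "geneA", values: [1.0, 2.0, 3.0] },
  { id: "geneB", values: [5.0, 5.5, 6.0] },
  { id: "geneC", values: [2.5, 0.5, 1.5] },
];

// A box that covers only the second axis (x = 2) and y values 1.5 to 2.5.
const box = { xMin: 1.5, xMax: 2.5, yMin: 1.5, yMax: 2.5 };

console.log(deleteSelected(observations, box).map((obs) => obs.id));
console.log(highlightSelected(observations, box));
```

In this sketch only geneA has a value inside the box at the covered axis, so it is the one removed by the deletion variant and the one returned by the highlight variant. A check against whole line segments rather than integer axis positions, as selDelPCP appears to describe, would need a line-rectangle intersection test instead.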

These functions are documented, and their documentation is available through the help function of the R package under development.
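
As a rough illustration of the thresholds described for scatMatFCPCP and scatMatOrthPCP in the list above, the filtering step for one scatterplot panel might look like the sketch below. Several things here are assumptions rather than details taken from the package: the helper names (foldChange, orthogonalDistance, keepAbove), the use of positive untransformed values, the definition of fold change as the larger value divided by the smaller, and the choice of the line y = x as the reference for orthogonal distance.

```javascript
// Assumption: x and y are positive values for one observation in the two
// samples that a scatterplot panel compares.

// Fold change taken as the larger value over the smaller, so it is >= 1.
function foldChange(x, y) {
  return Math.max(x, y) / Math.min(x, y);
}

// Perpendicular distance from the point (x, y) to the line y = x.
function orthogonalDistance(x, y) {
  return Math.abs(x - y) / Math.SQRT2;
}

// Keep only the points whose metric exceeds the user-defined threshold.
function keepAbove(points, metric, threshold) {
  return points.filter((p) => metric(p.x, p.y) > threshold);
}

// Made-up example values, for illustration only.
const points = [
  { id: "geneA", x: 10, y: 12 },
  { id: "geneB", x: 4, y: 16 },
  { id: "geneC", x: 30, y: 9 },
];

console.log(keepAbove(points, foldChange, 2).map((p) => p.id));
console.log(keepAbove(points, orthogonalDistance, 10).map((p) => p.id));
```

With these example values, a fold-change threshold of 2 keeps geneB and geneC, while an orthogonal-distance threshold of 10 keeps only geneC. Passing the metric as a function keeps the filtering step the same for both plot types.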

Work Left

We are still examining the visualization techniques with additional datasets. The dataset included in the developing R package comes from a soybean study of iron levels in soil. We have also been examining a honeybee study on nutrition and virus loads. After examining additional datasets, we also need to write our vignette for the developing R package.

Conclusion

Participating in this open-source project, alongside students worldwide working on their own open-source projects, was an educational and interesting experience that gave us more insight into the open-source community. On the technical side, one of the biggest challenges came from developing our methods with four technologies (R, Plotly, Shiny, and JavaScript). We have still not reached our goals for certain components of our software, because combining these technologies at times led to unexpected and undesirable results. One of the biggest advantages of this summer was working with two mentors, who provided useful feedback and arranged for the work to be presented to their network of colleagues with experience in these areas.
