@al2na
Last active April 9, 2017 13:03
Master’s/Internship projects available at Bioinformatics Platform BIMSB/MDC

The Akalin lab (http://bioinformatics.mdc-berlin.de) has multiple project themes available for Master’s or internship projects.

Research Software sustainability

Reproducibility of scientific workflows is a general problem across all fields of science, including those that rely heavily on computation and data analysis. For data analysis or computational work, it is desirable to be able to install the exact same version of the research software that was used in a publication, both to reproduce the published results and to manipulate or extend the software system in a controlled way. At MDC, we have used GNU Guix for more than three years to build scientific software in different versions and variants, and to manage software environments in a reproducible fashion. Some of our team members are also main contributors to the GNU Guix project. We are looking for new members who can help improve our workflow. Our goal is to implement a system based on GNU Guix and Cuirass with which we can build a wide range of scientific software continuously and automatically in a bit-for-bit reproducible fashion, and offer the build results to Guix users.

Specific tasks

  • Develop Cuirass into a production-quality platform for continuous integration (CI) with Guix.
  • Set up a public service that uses Guix and Cuirass to continuously build and deploy software, offer it for download, and archive packages and package variants.
  • Investigate and patch sources of non-determinism in software packages offered through Guix.
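
As a rough illustration of what "bit-for-bit reproducible" means in the last task, the sketch below compares two independent build outputs of the same package file by file. It is not part of Guix or Cuirass; the function names `tree_digests` and `differing_files` are hypothetical, and the check only covers file contents (not permissions, timestamps or symlink targets).

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(build_a, build_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    a = tree_digests(build_a)
    b = tree_digests(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result from `differing_files` suggests the two builds have identical file contents; any path it returns is a candidate source of non-determinism, such as an embedded build timestamp.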

What will you get out of this?

  • TBD
  • TBD
  • TBD
  • TBD

What do you need to know?

  • TBD
  • TBD
  • TBD

Other projects

1) Methods for DNA modification analysis

DNA methylation and other DNA modifications such as hydroxymethylation are implicated in gene regulation, and their misregulation has been shown to cause cancer. With the advent of next-generation sequencing, measuring genome-wide DNA methylation levels became possible. However, this also created a demand for high-quality software for the analysis of large-scale DNA methylation data sets. The aim of this project is to help develop data processing, machine learning and statistical modeling tools for DNA methylation analysis, to be integrated into our existing software methylKit (https://code.google.com/p/methylkit/).

Multiple sub-projects available

2) Methods for genomics data integration and visualization

Data integration and processing are vital tools for knowledge discovery in genomics. The number of public data sets is increasing by the day thanks to multiple large consortia producing genomics data, such as ENCODE, Roadmap Epigenomics and EU Blueprint. We are building data integration and visualization methods. One example is our genomation package (http://www.bioconductor.org/packages/devel/bioc/html/genomation.html). The aim of the projects in this theme is to further develop genomation or other unpublished packages by adding new methods and increasing their data processing and visualization capabilities.

Multiple sub-projects available

3) Pure data analysis projects

Our lab has a broad interest in gene regulation and epigenomics. We also have projects that are more oriented towards data analysis; these require less method development but more data processing, integration and applied statistics.

Multiple sub-projects available

4) Developing bioinformatics tools and workflows for Galaxy

We also aim to integrate BIMSB bioinformatics tools into the Galaxy framework and to develop new tools for it. These projects will include integrating tools with Galaxy and building complete workflows that the user can interact with through a web browser.

Multiple sub-projects available

What do you need to know?

  • Knowledge of R. Experience with another programming language such as Perl, Python or C/C++ will be useful but is not necessary. For some sub-projects, C/C++ will be absolutely necessary on top of R. In addition, previous coursework on statistics and programming will be helpful.
  • For Galaxy-related work, knowledge of web technologies and scripting languages such as Perl/Python.

What will you get out of this?

  • You will get a chance to work with genomics data sets
  • You will learn how to build and maintain an R package (depending on the project)
  • You will gain experience in applied statistical modeling (depending on the project)
  • You will gain experience in building tools for Galaxy (depending on the project)
  • If your work is accepted as part of the software and/or ends up in a publication, you will be credited as a contributor.