
@aaronpeikert
Last active February 14, 2020 17:32

As promised, we get a little more practical and share our experience in creating "bulletproof" reproducible analyses. To that end, we get you started with RMarkdown, Git, Make and Docker. Although most of our experience is in R, this workflow also works with other languages, such as Python or Julia, and if there is interest, we will go through how. We plan three two-hour sessions to cover this, but might add a fourth session for questions that arise as we go along.

  1. Get on track --- How to leverage literate programming and version control for reproducible data analysis projects. This first part of our series on reproducibility introduces you to Git and RMarkdown. A data project is seldom just code; RMarkdown helps you combine the code, the thoughts that led to it, and its results in an elegant report. You'll learn how to use Git to put such a document at the centre of collaboration with your fellow data scientists (how to track files, create branches, forks and pull requests, etc.).
  2. Make it right --- How to prepare automated recipes to build reproducible pipelines. The second part of our series on reproducibility shows you how to use GNU Make to manage complex dependencies between files and scripts. Because Make reruns only the steps whose inputs have changed, this know-how makes your data pipeline more computationally efficient and lets your collaborators reproduce your work quickly.
  3. Wrap it up --- How to ship all software dependencies of your project in a neat container. The third and last part of our series on reproducibility shows you how to use containerization with Docker, giving you the last building block for a data analysis project that you can trust to reproduce. Containerization bundles all the software you need into a portable format, called an "image", that you can run on any system with Docker installed. Using an image/container shields your analysis from accidentally breaking due to changes in the software environment and lets collaborators run your analysis without installing or changing anything on their system other than Docker.
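
To make the dependency idea from the second session concrete, here is a minimal Python sketch of the timestamp rule that Make-style tools apply: a target is rebuilt when it is missing or older than one of its prerequisites. This is an illustration only, not GNU Make's implementation; the function names and the files `data.csv` and `report.html` are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Check three situations using hypothetical files in a temp directory."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data.csv")       # hypothetical input
        report = os.path.join(tmp, "report.html")  # hypothetical output

        with open(data, "w") as f:
            f.write("x,y\n1,2\n")
        # Case 1: the report does not exist yet.
        results.append(needs_rebuild(report, [data]))

        with open(report, "w") as f:
            f.write("<html></html>")
        # Set modification times explicitly so the demo does not depend
        # on the timestamp resolution of the file system.
        os.utime(data, (1000, 1000))
        os.utime(report, (2000, 2000))
        # Case 2: the report is newer than its input.
        results.append(needs_rebuild(report, [data]))

        os.utime(data, (3000, 3000))
        # Case 3: the input changed after the report was built.
        results.append(needs_rebuild(report, [data]))
    return results


if __name__ == "__main__":
    print(demo())
```

Only the first and third cases call for a rebuild, which is why a pipeline organised this way skips work whose inputs have not changed.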