Created
February 15, 2015 12:04
--- | |
title: "Reproducible R coding" | |
author: "Martin Jung" | |
date: "12.02.2015" | |
output:
  ioslides_presentation:
    highlight: zenburn
    highlighter: highlight.js
    hitheme: zenburn
  slidy_presentation:
    incremental: no
subtitle: CMEC R-Group | |
--- | |
## Goals of reproducible programming? {.vcenter .build .larger .incremental} | |
- Make your code readable by you and others
- Group your code and functionalize | |
- Embrace collaboration, version control and automation | |
## First step - readability {.vcenter}
### 1. Writing cleaner code | |
![Coding Mess](codingMess.jpg) | |
## Writing cleaner R code | Names { .emphasized .build} | |
- Keep new filenames descriptive and meaningful | |
```{r,results='hide'} | |
"helper-functions.R" | |
# or for sequences of processing work | |
"01_Download.R" | |
"02_Preprocessing.R" | |
#... | |
``` | |
- Use CamelCase, snake_case or dot.case for variables, and stick to one style
```{r,results='hide'} | |
"spatial_data" | |
"ModelFit" | |
"regression.results" | |
``` | |
### Avoid names that are already used by R functions, like `c` or `plot`
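A minimal sketch of what can go wrong; the redefinition of `c` below is purely illustrative and not something you would do on purpose:

```{r,results='hide'}
# Illustrative only: a user-defined `c` masks base::c in the workspace
c <- function(...) "not a vector anymore"
masked <- c(1, 2, 3)      # calls the user-defined function

# Remove the user-defined object to get base::c back
rm(c)
restored <- c(1, 2, 3)    # a numeric vector again
```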
## Writing cleaner R code | Spacing {.emphasized .build} | |
Use spacing just as you would in written English
```{r,results='hide'} | |
# Good | |
model.fit <- lm(age ~ circumference, data = Orange) | |
# Bad | |
f1=lm(Orange$age~Orange$circumference) | |
``` | |
Don't be afraid of using new lines | |
```{r,results='hide'} | |
model.results <- data.frame(Type = sample(letters, 10),
                            Data = NA,
                            SampleSize = 10)
# Same goes for loops | |
# And don't forget good documentation | |
``` | |
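The same spacing and commenting habits applied to a loop might look like the sketch below. The variable names are illustrative and not part of the original slides.

```{r,results='hide'}
# Mean circumference per tree, one tree at a time
tree.ids <- levels(Orange$Tree)
mean.circumference <- numeric(length(tree.ids))
names(mean.circumference) <- tree.ids

for (id in tree.ids) {
  # Select the measurements belonging to the current tree
  tree.data <- Orange[Orange$Tree == id, ]
  mean.circumference[id] <- mean(tree.data$circumference)
}
```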
## More on writing clean code {.flexbox .vcenter} | |
- [Google R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml) | |
- [Hadley Wickham's Style Guide](http://adv-r.had.co.nz/Style.html)
- [rOpenSci Guide](http://ropensci.github.io/reproducibility-guide/sections/writingCode/)
<br> | |
<div align="left"> | |
There is even an R package to clean up your code:
[formatR](http://yihui.name/formatR/) | |
</div> | |
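A minimal sketch of how `formatR` can be used, assuming the package is installed (the chunk does nothing otherwise). The messy input string is the "bad" example from the spacing slide.

```{r,results='hide'}
# Assumes the formatR package is installed; skipped otherwise
messy.code <- "f1=lm(Orange$age~Orange$circumference)"

if (requireNamespace("formatR", quietly = TRUE)) {
  # arrow = TRUE replaces `=` assignments with `<-`;
  # output = FALSE returns the result instead of printing it
  tidy.code <- formatR::tidy_source(text = messy.code, arrow = TRUE,
                                    output = FALSE)$text.tidy
  print(tidy.code)
}
```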
## Further ways to improve reproducibility
- Ideally attach your code + data to publications | |
- Open-access hosts ([DataDryad](http://datadryad.org/), [Figshare](http://figshare.com/), [Zenodo](http://www.zenodo.org/))
- Restructuring of workflow with RMarkdown / LaTeX / HTML | |
<div align="center"> | |
![Knitr document structure](Knitr-document-structure.png)
</div> | |
## Functionalize! {.flexbox .vcenter .build} | |
- Many `R` users are tempted to write highly specialized, non-reusable code
- Number 1 rule for clear coding:
### ***DRY*** - `Don't repeat yourself!` | |
<br> | |
***Simple example:***<br> | |
We want to fit a linear model to test whether, in an
orange orchard, trunk circumference (mm) increases with tree age.
If so, we want to quantify and display the root-mean-square error (`RMSE`) of this fit for each
individual orange tree in the dataset (`N = 5`).
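For reference, with observed values $y_i$, fitted values $\hat{y}_i$ and $n$ observations, the RMSE is usually defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

The differences $y_i - \hat{y}_i$ are exactly the residuals of the model, so the RMSE can be computed from the residuals alone.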
*** | |
Normal way: | |
```{r,results='hide'} | |
# Linear model
model.fit <- lm(age ~ circumference, data = Orange)
model.resid <- residuals( model.fit )
model.fitted <- fitted( model.fit )
# RMSE: square root of the mean squared residual
rmse <- sqrt( mean( model.resid^2 ))
# RMSE for each individual tree
tapply(model.resid, Orange$Tree,
       function(x) sqrt( mean( x^2 )))
``` | |
*** | |
```{r,echo=FALSE} | |
barplot( tapply(model.resid, Orange$Tree, function(x) sqrt( mean( x^2 ))) )
``` | |
## Defining your functions {.build} | |
Essentially, most R packages are just collections of useful functions that users have written.
```{r,results='hide'} | |
# We want to get the RMSE of a linear model | |
rmse <- function(fit, groups = NULL, ...)
{
  # The RMSE is based on the residuals (observed - fitted) only
  f.resid <- residuals(fit)
  if (!is.null(groups)) {
    # One RMSE per group; `...` is passed on to mean()
    tapply(f.resid, groups, function(x) sqrt(mean(x^2, ...)))
  } else {
    sqrt(mean(f.resid^2, ...))
  }
}
``` | |
--- | |
```{r} | |
model.fit <- lm(age ~ circumference, data = Orange) | |
# This function is more flexible, can be further customized and | |
# applied in other situations | |
rmse(model.fit) | |
rmse(model.fit, Orange$Tree) | |
``` | |
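As a sketch of the kind of customization the `...` argument allows, the chunk below passes `na.rm = TRUE` through to `mean()`. The helper is repeated (computed from the residuals) so the chunk runs on its own, and the data set with a missing value is hypothetical.

```{r,results='hide'}
# RMSE helper repeated here so the chunk runs on its own
rmse <- function(fit, groups = NULL, ...) {
  f.resid <- residuals(fit)
  if (!is.null(groups)) {
    tapply(f.resid, groups, function(x) sqrt(mean(x^2, ...)))
  } else {
    sqrt(mean(f.resid^2, ...))
  }
}

# Hypothetical data with one missing age; na.exclude keeps an NA residual
orange.na <- Orange
orange.na$age[3] <- NA
fit.na <- lm(age ~ circumference, data = orange.na, na.action = na.exclude)

rmse.default <- rmse(fit.na)                 # NA, the missing value propagates
rmse.narm    <- rmse(fit.na, na.rm = TRUE)   # the NA residual is dropped
rmse.by.tree <- rmse(fit.na, orange.na$Tree, na.rm = TRUE)
```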
## A (very) short intro to pipes
Pipes (|) are a common tool in the Linux / programming world that can be used to chain
the inputs and outputs of programs together.
<br>
In `R`, two packages, `magrittr` and `dplyr` (which re-exports the `%>%` operator from `magrittr`), enable piping between functions.
Goal: | |
``` | |
Solve complex problems by combining simple pieces | |
(Hadley Wickham) | |
``` | |
*** | |
```{r,tidy=FALSE,message=FALSE,results='hide',fig.show='hide'} | |
library(dplyr) | |
model.rmse <- Orange %>% | |
lm(age ~ circumference, data=.) %>% | |
rmse(., Orange$Tree) %>% | |
barplot | |
``` | |
Or like this (correlation within the iris dataset):
```{r,tidy=FALSE} | |
iris %>% group_by(Species) %>% | |
summarize(count = n(), pear_r = cor(Sepal.Length, Petal.Length)) %>% | |
arrange(desc(pear_r)) | |
``` | |
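To make the mechanics explicit, here is a small sketch (not from the original slides) comparing nested calls with the equivalent pipe. It assumes the `magrittr` package is installed.

```{r,message=FALSE}
library(magrittr)

# Nested calls are read from the inside out
nested <- round(sqrt(sum(c(1, 4, 9))), 1)

# The piped version reads left to right; `x %>% f(y)` means `f(x, y)`
piped <- c(1, 4, 9) %>% sum() %>% sqrt() %>% round(1)

identical(nested, piped)
```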
## Outsource your functions {.flexbox .vcenter} | |
```{r,results='hide'} | |
# Put your functions into a separate file.
# At the beginning of your main processing script
# you simply load them via source()
source("outsourced.rmse.R") | |
``` | |
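A self-contained sketch of the same idea: the helper file name and the `square` function are hypothetical, and the file is written to a temporary directory only so that the example runs anywhere.

```{r,results='hide'}
# Hypothetical helper file, written to a temporary directory for this demo
helper.file <- file.path(tempdir(), "helper-functions.R")
writeLines("square <- function(x) x^2", helper.file)

# source() evaluates the file, so its functions become available
source(helper.file)
squared <- square(4)
```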
## Easy package writing {.flexbox .vcenter} | |
- Open RStudio | |
- Install the `devtools` and `roxygen2` packages
- Create a new package project and use the existing function as basis | |
- Create the documentation for it | |
- Update the package metadata and build your package | |
```{r,results='hide',eval=FALSE} | |
library(roxygen2) | |
library(devtools) | |
# Build your package with two simple commands | |
# The working directory has to be within your package project
document() # Generate the documentation and update the NAMESPACE
install() # Install the package
``` | |
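The "create the documentation" step could look like the sketch below: `roxygen2` comments (lines starting with `#'`) placed directly above the function in a file inside the package's `R/` folder. The wording of the documentation is an assumption, and the function body is a version of `rmse` computed from the residuals.

```{r,results='hide'}
#' Root-mean-square error of a fitted model
#'
#' @param fit A fitted model object, e.g. from lm().
#' @param groups Optional grouping vector; if given, the RMSE is
#'   returned per group.
#' @param ... Further arguments passed on to mean().
#' @return A single RMSE value, or one value per group.
#' @export
rmse <- function(fit, groups = NULL, ...) {
  f.resid <- residuals(fit)
  if (!is.null(groups)) {
    tapply(f.resid, groups, function(x) sqrt(mean(x^2, ...)))
  } else {
    sqrt(mean(f.resid^2, ...))
  }
}
```

Running `document()` then turns these comments into the help page and adds the function to the NAMESPACE.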
## {.flexbox .vcenter} | |
- However, package development has multiple facets and options.
- More detailed info on [Package development with RStudio](https://support.rstudio.com/hc/en-us/sections/200130627-Package-Development). | |
<br> | |
- Higher acceptance for method papers and analysis code. [Make it citable with a DOI](https://guides.github.com/activities/citable-code/) | |
## Software management and collaboration with GitHub {.flexbox }
- Git is one of the most commonly used revision control systems | |
- Originally developed for the Linux kernel by Linus Torvalds | |
*** | |
![How to Git](Git_operations.png) | |
*** | |
> GitHub is a web-based software repository hosting service offering distributed revision control
> Californian startup, now the largest code host in the world
> Offers public repositories for free, private ones for money, and a nice snippet exchange service called gists
<div align="right"> | |
![Github](github-logo.jpg) | |
</div> | |
## How to Git with RStudio (do it later)
1. Set up an account with a Git repository host such as [GitHub](https://github.com/)
2. Install RStudio and git for your platform (http://www.rstudio.com/ide/docs/version_control/overview)
3. Link to the git executable within the RStudio options
4. Create a new repository on GitHub and a new project in RStudio -> Version Control git
5. Clone your empty project (`clone`), add new files or changes to it (`commit`) and upload them (`push`)
## {.flexbox .vcenter} | |
<div align="center"> | |
![Github](createdRepo.png) | |
</div> | |
<b> Idea for CMEC R Users: </b> | |
- Create a GitHub organization (like a repository basecamp)
## Further developments | |
There are now packages to push gists and normal git updates directly from within `R`.
To use them you need a GitHub API key (instructions on the websites below).
[rgithub](https://github.com/cscheid/rgithub)
Too detailed to show here, but have a look at the `gistr` package:
[gistr](https://github.com/ropensci/gistr)