@sebkopf
Last active March 30, 2020 03:47
R markdown tutorial

R markdown and Data Frame Viewer tutorial

This tutorial provides an introduction to R markdown and basic data processing (import, data structuring & plotting) in R. You can download the whole folder by clicking the Download ZIP button above on the right.

Prerequisites:

  • install R and RStudio

Included:

  • introduction to markdown format
    • open the markdown tutorial file (rmd_tutorial.Rmd) in RStudio
  • introduction to R markdown with an example analysis
    • open the analysis file in RStudio (analysis.Rmd, make sure you have example.xlsx in the same folder)
  • introduction to Data Frame Viewer (included in rmd_tutorial.Rmd)
---
title: "Analysis test"
output: html_document
---
```{r, echo=FALSE, warning=FALSE}
# This code chunk simply makes sure that all the libraries used here are installed, it will not be shown in the report (notice echo = FALSE).
packages <- c("readxl", "knitr", "tidyr", "dplyr", "ggplot2", "plotly")
if ( length(missing_pkgs <- setdiff(packages, rownames(installed.packages()))) > 0) {
message("Installing missing package(s): ", paste(missing_pkgs, collapse = ", "))
install.packages(missing_pkgs)
}
```
This is a simple example analysis of data including import from Excel, data structuring and plotting. The data in this case happens to be optical density data over time (replicate growth curves for a microorganism) but the nature of the data matters little to the basics introduced.
## Import OD data
```{r}
library(readxl) # fast excel reader
#library(googlesheets) # fast google spreadsheet reader (not used here but could be useful)
data.raw <- read_excel("example.xlsx", skip = 1)
```
### Show the raw data
```{r}
library(knitr) # the package that renders R markdown and has some good additional functionality
kable(data.raw)
```
### Restructuring the data
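Turn the wide-format Excel data into *long* format (one row per measurement). Note: here we make use of the pipe operator `%>%`, which passes the result on its left as the first argument to the function on its right and thereby simplifies chaining operations.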
```{r}
library(tidyr) # for restructuring data very easily
data.long <- data.raw %>% gather(sample, OD600, -Time)
# data.long <- gather(data.raw, sample, OD600, -Time) # this would be identical without using %>%
```
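To make the reshaping step more concrete, here is a minimal sketch on a small made-up data frame. The name `toy.wide` and all of its values are hypothetical and not taken from example.xlsx.

```{r}
library(tidyr)
# hypothetical wide-format data: one row per time point, one column per sample
toy.wide <- data.frame(
  Time = c(0, 1, 2),
  A = c(0.01, 0.02, 0.04),
  B = c(0.01, 0.03, 0.09)
)
# stack all columns except Time into key/value pairs (sample name, OD600 value)
toy.long <- gather(toy.wide, sample, OD600, -Time)
toy.long # 6 rows with the columns Time, sample and OD600
```

If your tidyr version is 1.0.0 or later, `pivot_longer(toy.wide, -Time, names_to = "sample", values_to = "OD600")` should give the same content (in a different row order); `gather` still works but is no longer under active development.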
Introducing time in hours.
```{r}
library(dplyr, warn.conflicts = FALSE) # powerful for doing calculations on data (by group, etc.)
data.long <- data.long %>% mutate(time.hrs = as.numeric(Time - Time[1], units = "hours"))
```
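The conversion above relies on the fact that subtracting two date-time values yields a `difftime` object, which `as.numeric` can express in a unit of your choice. A small sketch with made-up time stamps (`toy.times` is hypothetical):

```{r}
# hypothetical time stamps: start, 1.5 hours later, and one day later
toy.times <- as.POSIXct(
  c("2015-07-01 08:00:00", "2015-07-01 09:30:00", "2015-07-02 08:00:00"),
  tz = "UTC"
)
# elapsed time since the first time stamp, expressed in hours
toy.hrs <- as.numeric(toy.times - toy.times[1], units = "hours")
toy.hrs # 0.0, 1.5 and 24.0
```

Specifying `units` explicitly matters because R otherwise picks the unit of a time difference automatically (seconds, minutes, hours, ...) depending on the values.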
First plot of all the data
```{r}
library(ggplot2) # powerful plotting package for aesthetics driven plotting
p1 <-
ggplot(data.long) + # initiate plot
aes(x = time.hrs, y = OD600, color = sample) + # setup aesthetic mappings
geom_point(size = 5) # add points to plot
print(p1) # output plot
```
### Combining data by adding sample meta information from the spreadsheet's second tab
```{r}
data.info <- read_excel("example.xlsx", sheet = "info")
```
Show all information (these are the experimental conditions for each sample)
```{r}
kable(data.info)
```
Combine OD data with sample information.
```{r}
data.all <- merge(data.long, data.info, by = "sample")
```
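One thing to be aware of: by default `merge` only keeps rows whose key appears in both data frames. The following sketch with made-up data (`toy.od` and `toy.info` are hypothetical) illustrates the difference between the default and keeping all measurements:

```{r}
# hypothetical measurements, including a sample "C" without meta information
toy.od <- data.frame(
  sample = c("A", "A", "B", "C"),
  OD600 = c(0.1, 0.2, 0.1, 0.3)
)
# hypothetical sample information for samples A and B only
toy.info <- data.frame(
  sample = c("A", "B"),
  substrate = c("glucose", "acetate")
)
toy.inner <- merge(toy.od, toy.info, by = "sample") # sample C is dropped
toy.left <- merge(toy.od, toy.info, by = "sample", all.x = TRUE) # sample C is kept, substrate is NA
nrow(toy.inner) # 3
nrow(toy.left) # 4
```

If some samples are missing from the info tab, comparing the row counts before and after the merge is a quick way to notice it.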
### Show us the data
Reuse the same plot, using `%+%` to substitute the original data set with the new one and mapping the color to the information we just added (everything else about the plot stays the same).
```{r}
p1 %+% data.all %+% aes(color = substrate)
```
### Summarize data
To make the figure a little bit easier to navigate, we're going to summarize the data for each condition (combine the replicates) and replot it with an error band showing the whole range of data points for each condition. We could reuse the plot `p1` again, but for clarity we construct the plot from scratch instead.
```{r}
data.sum <- data.all %>%
group_by(time.hrs, substrate) %>%
summarize(
OD600.avg = mean(OD600),
OD600.min = min(OD600),
OD600.max = max(OD600))
data.sum %>% head() %>% kable() # show the first couple of lines
p2 <- ggplot(data.sum) + # initiate plot
aes(x = time.hrs, y = OD600.avg, ymin = OD600.min, ymax = OD600.max,
fill = substrate) + # setup global aesthetic mappings
geom_ribbon(alpha = 0.3) + # value range (uses ymin and ymax, and fill for color)
geom_line() + # connect averages (uses y)
geom_point(shape = 21, size = 5) + # add points for averages (uses y and fill for color)
theme_bw() + # style plot
labs(title = "My plot", x = "Time [h]", y = "OD600", fill = "Condition") # add labels (legend title goes with the mapped fill aesthetic)
print(p2)
```
*Note that we could also have had ggplot do the whole statistical summarizing for us using `stat_summary`, but it's often helpful to have these values separately for other calculations and purposes.*
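For comparison, this is roughly what the `stat_summary` route could look like. It is only a sketch on made-up replicate data (`toy.reps` is hypothetical), and it assumes ggplot2 version 3.3.0 or later, where the arguments are called `fun`, `fun.min` and `fun.max` (older versions used `fun.y`, `fun.ymin` and `fun.ymax`).

```{r}
library(ggplot2)
# hypothetical replicate measurements: three replicates at each of three time points
toy.reps <- data.frame(
  time.hrs = rep(c(0, 5, 10), each = 3),
  OD600 = c(0.01, 0.02, 0.03, 0.10, 0.12, 0.14, 0.40, 0.45, 0.50)
)
toy.p <- ggplot(toy.reps) +
  aes(x = time.hrs, y = OD600) +
  # band from the minimum to the maximum at each time point
  stat_summary(fun = mean, fun.min = min, fun.max = max, geom = "ribbon", alpha = 0.3) +
  stat_summary(fun = mean, geom = "line") + # connect the averages
  stat_summary(fun = mean, geom = "point", size = 5) # one point per average
print(toy.p)
```

Here ggplot computes the mean, minimum and maximum for each x value while drawing, so no separate summary data frame is created.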
Now we could, for example, focus on a subset of the data but reuse the same plot, using `%+%` to substitute the original data set with a new one (everything else about the plot stays the same).
```{r}
p2 %+% filter(data.sum, !grepl("background", substrate), time.hrs < 25)
```
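The `filter` call above combines two conditions: `grepl` flags the rows whose substrate contains the text "background" (negated with `!` to exclude them), and the second condition keeps only the early time points. A sketch with made-up values (`toy.sum` is hypothetical):

```{r}
library(dplyr, warn.conflicts = FALSE)
# hypothetical summary data with one background condition and one late time point
toy.sum <- data.frame(
  substrate = c("glucose", "background (no cells)", "acetate", "glucose"),
  time.hrs = c(10, 10, 30, 20)
)
# keep rows that are not background AND were measured before 25 hours
toy.subset <- filter(toy.sum, !grepl("background", substrate), time.hrs < 25)
toy.subset # only the two glucose rows remain
```

Multiple conditions separated by commas in `filter` all have to be true for a row to be kept.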
Save this plot automatically as a PDF by setting specific plot options in the R code chunk (the file is named after the chunk label and stored in the folder given by `fig.path`).
```{r this-is-my-plot, dev="pdf", fig.width=7, fig.height=5, fig.path="./"}
print(p2)
```
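If you want to save a plot from regular R code rather than through chunk options, `ggsave` from ggplot2 is one alternative. A minimal sketch, using the built-in `mtcars` data set and a temporary file (`toy.plot` and `toy.file` are hypothetical names, not part of this analysis):

```{r}
library(ggplot2)
# simple stand-in plot based on a data set that ships with R
toy.plot <- ggplot(mtcars) + aes(x = wt, y = mpg) + geom_point()
# write the plot to a PDF in the session's temporary folder (width and height in inches)
toy.file <- file.path(tempdir(), "toy-plot.pdf")
ggsave(toy.file, plot = toy.plot, width = 7, height = 5)
file.exists(toy.file)
```

The file type is inferred from the file extension, so changing the name to end in `.png` would save a PNG instead.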
#### Interactive plot
Last, you can make simple interactive (javascript) plots out of your original ggplots (plotly does not yet work great for all ggplot features but it's a start for easy visualization). You can of course construct plotly plots without ggplot for more customization too but that's for another time.
```{r}
library(plotly, warn.conflicts = FALSE)
ggplotly(p1)
```
---
output: html_document
---
# R markdown and Data Frame Viewer tutorial
This tutorial provides an introduction to [R markdown](http://rmarkdown.rstudio.com/)
and the [Data Frame Viewer](https://github.com/sebkopf/dfv#dfv).
## Markdown
**Markdown** is a very basic and easy-to-use syntax for styling written documents.
It's very easy to make some words **bold** and other words *italic* with Markdown.
You can even [link to NCBI](http://www.ncbi.nlm.nih.gov/)!
### Headers
Sometimes it's useful to have different levels of headings to structure your documents.
Start a line with `#` to create a heading. Additional `#` characters denote lower heading levels: you can use anything from one `#` up to six (`######`).
If you'd like to include a quote, use the `>` character before the line:

> My Software never has bugs. It just develops random features.

### Lists
Sometimes you need numbered lists (here to some useful resources for markdown):
1. [Markdown Basics from R-Studio](http://rmarkdown.rstudio.com/authoring_basics.html)
1. [Mastering Markdown from GitHub](https://guides.github.com/features/mastering-markdown/) (this is where most of the examples above come from)
And sometimes you want bullet points (the kind of things you can do with R markdown
if you want to go beyond the basics):
- [Lots of options for embedded R code](http://rmarkdown.rstudio.com/authoring_rcodechunks.html) (more details below)
- [Bibliographies and References](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html)
- [Interactive Documents](http://rmarkdown.rstudio.com/authoring_shiny.html)
- And if you have sub points, put two spaces before:
  - Like this
  - And this
### Equations
Equation support can be very handy if you need to provide some formulas in your
text, just use $\LaTeX$ [math](https://en.wikibooks.org/wiki/LaTeX/Mathematics): $x=\sum\beta\frac{\pi^2}{\gamma_i}$
Or more complicated large ones:
$$
f(n) =
\begin{cases}
n/2 & \quad \text{if } n \text{ is even}\\
-(n+1)/2 & \quad \text{if } n \text{ is odd}\\
\end{cases}
$$
### Images
If you want to embed images, this is how you do it:
![Pluto loves you](http://i.space.com/images/i/000/048/999/i02/pluto-new-horizons-july-2015.jpg?1437582878)
And now time for a horizontal rule and off to R!

------
## R markdown
**R markdown** is a version of Markdown that is expanded to support running R code
in between your text. The blocks of R code are called *chunks*. You can treat
them as individual little segments of code: you can jump back and forth between them
and run them one at a time, or run all of them at once by clicking the **Knit** button.
Knitting generates a document that includes both your text and the output of every
embedded R code chunk. This is an R code chunk:
```{r my-first-chunk}
data <- cars # get the cars data set as an example
summary(data) # show a summary of the data set
```
You can also print out your data in table format if you want to include it in
your document:
```{r, results="asis"}
library(knitr)
kable(head(data))
```
Or you can print out the value of a variable in your text, say the value of $\pi$
with 4 significant digits: `r signif(pi, 4)` or the number of data points in
your data set: `r nrow(data)`.
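A pattern that often keeps the text readable is to calculate such values in a chunk first and then refer to them by name. A small sketch using the built-in `cars` data set (the variable names `toy.n` and `toy.speed` are made up for this example):

```{r}
toy.n <- nrow(cars) # number of observations in the cars data set
toy.speed <- signif(mean(cars$speed), 3) # mean speed with 3 significant digits
```

With these in place, the text can report that the data set has `r toy.n` observations with a mean speed of `r toy.speed` mph, and the numbers update automatically whenever the data change.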
And of course you can embed plots, for example:
```{r my-plot, echo=FALSE, fig.width=10}
plot(data)
```
For additional information on R and R markdown, there are lots of great resources
on the internet and the R user community is very active and extremely helpful. Often,
googling what you'd like to achieve will provide a good starting point but I can
also recommend the following resources specifically:
- [R reference card](http://cran.r-project.org/doc/contrib/Short-refcard.pdf) (a great overview of many useful R commands)
- [Regression analysis functions](http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf) (statistical analysis is one of the great strengths of R, this is a handy overview of useful functionality)
- [Stack Overflow](http://stackoverflow.com/) (a Q&A site for programming, searching for answers here often provides very helpful information)
With that, time to jump to the last item.

------
## Data Frame Viewer
A note upfront: the approach taken in this user interface is no longer quite up to date with current best practices (more recently developed R packages make things even easier). If you're already familiar with some basics of coding and generating plots, I recommend jumping straight to the accompanying *analysis.Rmd* file and working through it instead. However, if you'd like to start by exploring some plotting features without any R or coding background, this is still a great way to get started.
The [Data Frame Viewer](https://github.com/sebkopf/dfv#dfv) is a custom R package that provides a simple user interface to facilitate getting started with using R for data processing. The GUI illustrates how to import data from Excel, melt data frames into plottable format, add additional information to the data and plot it using ggplot. It also provides an easy system to keep track of multiple plots and save them in PDF format. The GUI always shows the actual code that is executed to process or plot the data, so users can experiment with changing the code directly and copy it to build their own data processing pipeline independent of this GUI.
The user interface is generated using [GTK+](http://www.gtk.org/), a cross-platform toolkit for graphical user interfaces. If GTK is not installed yet, please follow this [link](https://gist.github.com/sebkopf/9405675) for information on installing R with GTK+.
### Install dfv package
The **devtools** package provides a super convenient way of installing the **dfv** package directly from GitHub. To install **devtools**, run the following from the R command line:
```
install.packages("devtools", dependencies = TRUE) # development tools
```
Then install the latest version of the Data Frame Viewer directly from GitHub by running the following code. If this is the first time you install the **dfv** package, all missing dependencies (**ggplot2, plyr, psych, scales, grid, gWidgets, RGtk2**, and **xlsx**, as well as their respective dependencies) will be installed automatically, which might take a few minutes:
```
library(devtools)
install_github("sebkopf/dfv")
```
For additional information and troubleshooting help, see the [online help](https://github.com/sebkopf/dfv#dfv).
### Run dfv
Once the package is installed, you can run the Data Frame Viewer simply by typing:
```
library(dfv)
dfv.start()
```