@walkerjeffd
Last active August 29, 2015 14:02
R Programming for Environmental Engineers

Jeff Walker, PhD

This document provides some suggestions and resources for environmental engineers using R. This information is based on my own experience of learning and using R.

Software

First, you will need the core R program, which can be downloaded from the Comprehensive R Archive Network (CRAN). CRAN is the official repository for R itself as well as for the various packages you will be using.

Second, download and install RStudio Desktop, which runs on top of the core R program and provides many useful features. While you could use the R GUI program that comes with R, RStudio provides a much richer programming environment that makes using R more productive and enjoyable.

Getting Started

If you're brand new to R (and even to programming in general), you'll need to learn the basics of how R works: the types of variables (e.g. numeric, character, logical, ...) and the programming structures (e.g. conditional if/else statements, for loops, while loops, ...).
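As a rough illustration of those basics, here is a minimal sketch in base R. The variable names and the threshold of 100 are made up for this example and do not come from any particular dataset.

```r
# Basic variable types (the names and values here are illustrative only)
flow <- 123.3               # numeric
station <- "StationA"       # character
is_high <- flow > 100       # logical (R's name for a boolean)

# Conditional if/else statement
if (is_high) {
  status <- "high"
} else {
  status <- "normal"
}

# for loop: add up the values in a numeric vector
flows <- c(123.3, 125.2, 128.6)
total <- 0
for (f in flows) {
  total <- total + f
}

# while loop: count how many values the vector holds
n <- 0
while (n < length(flows)) {
  n <- n + 1
}
```

In practice, many explicit loops can be replaced with vectorized functions such as sum(flows), but writing the loop out by hand is a good way to learn the syntax.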

The best starting point is probably one of the many books on general R programming. My recommendation is The Art of R Programming by Norman Matloff. There appears to be a free version available here, although I'm not sure if it's exactly the same as the book.

Another great starting point is the set of free online classes offered through Coursera. In particular, the Data Science Specialization is a sequence of courses that will effectively make you at least an intermediate, if not advanced, R programmer. These are all free and can be

Packages

The real power of R is the ecosystem of packages that have been created by countless developers. Packages are

If you need to install a package, your best bet is to use the standard install.packages() function. Simply pass the name of the package as a string.

> install.packages('ggplot2')
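Installing a package only needs to happen once, but the package has to be loaded with library() in every new R session. The sketch below shows one possible way to check which packages are still missing before installing them. The helper name missing_packages is hypothetical (it is not part of R), and the install and load calls are left as comments so the block has no side effects.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used elsewhere in this document
wanted <- c("ggplot2", "reshape2", "lubridate")

# Install whatever is missing (uncomment to run):
# to_install <- missing_packages(wanted)
# if (length(to_install) > 0) install.packages(to_install)

# Then load a package at the start of each session:
# library(ggplot2)
```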

Plotting with ggplot2

R comes with plotting functions (e.g. plot(x, y)) that are a good place to start. However, the ggplot2 package is a very popular alternative that has little in common with the basic R plotting functions. I would strongly recommend learning ggplot2 sooner rather than later. I personally only use ggplot2 and very rarely use the basic plotting functions. Not only do ggplot2 graphics look much better, but they provide a different kind of language for creating plots that is extremely powerful.

The best way to learn ggplot2 is to start with the book by the author of the package, Hadley Wickham.
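To give a feel for that different language, here is a small sketch using the airquality dataset that ships with R. The dataset and the particular aesthetics are my choice for illustration only. The plot is built by mapping columns to aesthetics and then adding layers with +.

```r
library(ggplot2)

# airquality is a built-in R dataset; drop rows with missing ozone values
aq <- airquality[!is.na(airquality$Ozone), ]
aq$Month <- factor(aq$Month)

# Base R equivalent, for comparison:
# plot(aq$Temp, aq$Ozone)

# ggplot2: map columns to aesthetics, then add layers
p <- ggplot(aq, aes(x = Temp, y = Ozone, color = Month)) +
  geom_point() +
  labs(x = "Temperature (deg F)", y = "Ozone (ppb)",
       title = "Ozone vs. temperature by month")

print(p)
```

Because the plot is stored as an object, further layers or scales can be added later, for example p + scale_y_log10().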

Analyzing Data

Once you have learned the basics of R and are ready to start analyzing some data, I suggest reading the following two papers by Hadley Wickham.

The tidy data paper focuses on ways of storing data. For example, one could have a dataset containing daily streamflows from a variety of stations. One way to store this data would be a so-called wide format where the first column has the date, and the remaining columns are the flows with one column for each station:

library(lubridate)

# Wide format: one row per date, one flow column per station
q <- data.frame(Date=ymd(c("2012-01-01", "2012-01-02", "2012-01-03")),
                StationA=c(123.3, 125.2, 128.6),
                StationB=c(13.2, 14.1, 16.6),
                StationC=c(1423.2, 1434.9, 1501.3))
print(q)
##         Date StationA StationB StationC
## 1 2012-01-01    123.3     13.2     1423
## 2 2012-01-02    125.2     14.1     1435
## 3 2012-01-03    128.6     16.6     1501

However, you could also store this in a long format where each row represents a single flow value. This would require adding a column that indicates the station.

library(reshape2)

# Long format: one row per (date, station) flow value
q.long <- melt(q, id.vars='Date', measure.vars=c('StationA', 'StationB', 'StationC'),
               value.name='Flow', variable.name='Station')
print(q.long)
##         Date  Station   Flow
## 1 2012-01-01 StationA  123.3
## 2 2012-01-02 StationA  125.2
## 3 2012-01-03 StationA  128.6
## 4 2012-01-01 StationB   13.2
## 5 2012-01-02 StationB   14.1
## 6 2012-01-03 StationB   16.6
## 7 2012-01-01 StationC 1423.2
## 8 2012-01-02 StationC 1434.9
## 9 2012-01-03 StationC 1501.3
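One practical benefit of the long format is that grouped calculations become a single call. As a minimal sketch, the block below rebuilds the same long table directly in base R (so it runs without reshape2) and computes the mean flow per station with aggregate(). The name mean.flow is just an illustrative choice.

```r
# Rebuild the long-format table from above using base R only
q.long <- data.frame(
  Date = as.Date(rep(c("2012-01-01", "2012-01-02", "2012-01-03"), times = 3)),
  Station = rep(c("StationA", "StationB", "StationC"), each = 3),
  Flow = c(123.3, 125.2, 128.6,
           13.2, 14.1, 16.6,
           1423.2, 1434.9, 1501.3)
)

# Mean flow per station: one row per station in the result
mean.flow <- aggregate(Flow ~ Station, data = q.long, FUN = mean)
print(mean.flow)
```

Doing the same with the wide table would require naming each station column explicitly, which gets tedious as the number of stations grows.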

Getting Help

If you don't know how to do something or cannot get something to work correctly, chances are the solution is out there on the Internet already. I frequently search for problems on

If you can't find the solution, then you can certainly post a question to Stack Overflow or the R mailing list. But before you do, read How To Ask Questions The Smart Way by Eric Steven Raymond. This will help you communicate your problem effectively so that others can more easily help you figure it out.
