@ateucher
Last active February 26, 2016 00:24
Motivational Pitch for R

I'm going to start off by describing a pretty common data analysis scenario, and then talk about how using R can help:

  • You have a lot of individual spreadsheet files containing your data, and you need it all together, so you copy and paste each one into a master file.
  • Next you do a bunch of data cleaning in the master spreadsheet - fixing date formats, unit conversions, transformations, etc.
  • You then import the data into your favourite statistics program, run your analysis, and
  • copy the outputs back into a spreadsheet or other graphing program to plot your results.
  • You give the results to a colleague to review, and she comes back concerned that something doesn't look quite right. She also suggests that a different modelling technique would be more appropriate.
  • You comb through the original data and realize that in some of the files one column was misaligned, so copying and pasting them into the master dataset compounded the error over many rows.
  • In addition, the suggested modelling procedure requires you to perform a different set of pre-processing steps on your data before you can do the analysis.
  • So you go in and manually fix the errors, redo the data processing, and complete the analysis.
  • Aside from the obvious inefficiencies, what would have happened if you discovered the error a year later? Would you remember all the steps you had done?

So, what can R give you?

Efficiency

The entire workflow can be done efficiently in R, including:

  • reading in raw data and combining the data from all of those files,
  • preparing the data for analysis,
  • performing the analysis, and
  • creating publication-quality graphs and figures.
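
As a rough sketch of what that workflow can look like in code, here is a small base R script. The file names, column names, and values are all made up for illustration (the script writes its own example files to a temporary folder so it can run anywhere), and base R is only one of several ways to do each step.

```r
# Create three small example CSV files in a temporary folder.
# These are hypothetical stand-ins for your individual spreadsheet files.
data_dir <- file.path(tempdir(), "pitch_example")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  site_data <- data.frame(
    site = paste0("site_", i),
    date = c("2015-06-01", "2015-06-02", "2015-06-03", "2015-06-04"),
    temp_f = c(60, 65, 70, 75) + i,
    growth = c(1.0, 1.4, 1.9, 2.1) + i / 10
  )
  write.csv(site_data, file.path(data_dir, paste0("site_", i, ".csv")),
            row.names = FALSE)
}

# 1. Read every file and combine them into one data frame
files <- list.files(data_dir, pattern = "^site_[0-9]+\\.csv$",
                    full.names = TRUE)
all_data <- do.call(rbind, lapply(files, read.csv))

# 2. Prepare the data: parse dates and convert Fahrenheit to Celsius
all_data$date <- as.Date(all_data$date)
all_data$temp_c <- (all_data$temp_f - 32) * 5 / 9

# 3. Run the analysis: a simple linear model
fit <- lm(growth ~ temp_c, data = all_data)
summary(fit)

# 4. Plot the data and the fitted line
plot(growth ~ temp_c, data = all_data,
     xlab = "Temperature (degrees C)", ylab = "Growth")
abline(fit)
```

If a colleague spots a problem or suggests a different model, you change the relevant lines and re-run the whole script rather than repeating each step by hand.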

Reliability

Using R can help you:

  • Minimize errors: point-and-click and copy-and-paste operations are very error-prone, and the errors they introduce are hard to catch.
  • Find and correct errors more easily: changing a bit of code and re-running your script is much more efficient than redoing the work manually, and less likely to introduce a new error.

Reproducibility

Reproducibility means:

  • You know what you did: Doing your data preparation and analysis by coding it in R necessarily documents every step. The analysis and documentation are inextricably intertwined. (That said, we will teach you ways to explicitly document your code.)
  • Others know what you did: They can inspect and re-run your code to understand what you did, verify you've done it correctly, and build on your work.

Versatility

R has a huge library of packages for performing diverse tasks, including:

  • data manipulation,
  • data visualization, and
  • almost every statistical method you can think of.

I think one of the best reasons to use R is the:

Fantastic online R Community

  • R is hugely popular right now.
  • If you have a question or are having a problem, 99% of the time you'll find that it has already been asked and answered online. And if it hasn't, people are almost always willing to help out.

Finally, R is free, open-source, and available for all computer platforms.

I hope this has helped convince you that learning R will be worth your while. So let's get started!
