@hadley
Created January 3, 2014 15:16

How to make your R code faster

Making your code faster requires balancing short-term and long-term goals. If you only spend time making your existing code faster, you'll never get any new work done, but if you don't spend any time thinking about inefficient code, you'll waste time on computation that a little more knowledge could have saved. Below I make three broad recommendations on how to make your R code faster, based on whether you want a payoff in the short, medium or long term.

In the short term, you'll get the most bang for your buck by learning how to write better R code. Patrick Burns' The R Inferno is a great place to learn about the most common performance mistakes and how to avoid them. I also like Norm Matloff's The Art of R Programming, and I have my own hat in the ring with Advanced R programming. Apart from books, asking and answering questions on Stack Overflow is a great learning technique, and there's lots to learn on R-bloggers.
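
As one illustration of the kind of mistake those resources cover, here is a minimal sketch comparing a vector grown inside a loop with a preallocated and a vectorised version. The function names (squares_grow, squares_prealloc, squares_vec) are hypothetical and exist only for this example; timings depend on your machine, so none are quoted here.

```r
# Hypothetical helper names, for illustration only.

# Slow: the result vector is reallocated and copied each time it grows.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Better: allocate the full vector once, then fill it in.
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Usually best: a vectorised operation, so the loop runs in compiled code.
squares_vec <- function(n) {
  seq_len(n)^2
}

n <- 1e4

# All three versions should agree on the result.
stopifnot(
  all.equal(squares_grow(n), squares_prealloc(n)),
  all.equal(squares_prealloc(n), squares_vec(n))
)

# Compare timings on your own machine; exact numbers will vary.
system.time(squares_grow(n))
system.time(squares_prealloc(n))
system.time(squares_vec(n))
```

The gap between the first and last versions typically widens as n grows, which is why this pattern is worth checking for before reaching for heavier tools.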

In the medium term, I think the biggest payoff comes from learning Rcpp. Yes, you have to learn a little C++, but there are lots of great resources available, and I think you can pick up the basics you need in under a week. I think learning Rcpp has a very high payoff in the medium term because it has such tight integration with the R ecosystem: you can supplement what you already know about R with some very fast new code. It's easy to use Rcpp in a package (173 packages already do), RStudio supports it (so you can compile your first Rcpp function in two clicks) and there's plenty of help available from the mailing list.
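
To give a feel for how little C++ is needed to get started, here is a minimal sketch using Rcpp's cppFunction() to compile a small function from within an R session. It assumes the Rcpp package and a working C++ compiler are installed; the function name sum_cpp is made up for this example.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R as sum_cpp().
# sum_cpp is a hypothetical name chosen for this illustration.
cppFunction('
double sum_cpp(NumericVector x) {
  double total = 0;
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}')

x <- runif(1e6)

# The compiled function should agree with base R's sum()
# up to floating-point tolerance.
stopifnot(isTRUE(all.equal(sum_cpp(x), sum(x))))
```

Base R's sum() is already implemented in compiled code, so this particular function is not a speed-up; the point is only to show the workflow. The gains tend to come from loops that cannot easily be vectorised in R.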

In the long term, your best bet is to learn another programming language. Even if you don't end up using it a lot, it's still valuable to be exposed to new ideas and new ways of tackling problems. It will help you to become a better programmer and illuminate the strengths and weaknesses of R. Don't spend hours agonising over which one to learn; instead spend your time mastering one. Use this code to choose:

langs <- c("clojure", "julia", "python", "scala")
sample(langs, 1)

I suggest these four languages because they all have burgeoning data science communities, and their strengths complement R's weaknesses:

  • Clojure is a Lisp dialect built on the JVM. It provides a very clean syntax, and great built-in support for parallel programming. Check out incanter for basic statistics support.

  • Julia is an up-and-coming language designed for scientific computing. It aims to be almost as fast as C, while being a much higher-level language.

  • Python is an excellent general purpose programming language. It's not hugely fast, but like R, lots of people have built addons with C to provide better performance. Look at pandas for data manipulation, scikit-learn for modelling and bokeh for plotting.

  • Scala is another language built on top of the JVM - it has great performance, implements a huge swath of programming paradigms, and has a growing data science community.

Many of these languages are still missing what I consider to be key tools for data analysis, but that's an opportunity for you! There's no better way to learn how a key part of your data science workflow works than to implement it in another language. (Of these languages, Python is probably the most mature, but you're also less likely to learn anything fundamentally new about programming.)

If you want to look further afield, you might want to try Haskell, F# or J. These languages are more exotic and less likely to be useful for day-to-day data analysis, but will introduce you to really interesting and powerful ways to think about computing with data.
