Skip to content

Instantly share code, notes, and snippets.

@wolfganghuber
Last active July 17, 2018 23:45
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save wolfganghuber/909e14e45af6888eec384b82682b3766 to your computer and use it in GitHub Desktop.
Save wolfganghuber/909e14e45af6888eec384b82682b3766 to your computer and use it in GitHub Desktop.
Demo: speeding up R code with Renjin. For a rendered version, see http://rpubs.com/WolfgangHuber/404211
---
title: "Speeding up 'loopy' R code"
author: "Wolfgang Huber"
output: BiocStyle::html_document
---
```{r global_options, include = FALSE}
```
# Versions of this document
- Rmarkdown source: https://gist.github.com/wolfganghuber/909e14e45af6888eec384b82682b3766
- Rendered HTML: http://rpubs.com/WolfgangHuber/404211
# Purpose of this gist
To explore the speeding up of R code with the JVM-based implementation of (a subset of) R by [the renjin project](http://www.renjin.org), and to compare it with other options, including R's bytecode compiler, Rcpp, and vectorization.
Renjin in general, and the R package that contains renjin are described at http://docs.renjin.org/en/latest/package. It can be installed via the instructions given there (it is currently not on CRAN). Note that it requires a working instance of the R package `rJava` installed.
```{r renjin, results = "hide"}
library("renjin")
```
# Computing pi via the Leibniz formula
We use this as an example of an algebra-heavy computation that involves a lengthy loop.
We approximate the below infinite sum by one with a large value of `m`.
\begin{equation}
\frac{\pi}{4} = \lim_{m\to\infty}\sum_{n=0}^{m} \frac{(-1)^n}{2n+1}
\end{equation}
```{r oldJIT1, echo = FALSE, results = "hide"}
oldJIT <- compiler::enableJIT(0)
```
```{r compute_pi}
compute_pi <- function(m) {
s = 0
sign = 1
for (n in 0:m) {
s = s + sign / (2 * n + 1)
sign = -sign
}
4 * s
}
```
## R's built-in byte code compiler
```{r}
compute_pi_bcc <- compiler::cmpfun(compute_pi)
```
## Vectorization
Vectorization is often a very appropriate way to get rid of explicit loops in R, and thus to speed up code, and even make it more readable and more robust. However, vectorization also has two potential drawbacks:
- When starting from a loop-based version, it requires manual rewriting of code. In the example here, that is simple, even trivial, but for more complex calculations, e.g., if the computations in one loop iteration depend on those from previous iterations, this may not always be easy or possible. (An example for that would be an iterative optimization algorithm.)
- It requires the allocation of large vectors; in the example below, of the vector `n` as well as several temporary expressions that arise in the evaluation of the second line of the below function. A double precision float vector of length $5\times10^8$ requires 3.7 GB.
```{r vectorized}
compute_pi_vec_1 <- function(m) {
n <- 0:m
4 * sum((-1)^n / (2 * n + 1))
}
```
With a little bit more thinking, we can come up with a second approach, which avoids calculating `(-1)^n` and uses vectors that are half the full length.
```{r vectorized_2}
compute_pi_vec_2 <- function(m) {
n <- seq(0, (m - 1) / 2, by = 2)
den <- 2 * n + 1
4 * (sum(1 / den) - sum(1 / (den + 2)))
}
```
## Rcpp
Iterative algorithms can often be implemented in C / C++. However, one has to be able to program in multiple languages and the development process can be slow.
```{r compute_pi_cpp, cache = FALSE}
library("Rcpp")
compute_pi_cpp <- cppFunction("
double compute_pi_cpp(int m) {
double s = 0, sign = 1;
for (int n = 0; n <= m; ++n) {
s = s + sign / (2 * n + 1);
sign = -sign;
}
return 4*s;
}
")
```
# Benchmark
```{r benchmark}
pis <- rep(NA_real_, 6)
m <- 5e8
```
## Ordinary uncompiled R function
```{r b1, cache = TRUE}
system.time(pis[1] <- compute_pi(m))
```
<!-- Note that we **reduced `m` by a factor of 10** here to limit runtime. -->
## Compiled R function
```{r b2, cache = TRUE}
system.time(pis[2] <- compute_pi_bcc(m))
```
Indeed, R will already compile many functions by default, automatically, without your explicit calling of the compiler. This behavior depends on a global state setting of R that is accessed by the `enableJIT` function in the `compiler` package. For demonstration purposes, we have above disabled the automatic compilation.
## renjin
```{r b3}
system.time(pis[3] <- renjin(compute_pi(m)))
```
About half a second of the above timing is spent on the compilation of the code (rather than its execution) and would be saved when called again.
## Vectorized
```{r b4and5, cache = TRUE}
system.time(pis[4] <- compute_pi_vec_1(m))
system.time(pis[5] <- compute_pi_vec_2(m))
```
## Rcpp
```{r b6}
system.time(pis[6] <- compute_pi_cpp(m))
```
## Check that the results are the same
```{r check}
print(pis)
max(abs(pis[-length(pis)] - pis[length(pis)]))
```
```{r oldJIT2, echo = FALSE, results = "hide"}
compiler::enableJIT(oldJIT)
```
# Exercise
Do something similar as above, but instead of computing $\pi$, compute the Mandelbrot set:
https://en.wikipedia.org/wiki/Mandelbrot_set
# Disclaimer
Such kinds of speed optimization are a last resort. First, make sure the code addresses the right problem and computes the correct solution. In comparison, this is more important than being fast. Using mathematical reasoning, symmetries, good approximations, the right algorithm and data structures is preferable to mindless optimization of clumsy code.
# Acknowledgements
Parts of the development of renjin was funded by the European Commission through the Horizon 2020 project SOUND: http://www.sound-biomed.eu
# Session Info
```{r session}
devtools::session_info()
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment