Created October 9, 2014 01:26
"metadata": {
"name": "",
"signature": "sha256:b4c0cf87b1fb34e620fb64741ce396cc0bc635b55249eea30c776eb06badccc4"
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
"cells": [
"cell_type": "markdown",
"metadata": {},
"source": [
"# Invoking R from IPython Notebook\n",
"This notebook shows how to use R from IPython. It demonstrates how to run R scripts, as well as how to invoke R code interactively using **`rpy2`** and [IPython magic integration](\n",
"* Invoke R commands using IPython cell magics\n",
"* Invoke R directly using `rpy2` and R magics\n",
"* Install R packages\n",
"* Interactive analysis in R\n",
"* Pull R objects into Python\n",
"* Download as IPython notebook\n",
"To use this notebook, you must have R installed (we used [these instructions]( for Ubuntu)."
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hello, World!\n",
"Using the IPython `%%bash` cell magic, we can run any command that we might run in a bash shell. For example, we can create a simple, \"Hello, World!\" R script, and `cat` the result."
"cell_type": "code",
"collapsed": false,
"input": [
"echo 'print ( \"Hello, World!\" )' > hello.R\n",
"cat hello.R"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also execute our new script using the [`Rscript`]( command line utility for R."
"cell_type": "code",
"collapsed": false,
"input": [
"Rscript hello.R"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invoke R Directly using `rpy2` and IPython R magics\n",
"Invoking R from a bash shell is great, but many people use R in an interactive fashion (R is often used in a read-eval-print (REPL) loop. Enter [`rpy2`](, a Python package that provides interfaces to facilitate invoking R code from Python. \n",
"To install `rpy2`, we use `pip`, the Python package manager."
"cell_type": "code",
"collapsed": false,
"input": [
"pip install rpy2"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "code",
"collapsed": false,
"input": [
"pip freeze | grep rpy2"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"`rpy2` provides magics that allow us to **invoke R code directly** from within a cell, similar to the way we used `%%bash` magics to execute shell commands above. To use R magics from within a notebook, you need to load the `rpy2.ipython` extension."
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext rpy2.ipython"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%R` Line Magic"
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can invoke R commands. Using the in-line magic (`%R`), we can even store the result."
"cell_type": "code",
"collapsed": false,
"input": [
"X_mean = %R X=c(1,3,5,7,9); mean(X)\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also pass objects back and forth between Python and R. Use the `-i` flag to specify input to R:"
"cell_type": "code",
"collapsed": false,
"input": [
"X = [2,4,6,8,10]\n",
"X_median = %R -i X median(X)\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"and the `-o` flag to specify a Python variable in which to store output:"
"cell_type": "code",
"collapsed": false,
"input": [
"%R -o X_squared X_squared=X*X\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%%R` Cell Magic"
"cell_type": "markdown",
"metadata": {},
"source": [
"The `%%R` cell magic allows us to run a block of R code, the output of which is published to the output of the cell:"
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"Output of plots is also supported ([example source]("
"cell_type": "code",
"collapsed": false,
"input": [
"plot(wt, mpg, main=\"Scatterplot Example\", \n",
" \txlab=\"Car Weight \", ylab=\"Miles Per Gallon \", pch=19)\n",
"abline(lm(mpg~wt), col=\"red\") # regression line (y~x) \n",
"lines(lowess(wt,mpg), col=\"blue\") # lowess line (x,y)"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install R Packages\n",
"R installs some packages by default, but oftentimes, we want to install others. For example, suppose we wanted to [find frequent sequences of items within a set]( To do this, we can leverage the **`arules`** and **`arulesSequence`** packages."
"cell_type": "code",
"collapsed": false,
"input": [
"install.packages('arules', repos=\"\")\n",
"install.packages('arulesSequences', repos=\"\")"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "code",
"collapsed": false,
"input": [
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"The `arulesSequences` package comes with the `zaki.txt` sample data (named after the SPADE creator), which is located in the package's `misc` directory."
"cell_type": "code",
"collapsed": false,
"input": [
"cat /home/notebook/R/library/arulesSequences/misc/zaki.txt"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interactive Analysis in R"
"cell_type": "markdown",
"metadata": {},
"source": [
"We can load the packages and mine the sample data for frequent sequences of items."
"cell_type": "code",
"collapsed": false,
"input": [
"# load the data set into a data frame\n",
"x <- read_baskets(con = system.file(\"misc\", \"zaki.txt\", package = \"arulesSequences\"), info = c(\"sequenceID\",\"eventID\",\"SIZE\"))\n",
"as(x, \"data.frame\")\n",
"# run the CSPADE algorithm to mine frequent items\n",
"s1 <- cspade(x, parameter = list(support = 0.4), control = list(verbose = TRUE))"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we can access R session state across multiple notebook cells. Here we print a summary of the results of the analysis performed in the previous cell. "
"cell_type": "code",
"collapsed": false,
"input": [
"# output the results\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## `%RPull` the results\n",
"IPython users may be more accustomed to manipulating data in Python. `rpy2` makes it easy to convert data between R objects and Python objects. For example, we may want to pull the results of our R analysis into a [Pandas]( DataFrame for further analysis. `rpy2` makes this trivial."
"cell_type": "code",
"collapsed": false,
"input": [
"df <- as(s1, \"data.frame\")\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `%RPull`, the R `data.frame` is automatically converted to a Pandas `DataFrame`."
"cell_type": "code",
"collapsed": false,
"input": [
"%Rpull df\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can manipulate the pandas DataFrame."
"cell_type": "code",
"collapsed": false,
"input": [
"%matplotlib inline\n",
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "code",
"collapsed": false,
"input": [
"df[ > 0.5].sort('support', ascending=False)"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## Debugging\n",
"Did something go wrong? If so, try one of these should R throw an error."
"cell_type": "code",
"collapsed": false,
"input": [
"%R warnings()\n",
"%R traceback()"
"language": "python",
"metadata": {},
"outputs": []
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"M. J. Zaki. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42, 31--60.\n",
"metadata": {}