@jtyberg
Created October 9, 2014 01:26
{
"metadata": {
"name": "",
"signature": "sha256:b4c0cf87b1fb34e620fb64741ce396cc0bc635b55249eea30c776eb06badccc4"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Invoking R from IPython Notebook\n",
"\n",
"This notebook shows how to use R from IPython. It demonstrates how to run R scripts, as well as how to invoke R code interactively using **`rpy2`** and [IPython magic integration](http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.html#module-rpy2.ipython.rmagic).\n",
"\n",
"* Invoke R commands using IPython cell magics\n",
"* Invoke R directly using `rpy2` and R magics\n",
"* Install R packages\n",
"* Interactive analysis in R\n",
"* Pull R objects into Python\n",
"* Download as IPython notebook\n",
"\n",
"To use this notebook, you must have R installed (we used [these instructions](http://cran.r-project.org/bin/linux/ubuntu/README) for Ubuntu)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hello, World!\n",
"\n",
"Using the IPython `%%bash` cell magic, we can run any command that we might run in a bash shell. For example, we can create a simple, \"Hello, World!\" R script, and `cat` the result."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"echo 'print ( \"Hello, World!\" )' > hello.R\n",
"cat hello.R"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also execute our new script using the [`Rscript`](http://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html) command line utility for R."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"Rscript hello.R"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Invoke R Directly using `rpy2` and IPython R magics\n",
"\n",
"Invoking R from a bash shell is great, but many people use R in an interactive fashion (R is often used in a read-eval-print (REPL) loop. Enter [`rpy2`](http://rpy.sourceforge.net/rpy2/doc-2.4/html/index.html), a Python package that provides interfaces to facilitate invoking R code from Python. \n",
"\n",
"To install `rpy2`, we use `pip`, the Python package manager."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"pip install rpy2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"pip freeze | grep rpy2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`rpy2` provides magics that allow us to **invoke R code directly** from within a cell, similar to the way we used `%%bash` magics to execute shell commands above. To use R magics from within a notebook, you need to load the `rpy2.ipython` extension."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext rpy2.ipython"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%R` Line Magic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can invoke R commands. Using the in-line magic (`%R`), we can even store the result."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"X_mean = %R X=c(1,3,5,7,9); mean(X)\n",
"X_mean"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also pass objects back and forth between Python and R. Use the `-i` flag to specify input to R:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"X = [2,4,6,8,10]\n",
"X_median = %R -i X median(X)\n",
"X_median"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and the `-o` flag to specify a Python variable in which to store output:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R -o X_squared X_squared=X*X\n",
"X_squared"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%%R` Cell Magic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `%%R` cell magic allows us to run a block of R code, the output of which is published to the output of the cell:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"X=c(1,3,5,7,9)\n",
"Y=c(2,4,6,8,10)\n",
"X*Y"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Output of plots is also supported ([example source](http://www.statmethods.net/graphs/scatterplot.html)):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"attach(mtcars)\n",
"plot(wt, mpg, main=\"Scatterplot Example\", \n",
" \txlab=\"Car Weight \", ylab=\"Miles Per Gallon \", pch=19)\n",
"abline(lm(mpg~wt), col=\"red\") # regression line (y~x) \n",
"lines(lowess(wt,mpg), col=\"blue\") # lowess line (x,y)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install R Packages\n",
"\n",
"R installs some packages by default, but oftentimes, we want to install others. For example, suppose we wanted to [find frequent sequences of items within a set](http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE). To do this, we can leverage the **`arules`** and **`arulesSequence`** packages."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"install.packages('arules', repos=\"http://watson.nci.nih.gov/cran_mirror/\")\n",
"install.packages('arulesSequences', repos=\"http://watson.nci.nih.gov/cran_mirror/\")"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"packageDescription(\"arulesSequences\")"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `arulesSequences` package comes with the `zaki.txt` sample data (named after the SPADE creator), which is located in the package's `misc` directory."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%bash\n",
"cat /home/notebook/R/library/arulesSequences/misc/zaki.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interactive Analysis in R"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can load the packages and mine the sample data for frequent sequences of items."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"library(Matrix)\n",
"library(arules)\n",
"library(arulesSequences)\n",
"\n",
"# load the data set into a data frame\n",
"x <- read_baskets(con = system.file(\"misc\", \"zaki.txt\", package = \"arulesSequences\"), info = c(\"sequenceID\",\"eventID\",\"SIZE\"))\n",
"as(x, \"data.frame\")\n",
"\n",
"# run the CSPADE algorithm to mine frequent items\n",
"s1 <- cspade(x, parameter = list(support = 0.4), control = list(verbose = TRUE))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we can access R session state across multiple notebook cells. Here we print a summary of the results of the analysis performed in the previous cell. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"# output the results\n",
"summary(s1)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `%RPull` the results\n",
"\n",
"IPython users may be more accustomed to manipulating data in Python. `rpy2` makes it easy to convert data between R objects and Python objects. For example, we may want to pull the results of our R analysis into a [Pandas](http://pandas.pydata.org/) DataFrame for further analysis. `rpy2` makes this trivial."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R\n",
"df <- as(s1, \"data.frame\")\n",
"df"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `%RPull`, the R `data.frame` is automatically converted to a Pandas `DataFrame`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%Rpull df\n",
"type(df)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can manipulate the pandas DataFrame."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%matplotlib inline\n",
"print df.support.describe()\n",
"df.support.hist()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[df.support > 0.5].sort('support', ascending=False)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Debugging\n",
"\n",
"Did something go wrong? If so, try one of these should R throw an error."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%R warnings()\n",
"%R traceback()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"M. J. Zaki. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42, 31--60.\n",
"[paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113.6042&rep=rep1&type=pdf)"
]
}
],
"metadata": {}
}
]
}