Parser to Convert Markdown to CTV file.

This is a proof of concept of a parser that converts a markdown file into the ctv file required for CRAN task views. The basic idea is to combine YAML and markdown sections in the md file, parse them into a payload, and render that payload using a template that follows the format specified by the ctv package.

Structure of Markdown File

---
name: Working with data on the web
maintainer: Scott Chamberlain, Karthik Ram, Christopher Gandrud
email: scott at ropensci.org
version:  2013-09-17
---

<!- information in markdown ->

---

<!- links ->
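
As a rough illustration of how a file with this structure breaks into its three parts, here is a minimal base R sketch. The miniature input and the variable names are made up for the example. Unlike a plain split on the substring `---`, it only treats lines that consist solely of `---` as separators, which is an assumption about how the file is written rather than something the parser below does.

```r
# Hypothetical miniature input following the structure above
md <- paste(
  "---",
  "name: Working with data on the web",
  "version: 2013-09-17",
  "---",
  "",
  "Some information in markdown.",
  "",
  "---",
  "",
  "Some links.",
  sep = "\n"
)

# Break the text into lines and number the sections: the counter
# increases by one at every line that is exactly '---'
lines <- strsplit(md, "\n", fixed = TRUE)[[1]]
is_sep <- lines == "---"
section_id <- cumsum(is_sep)

# Drop the separator lines and glue each section back together
sections <- split(lines[!is_sep], section_id[!is_sep])
sections <- unname(vapply(sections, paste, character(1), collapse = "\n"))

# sections[1] holds the YAML header, sections[2] the info text,
# sections[3] the links text
length(sections)
```

Splitting on whole lines avoids cutting a section in two when `---` happens to appear inside a sentence or a longer run of dashes.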
  • Ultimately, it needs to pass check_ctv_packages('test.ctv'), but I am running into some errors connected to conversion of markdown to HTML.
  • Packages are assumed to be marked up as "[pkgname][pkgname]". But this is ordinary link markup in Pandoc's markdown, so we need an explicit markup for packages so that the add_pkg_markup function identifies packages correctly.
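
One possible way to make the package pattern stricter, sketched here in base R, is to require that both bracketed parts are identical and to allow digits and dots in the name, since R package names may contain them. The function name `add_pkg_markup_strict` is hypothetical and this is not the pattern used in the parser below; it still relies on the "[pkgname][pkgname]" convention rather than on a dedicated markup.

```r
# Hypothetical stricter variant: the second bracket must repeat the first
# (backreference \\1), and names may contain letters, digits and dots
add_pkg_markup_strict <- function(x) {
  gsub("\\[([[:alnum:].]+)\\]\\[\\1\\]", "<pkg>\\1</pkg>", x, perl = TRUE)
}

# Made-up input: two package mentions and one ordinary reference link
add_pkg_markup_strict("[httr][httr] and [sos4R][sos4R], but not [more][url]")
```

With this pattern a reference link whose text and label differ is left untouched, while a name such as sos4R, which contains a digit, is picked up.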
md2ctv <- function(mdfile){
  require(stringr); require(yaml); require(markdown); require(whisker)
  # read a file into a single string
  read_file = function(file_){
    paste(readLines(file_), collapse = '\n')
  }
  # add package markup to text of form [pkg][pkg]
  add_pkg_markup <- function(x){
    str_replace_all(x, '\\[([[:alpha:]]+)\\]\\[([[:alpha:]]+)\\]', '<pkg>\\1</pkg>')
  }
  # read markdown file and split it into YAML header, info and links,
  # converting the two markdown sections to html
  md <- read_file(mdfile)
  fields <- str_split(md, '---')[[1]][-1]
  payload <- c(yaml.load(fields[1]),
    info = add_pkg_markup(renderMarkdown(text = fields[2])),
    links = renderMarkdown(text = fields[3])
  )
  # extract package list from the <pkg> tags in the info section
  payload$packagelist = str_match_all(
    payload$info, "<pkg>([^<]*)</pkg>"
  )[[1]][,2]
  # read template and write ctv file
  template = read_file('template.xml')
  out = paste(capture.output(cat(
    whisker.render(template, data = list(payload = payload)))
  ), collapse = "\n")
  out = add_pkg_markup(out)
  writeLines(out, 'test.ctv')
}
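
The package-list extraction step in md2ctv can be tried on its own. The sketch below uses base R instead of stringr and a made-up `info` string. The call to `unique()` is an addition that is not part of the function above; it is there so that a package mentioned in more than one section is listed only once.

```r
# Made-up fragment of rendered info html with one repeated package
info <- paste0(
  "<li><pkg>RCurl</pkg>: a curl wrapper</li>",
  "<li><pkg>httr</pkg>: a wrapper around <pkg>RCurl</pkg></li>"
)

# Find every <pkg>...</pkg> occurrence, then strip the tags
tagged <- regmatches(info, gregexpr("<pkg>([^<]*)</pkg>", info))[[1]]
pkgs <- unique(gsub("</?pkg>", "", tagged))
pkgs
```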
{{# payload }}
<CRANTaskView>
<name>{{ name }}</name>
<topic>{{ topic }}</topic>
<maintainer email="{{ email }}">Achim Zeileis</maintainer>
<version>{{ version }}</version>
<info>
{{{ info }}}
</info>
<packagelist>
{{# packagelist }}
<pkg>{{ . }}</pkg>
{{/ packagelist }}
</packagelist>
<links>
{{{ links }}}
</links>
</CRANTaskView>
{{/ payload }}
<CRANTaskView>
<name>Working with data on the web</name>
<topic></topic>
<maintainer email="scott at ropensci.org">Achim Zeileis</maintainer>
<version>2013-09-17</version>
<info>
<h2>Introduction</h2>
<p>This Task View contains information about using R to obtain and parse data from the web.</p>
<p>The base version of R does not ship with many tools for interacting with the web. Thankfully, an increasingly large number of packages provide them.</p>
<p>If you have any comments or suggestions for additions or improvements for this taskview, go to Github and <a href="https://github.com/ropensci/webservices/issues">submit an issue</a> or make some changes and <a href="https://github.com/ropensci/webservices/pulls">submit a pull request</a>. If you have an issue with one of the packages, please contact the maintainer of the package.</p>
<p>A list of available packages and functions is presented below, grouped by the type of activity.</p>
<h2>Tools for working with the web from R</h2>
<h3>curl/http/ftp</h3>
<ul>
<li><pkg>RCurl</pkg>: a low level curl wrapper for R. </li>
<li><pkg>httr</pkg>: a light wrapper around RCurl that makes many things easier, but still allows you to access the lower level functionality of RCurl. </li>
</ul>
<p>httr has convenient http verbs: <code>GET()</code>, <code>POST()</code>, <code>PUT()</code>, <code>DELETE()</code>, <code>PATCH()</code>, <code>HEAD()</code>, <code>BROWSE()</code>. These wrap functions in RCurl, making them more convenient to use, though less configurable than counterparts in RCurl. Though note that you can pass in additional Curl options to the <code>config</code> parameter in http calls. The equivalent of httr&#39;s <code>GET()</code> in RCurl is <code>getForm()</code>. Likewise, the equivalent of httr&#39;s <code>POST()</code> in RCurl is <code>postForm()</code>. </p>
<p><a href="http://en.wikipedia.org/wiki/Http_status_codes">http status codes</a> are helpful for debugging http calls. The httr package makes this easier: for example, <code>stop_for_status()</code> gets the http status code from a response object and stops the function if the call was not successful. See also <code>warn_for_status()</code>.</p>
<h3>Authentication</h3>
<p>Using web resources can require authentication, either via API keys, OAuth, username:password combination, or via other means. Additionally, sometimes web resources require that authentication be in the header of an http call, which requires a little bit of extra work. API keys and username:password combos can be combined within a url for a call to a web resource (api key: <a href="http://api.foo.org/?key=yourkey">http://api.foo.org/?key=yourkey</a>; user/pass: http://username:<a href="mailto:password@api.foo.org">password@api.foo.org</a>), or can be specified via commands in RCurl or httr. OAuth is the most complicated authentication process, and can be most easily done using httr. See the 6 demos within httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, github, google). <pkg>ROAuth</pkg> is a package that provides a separate R interface to OAuth. OAuth is easier to do in httr, so start there. </p>
<h3>Web frameworks</h3>
<p>RStudio recently created <pkg>Shiny</pkg>, which combines R, html, css, and javascript to make web applications. Related tools are available, including [openCPU]<a href="%5Bon%20CRAN%5D%5Bopencpucran%5D">opencpu</a> and <pkg>Rook</pkg>. However, Shiny is the most promising of these.</p>
<h3>Parsing data from the web</h3>
<ul>
<li>txt, csv, etc.: you can use <code>read.csv()</code> after acquiring the csv file from the web via e.g., <code>getURL()</code> from RCurl. <code>read.csv()</code> works with http but not https, i.e.: read.csv(&ldquo;http://&hellip;&rdquo;), but not read.csv(&ldquo;https://&hellip;&rdquo;). The <pkg>repmis</pkg> package contains a <code>source_data()</code> command to simplify this process, while also assigning SHA-1 hashes to uniquely identify file versions.</li>
<li>xml/html: the package <pkg>XML</pkg> by Duncan Temple-Lang contains functions for parsing xml and html, and supports <pkg>xpath</pkg> for searching xml (think regex for strings). <pkg>scrapeR</pkg> provides additional tools for scraping data from html and xml documents.</li>
<li>json/json-ld: <pkg>RJSONIO</pkg> by Duncan Temple-Lang. Another package, <pkg>rjson</pkg>, does many of the same tasks which RJSONIO does.</li>
<li>custom formats: Some web APIs provide custom data formats (e.g., X), which are usually modified xml or json, and handled by XML and RJSONIO, respectively.</li>
<li>An alternative to the XML package is <pkg>selectr</pkg>, which parses CSS3 Selectors and translates them to XPath 1.0 expressions. The XML package is often used for xml and html, but selectr translates CSS selectors to XPath, so you can use CSS selectors instead of XPath. The <a href="http://selectorgadget.com/">selectorgadget browser extension</a> can be used to identify page elements. </li>
</ul>
<h3>Javascript</h3>
<p>Javascript provides many libraries to make interactive visualizations for the browser, either locally or on the web. An increasing number of R packages are providing the ability to make visualizations using various javascript libraries. Some of them include:</p>
<ul>
<li><a href="https://github.com/rstudio/ggvis">ggvis</a> ggvis makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and shiny, rendering graphics on the web with vega.</li>
<li><a href="https://github.com/ramnathv/rCharts">rCharts</a> Interactive javascript charts from R (not on CRAN)</li>
<li><a href="https://github.com/metagraf/rVega">rVega</a> An R wrapper for Vega (not on CRAN)</li>
<li><a href="https://github.com/nachocab/clickme">clickme</a> An R package to create interactive plots (not on CRAN)</li>
</ul>
<h2>Data sources on the web available via R</h2>
<h3>Ecological and evolutionary biology</h3>
<ul>
<li><pkg>rvertnet</pkg>: A wrapper to the VertNet collections database API.</li>
<li><pkg>rgbif</pkg>: Interface to the Global Biodiversity Information Facility API methods</li>
<li><pkg>rfishbase</pkg>: A programmatic interface to fishbase.org.</li>
<li><pkg>rtreebase</pkg>: An R package for discovery, access and manipulation of online phylogenies</li>
<li><pkg>taxize</pkg>: Taxonomic information from around the web</li>
<li><pkg>dismo</pkg>: Species distribution modeling, with wrappers to some APIs. <a href="http://cran.r-project.org/web/packages/dismo/vignettes/brt.pdf">vignette</a></li>
<li><pkg>rnbn</pkg>: Access to the UK National Biodiversity Network data (not on CRAN)</li>
<li><pkg>rWBclimate</pkg>: R interface for the World Bank climate data (not on CRAN)</li>
<li><pkg>rbison</pkg>: Wrapper to the USGS Bison API (not on CRAN)</li>
<li><pkg>neotoma</pkg>: Programmatic R interface to the Neotoma Paleoecological Database (not on CRAN)</li>
<li><pkg>rnoaa</pkg>: R interface to NOAA Climate data API (not on CRAN)</li>
<li><pkg>rnpn</pkg>: Wrapper to the National Phenology Network database API</li>
<li><pkg>rfisheries</pkg>: Package for interacting with fisheries databases at openfisheries.org <a href="http://openfisheries.org/">more</a></li>
<li><pkg>rebird</pkg>: A programmatic interface to the eBird database</li>
<li><pkg>flora</pkg>: Retrieve taxonomical information of botanical names from the Flora do Brasil website</li>
<li><pkg>Rcolombos</pkg>: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.</li>
<li><pkg>Reol</pkg>: An R interface to the EOL API. Includes functions for downloading and extracting information off the EOL pages.</li>
<li><pkg>rPlant</pkg>: rPlant is an R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently, rPlant functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the iPlant Collaborative. <a href="http://www.iplantcollaborative.org/discover/discovery-environment">http://www.iplantcollaborative.org/discover/discovery-environment</a></li>
</ul>
<h3>Genes/genomes</h3>
<ul>
<li><pkg>cgdsr</pkg>: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS). <a href="http://www.cbioportal.org/public-portal">more</a></li>
<li><pkg>rsnps</pkg>: Wrapper to the openSNP data API and the Broad Institute SNP Annotation and Proxy Search. </li>
<li><pkg>rentrez</pkg>: Talk with NCBI entrez using R</li>
</ul>
<h3>Earth Science</h3>
<ul>
<li><pkg>RNCEP</pkg>: Global weather and climate data at your fingertips. <a href="https://sites.google.com/site/michaelukemp/rncep">more</a></li>
<li><pkg>crn</pkg>: The crn package provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the ftp, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R friendly datasets for them are provided. <a href="http://stevemosher.wordpress.com/">more</a></li>
<li><pkg>BerkeleyEarth</pkg>: Data Input for Berkeley Earth Surface Temperature. <a href="http://stevemosher.wordpress.com/">more</a></li>
<li><pkg>waterData</pkg>: An R Package for Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data. <a href="http://pubs.usgs.gov/of/2012/1168/">more</a>, <a href="http://cran.r-project.org/web/packages/waterData/vignettes/vignette.pdf">vignette</a></li>
<li><pkg>CHCN</pkg>: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.</li>
<li><pkg>decctools</pkg>: decctools provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.</li>
<li><pkg>Metadata</pkg>: Collates Metadata for Climate Surface Stations</li>
<li>[sos4R][sos4R]: sos4R is a client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations et cetera using thematic, temporal and spatial filtering.</li>
</ul>
<h3>Economics</h3>
<ul>
<li><pkg>WDI</pkg>: Search, extract and format data from the World Bank&#39;s World Development Indicators. <a href="https://sites.google.com/site/michaelukemp/rncep">more</a></li>
<li><pkg>FAOSTAT</pkg>: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT database of the Food and Agricultural Organization of the United Nations <a href="http://cran.r-project.org/web/packages/FAOSTAT/index.html">more</a>, <a href="http://cran.r-project.org/web/packages/FAOSTAT/vignettes/FAOSTAT.pdf">vignette</a></li>
</ul>
<h3>Chemistry</h3>
<ul>
<li><pkg>rpubchem</pkg>: Interface to the PubChem Collection.</li>
</ul>
<h3>Agriculture</h3>
<ul>
<li><pkg>cimis</pkg>: R package for retrieving data from CIMIS, the California Irrigation Management Information System.</li>
</ul>
<h3>Literature, metadata, text, and altmetrics</h3>
<ul>
<li><pkg>rplos</pkg>: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.</li>
<li><pkg>rbhl</pkg>: R interface to the Biodiversity Heritage Library (BHL) API (not on CRAN)</li>
<li><pkg>rmetadata</pkg>: Get scholarly metadata from around the web (not on CRAN)</li>
<li><pkg>RMendeley</pkg>: Implementation of the Mendeley API in R</li>
<li><pkg>rentrez</pkg>: Talk with NCBI entrez using R</li>
<li><pkg>rorcid</pkg>: A programmatic interface to the Orcid.org API (not on CRAN)</li>
<li><pkg>rpubmed</pkg>: Tools for extracting and processing Pubmed and Pubmed Central records (not on CRAN)</li>
<li><pkg>rAltmetic</pkg>: Query and visualize metrics from Altmetric.com (not on CRAN)</li>
<li><pkg>rImpactStory</pkg>: Programmatic interface to the ImpactStory API</li>
<li><pkg>alm</pkg>: R wrapper to the almetrics API platform developed by PLoS (not on CRAN)</li>
<li><pkg>ngramr</pkg>: Retrieve and plot word frequencies through time from the Google Ngram Viewer (<a href="https://github.com/seancarmody/ngramr">development version</a>)</li>
</ul>
<h3>Marketing</h3>
<ul>
<li><pkg>anametrix</pkg>: Bidirectional connector to Anametrix API</li>
</ul>
<h3>Data depots</h3>
<ul>
<li><pkg>rfigshare</pkg>: Programmatic interface for Figshare <a href="http://figshare.com/">more</a></li>
<li><pkg>factualR</pkg>: Thin wrapper for the Factual.com server API. <a href="http://www.exmachinatech.net/01/factualr/">more</a></li>
<li><pkg>dataone</pkg>: A package that provides read/write access to data and metadata from the DataONE network of Member Node data repositories. <a href="http://releases.dataone.org/online/dataone_r/">more</a></li>
<li><pkg>yhatr</pkg>: yhatr lets you deploy, maintain, and invoke models via the Yhat REST API.</li>
<li><pkg>RSocrata</pkg>: Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.</li>
</ul>
<h3>Machine learning as a service (MLaaS anyone?)</h3>
<ul>
<li><pkg>bigml</pkg>: BigML, a machine learning web service <a href="https://bigml.com/">more</a></li>
<li><pkg>MTurkR</pkg>: Access to Amazon Mechanical Turk Requester API via R. <a href="http://thomasleeper.com/MTurkR/index.html">more</a></li>
</ul>
<h3>Web Analytics</h3>
<ul>
<li><pkg>rgauges</pkg>: Interface to Gaug.es API <a href="https://secure.gaug.es">more</a> (not on CRAN)</li>
<li><pkg>RSiteCatalyst</pkg>: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API</li>
<li><pkg>RGoogleAnalytics</pkg>: Provides access to Google Analytics. <a href="http://www.tatvic.com/blog/ga-data-extraction-in-r/">tutorial</a></li>
</ul>
<h3>News</h3>
<ul>
<li><pkg>GuardianR</pkg>: Provides an interface to the Open Platform&#39;s Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day</li>
</ul>
<h3>Images/videos/music</h3>
<ul>
<li><pkg>imguR</pkg>: A package to share plots using the image hosting service imgur.com</li>
<li><pkg>RLastFM</pkg>: A package to interface to the last.fm API.</li>
</ul>
<h3>Sports</h3>
<ul>
<li><pkg>nhlscraper</pkg>: Compiling the NHL Real Time Scoring System Database for easy use in R</li>
</ul>
<h3>Maps</h3>
<ul>
<li><pkg>osmar</pkg>: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).</li>
<li><pkg>ggmap</pkg>: ggmap allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.</li>
</ul>
<h3>Social media</h3>
<ul>
<li><pkg>streamR</pkg>: This package provides a series of functions that allow R users to access Twitter&#39;s filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.</li>
<li><pkg>twitteR</pkg>: Provides an interface to the Twitter web API</li>
</ul>
<h3>Government</h3>
<ul>
<li><pkg>wethepeople</pkg>: An R client for interacting with the White House&#39;s &ldquo;We The People&rdquo; petition API</li>
<li><pkg>govdat</pkg>: Interface to various APIs for government data, including New York Times congress API, and the Sunlight Foundation set of APIs.</li>
</ul>
<h3>Other</h3>
<ul>
<li><pkg>dvn</pkg>: Provides access to The Dataverse Network API. <a href="http://thedata.org/">more</a></li>
<li>[sos4R][sos4R]: R client for the OGC Sensor Observation Service. <a href="http://www.nordholmen.net/sos4r">more</a></li>
<li><pkg>datamart</pkg>: Unified access to various data sources.</li>
<li><pkg>rDrop</pkg>: Dropbox interface.</li>
<li><pkg>zendeskR</pkg>: This package provides an R wrapper for the Zendesk API</li>
</ul>
</info>
<packagelist>
<pkg>RCurl</pkg>
<pkg>httr</pkg>
<pkg>ROAuth</pkg>
<pkg>Shiny</pkg>
<pkg>Rook</pkg>
<pkg>repmis</pkg>
<pkg>XML</pkg>
<pkg>xpath</pkg>
<pkg>scrapeR</pkg>
<pkg>RJSONIO</pkg>
<pkg>rjson</pkg>
<pkg>selectr</pkg>
<pkg>rvertnet</pkg>
<pkg>rgbif</pkg>
<pkg>rfishbase</pkg>
<pkg>rtreebase</pkg>
<pkg>taxize</pkg>
<pkg>dismo</pkg>
<pkg>rnbn</pkg>
<pkg>rWBclimate</pkg>
<pkg>rbison</pkg>
<pkg>neotoma</pkg>
<pkg>rnoaa</pkg>
<pkg>rnpn</pkg>
<pkg>rfisheries</pkg>
<pkg>rebird</pkg>
<pkg>flora</pkg>
<pkg>Rcolombos</pkg>
<pkg>Reol</pkg>
<pkg>rPlant</pkg>
<pkg>cgdsr</pkg>
<pkg>rsnps</pkg>
<pkg>rentrez</pkg>
<pkg>RNCEP</pkg>
<pkg>crn</pkg>
<pkg>BerkeleyEarth</pkg>
<pkg>waterData</pkg>
<pkg>CHCN</pkg>
<pkg>decctools</pkg>
<pkg>Metadata</pkg>
<pkg>WDI</pkg>
<pkg>FAOSTAT</pkg>
<pkg>rpubchem</pkg>
<pkg>cimis</pkg>
<pkg>rplos</pkg>
<pkg>rbhl</pkg>
<pkg>rmetadata</pkg>
<pkg>RMendeley</pkg>
<pkg>rentrez</pkg>
<pkg>rorcid</pkg>
<pkg>rpubmed</pkg>
<pkg>rAltmetic</pkg>
<pkg>rImpactStory</pkg>
<pkg>alm</pkg>
<pkg>ngramr</pkg>
<pkg>anametrix</pkg>
<pkg>rfigshare</pkg>
<pkg>factualR</pkg>
<pkg>dataone</pkg>
<pkg>yhatr</pkg>
<pkg>RSocrata</pkg>
<pkg>bigml</pkg>
<pkg>MTurkR</pkg>
<pkg>rgauges</pkg>
<pkg>RSiteCatalyst</pkg>
<pkg>RGoogleAnalytics</pkg>
<pkg>GuardianR</pkg>
<pkg>imguR</pkg>
<pkg>RLastFM</pkg>
<pkg>nhlscraper</pkg>
<pkg>osmar</pkg>
<pkg>ggmap</pkg>
<pkg>streamR</pkg>
<pkg>twitteR</pkg>
<pkg>wethepeople</pkg>
<pkg>govdat</pkg>
<pkg>dvn</pkg>
<pkg>datamart</pkg>
<pkg>rDrop</pkg>
<pkg>zendeskR</pkg>
</packagelist>
<links>
<h2>Introduction</h2>
<p>This Task View contains information about using R to obtain and parse data from the web.</p>
<p>The base version of R does not ship with many tools for interacting with the web. Thankfully, an increasingly large number of packages provide them.</p>
<p>If you have any comments or suggestions for additions or improvements for this taskview, go to Github and <a href="https://github.com/ropensci/webservices/issues">submit an issue</a> or make some changes and <a href="https://github.com/ropensci/webservices/pulls">submit a pull request</a>. If you have an issue with one of the packages, please contact the maintainer of the package.</p>
<p>A list of available packages and functions is presented below, grouped by the type of activity.</p>
<h2>Tools for working with the web from R</h2>
<h3>curl/http/ftp</h3>
<ul>
<li><pkg>RCurl</pkg>: a low level curl wrapper for R. </li>
<li><pkg>httr</pkg>: a light wrapper around RCurl that makes many things easier, but still allows you to access the lower level functionality of RCurl. </li>
</ul>
<p>httr has convenient http verbs: <code>GET()</code>, <code>POST()</code>, <code>PUT()</code>, <code>DELETE()</code>, <code>PATCH()</code>, <code>HEAD()</code>, <code>BROWSE()</code>. These wrap functions in RCurl, making them more convenient to use, though less configurable than counterparts in RCurl. Though note that you can pass in additional Curl options to the <code>config</code> parameter in http calls. The equivalent of httr&#39;s <code>GET()</code> in RCurl is <code>getForm()</code>. Likewise, the equivalent of httr&#39;s <code>POST()</code> in RCurl is <code>postForm()</code>. </p>
<p><a href="http://en.wikipedia.org/wiki/Http_status_codes">http status codes</a> are helpful for debugging http calls. The httr package makes this easier: for example, <code>stop_for_status()</code> gets the http status code from a response object and stops the function if the call was not successful. See also <code>warn_for_status()</code>.</p>
<h3>Authentication</h3>
<p>Using web resources can require authentication, either via API keys, OAuth, username:password combination, or via other means. Additionally, sometimes web resources require that authentication be in the header of an http call, which requires a little bit of extra work. API keys and username:password combos can be combined within a url for a call to a web resource (api key: <a href="http://api.foo.org/?key=yourkey">http://api.foo.org/?key=yourkey</a>; user/pass: http://username:<a href="mailto:password@api.foo.org">password@api.foo.org</a>), or can be specified via commands in RCurl or httr. OAuth is the most complicated authentication process, and can be most easily done using httr. See the 6 demos within httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, github, google). <pkg>ROAuth</pkg> is a package that provides a separate R interface to OAuth. OAuth is easier to do in httr, so start there. </p>
<h3>Web frameworks</h3>
<p>RStudio recently created <pkg>Shiny</pkg>, which combines R, html, css, and javascript to make web applications. Related tools are available, including [openCPU]<a href="%5Bon%20CRAN%5D%5Bopencpucran%5D">opencpu</a> and <pkg>Rook</pkg>. However, Shiny is the most promising of these.</p>
<h3>Parsing data from the web</h3>
<ul>
<li>txt, csv, etc.: you can use <code>read.csv()</code> after acquiring the csv file from the web via e.g., <code>getURL()</code> from RCurl. <code>read.csv()</code> works with http but not https, i.e.: read.csv(&ldquo;http://&hellip;&rdquo;), but not read.csv(&ldquo;https://&hellip;&rdquo;). The <pkg>repmis</pkg> package contains a <code>source_data()</code> command to simplify this process, while also assigning SHA-1 hashes to uniquely identify file versions.</li>
<li>xml/html: the package <pkg>XML</pkg> by Duncan Temple-Lang contains functions for parsing xml and html, and supports <pkg>xpath</pkg> for searching xml (think regex for strings). <pkg>scrapeR</pkg> provides additional tools for scraping data from html and xml documents.</li>
<li>json/json-ld: <pkg>RJSONIO</pkg> by Duncan Temple-Lang. Another package, <pkg>rjson</pkg>, does many of the same tasks which RJSONIO does.</li>
<li>custom formats: Some web APIs provide custom data formats (e.g., X), which are usually modified xml or json, and handled by XML and RJSONIO, respectively.</li>
<li>An alternative to the XML package is <pkg>selectr</pkg>, which parses CSS3 Selectors and translates them to XPath 1.0 expressions. The XML package is often used for xml and html, but selectr translates CSS selectors to XPath, so you can use CSS selectors instead of XPath. The <a href="http://selectorgadget.com/">selectorgadget browser extension</a> can be used to identify page elements. </li>
</ul>
<h3>Javascript</h3>
<p>Javascript provides many libraries to make interactive visualizations for the browser, either locally or on the web. An increasing number of R packages are providing the ability to make visualizations using various javascript libraries. Some of them include:</p>
<ul>
<li><a href="https://github.com/rstudio/ggvis">ggvis</a> ggvis makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and shiny, rendering graphics on the web with vega.</li>
<li><a href="https://github.com/ramnathv/rCharts">rCharts</a> Interactive javascript charts from R (not on CRAN)</li>
<li><a href="https://github.com/metagraf/rVega">rVega</a> An R wrapper for Vega (not on CRAN)</li>
<li><a href="https://github.com/nachocab/clickme">clickme</a> An R package to create interactive plots (not on CRAN)</li>
</ul>
<h2>Data sources on the web available via R</h2>
<h3>Ecological and evolutionary biology</h3>
<ul>
<li><pkg>rvertnet</pkg>: A wrapper to the VertNet collections database API.</li>
<li><pkg>rgbif</pkg>: Interface to the Global Biodiversity Information Facility API methods</li>
<li><pkg>rfishbase</pkg>: A programmatic interface to fishbase.org.</li>
<li><pkg>rtreebase</pkg>: An R package for discovery, access and manipulation of online phylogenies</li>
<li><pkg>taxize</pkg>: Taxonomic information from around the web</li>
<li><pkg>dismo</pkg>: Species distribution modeling, with wrappers to some APIs. <a href="http://cran.r-project.org/web/packages/dismo/vignettes/brt.pdf">vignette</a></li>
<li><pkg>rnbn</pkg>: Access to the UK National Biodiversity Network data (not on CRAN)</li>
<li><pkg>rWBclimate</pkg>: R interface for the World Bank climate data (not on CRAN)</li>
<li><pkg>rbison</pkg>: Wrapper to the USGS Bison API (not on CRAN)</li>
<li><pkg>neotoma</pkg>: Programmatic R interface to the Neotoma Paleoecological Database (not on CRAN)</li>
<li><pkg>rnoaa</pkg>: R interface to NOAA Climate data API (not on CRAN)</li>
<li><pkg>rnpn</pkg>: Wrapper to the National Phenology Network database API</li>
<li><pkg>rfisheries</pkg>: Package for interacting with fisheries databases at openfisheries.org <a href="http://openfisheries.org/">more</a></li>
<li><pkg>rebird</pkg>: A programmatic interface to the eBird database</li>
<li><pkg>flora</pkg>: Retrieve taxonomical information of botanical names from the Flora do Brasil website</li>
<li><pkg>Rcolombos</pkg>: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.</li>
<li><pkg>Reol</pkg>: An R interface to the EOL API. Includes functions for downloading and extracting information off the EOL pages.</li>
<li><pkg>rPlant</pkg>: rPlant is an R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently, rPlant functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the iPlant Collaborative. <a href="http://www.iplantcollaborative.org/discover/discovery-environment">http://www.iplantcollaborative.org/discover/discovery-environment</a></li>
</ul>
<h3>Genes/genomes</h3>
<ul>
<li><pkg>cgdsr</pkg>: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS). <a href="http://www.cbioportal.org/public-portal">more</a></li>
<li><pkg>rsnps</pkg>: Wrapper to the openSNP data API and the Broad Institute SNP Annotation and Proxy Search. </li>
<li><pkg>rentrez</pkg>: Talk with NCBI entrez using R</li>
</ul>
<h3>Earth Science</h3>
<ul>
<li><pkg>RNCEP</pkg>: Global weather and climate data at your fingertips. <a href="https://sites.google.com/site/michaelukemp/rncep">more</a></li>
<li><pkg>crn</pkg>: The crn package provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the ftp, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R friendly datasets for them are provided. <a href="http://stevemosher.wordpress.com/">more</a></li>
<li><pkg>BerkeleyEarth</pkg>: Data Input for Berkeley Earth Surface Temperature. <a href="http://stevemosher.wordpress.com/">more</a></li>
<li><pkg>waterData</pkg>: An R Package for Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data. <a href="http://pubs.usgs.gov/of/2012/1168/">more</a>, <a href="http://cran.r-project.org/web/packages/waterData/vignettes/vignette.pdf">vignette</a></li>
<li><pkg>CHCN</pkg>: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.</li>
<li><pkg>decctools</pkg>: decctools provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.</li>
<li><pkg>Metadata</pkg>: Collates Metadata for Climate Surface Stations</li>
<li>[sos4R][sos4R]: sos4R is a client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations et cetera using thematic, temporal and spatial filtering.</li>
</ul>
<h3>Economics</h3>
<ul>
<li><pkg>WDI</pkg>: Search, extract and format data from the World Bank&#39;s World Development Indicators. <a href="https://sites.google.com/site/michaelukemp/rncep">more</a></li>
<li><pkg>FAOSTAT</pkg>: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT database of the Food and Agricultural Organization of the United Nations <a href="http://cran.r-project.org/web/packages/FAOSTAT/index.html">more</a>, <a href="http://cran.r-project.org/web/packages/FAOSTAT/vignettes/FAOSTAT.pdf">vignette</a></li>
</ul>
<h3>Chemistry</h3>
<ul>
<li><pkg>rpubchem</pkg>: Interface to the PubChem Collection.</li>
</ul>
<h3>Agriculture</h3>
<ul>
<li><pkg>cimis</pkg>: R package for retrieving data from CIMIS, the California Irrigation Management Information System.</li>
</ul>
<h3>Literature, metadata, text, and altmetrics</h3>
<ul>
<li><pkg>rplos</pkg>: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.</li>
<li><pkg>rbhl</pkg>: R interface to the Biodiversity Heritage Library (BHL) API (not on CRAN)</li>
<li><pkg>rmetadata</pkg>: Get scholarly metadata from around the web (not on CRAN)</li>
<li><pkg>RMendeley</pkg>: Implementation of the Mendeley API in R</li>
<li><pkg>rentrez</pkg>: Talk with NCBI entrez using R</li>
<li><pkg>rorcid</pkg>: A programmatic interface to the Orcid.org API (not on CRAN)</li>
<li><pkg>rpubmed</pkg>: Tools for extracting and processing Pubmed and Pubmed Central records (not on CRAN)</li>
<li><pkg>rAltmetric</pkg>: Query and visualize metrics from Altmetric.com (not on CRAN)</li>
<li><pkg>rImpactStory</pkg>: Programmatic interface to the ImpactStory API</li>
<li><pkg>alm</pkg>: R wrapper to the article-level metrics (ALM) API platform developed by PLoS (not on CRAN)</li>
<li><pkg>ngramr</pkg>: Retrieve and plot word frequencies through time from the Google Ngram Viewer (<a href="https://github.com/seancarmody/ngramr">development version</a>)</li>
</ul>
<h3>Marketing</h3>
<ul>
<li><pkg>anametrix</pkg>: Bidirectional connector to Anametrix API</li>
</ul>
<h3>Data depots</h3>
<ul>
<li><pkg>rfigshare</pkg>: Programmatic interface for Figshare <a href="http://figshare.com/">more</a></li>
<li><pkg>factualR</pkg>: Thin wrapper for the Factual.com server API. <a href="http://www.exmachinatech.net/01/factualr/">more</a></li>
<li><pkg>dataone</pkg>: A package that provides read/write access to data and metadata from the DataONE network of Member Node data repositories. <a href="http://releases.dataone.org/online/dataone_r/">more</a></li>
<li><pkg>yhatr</pkg>: yhatr lets you deploy, maintain, and invoke models via the Yhat REST API.</li>
<li><pkg>RSocrata</pkg>: Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.</li>
</ul>
<h3>Machine learning as a service (MLaaS anyone?)</h3>
<ul>
<li><pkg>bigml</pkg>: BigML, a machine learning web service <a href="https://bigml.com/">more</a></li>
<li><pkg>MTurkR</pkg>: Access to Amazon Mechanical Turk Requester API via R. <a href="http://thomasleeper.com/MTurkR/index.html">more</a></li>
</ul>
<h3>Web Analytics</h3>
<ul>
<li><pkg>rgauges</pkg>: Interface to Gaug.es API <a href="https://secure.gaug.es">more</a> (not on CRAN)</li>
<li><pkg>RSiteCatalyst</pkg>: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API</li>
<li><pkg>RGoogleAnalytics</pkg>: Provides access to Google Analytics. <a href="http://www.tatvic.com/blog/ga-data-extraction-in-r/">tutorial</a></li>
</ul>
<h3>News</h3>
<ul>
<li><pkg>GuardianR</pkg>: Provides an interface to the Open Platform&#39;s Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day</li>
</ul>
<h3>Images/videos/music</h3>
<ul>
<li><pkg>imguR</pkg>: A package to share plots using the image hosting service imgur.com</li>
<li><pkg>RLastFM</pkg>: A package to interface to the last.fm API.</li>
</ul>
<h3>Sports</h3>
<ul>
<li><pkg>nhlscraper</pkg>: Compiling the NHL Real Time Scoring System Database for easy use in R</li>
</ul>
<h3>Maps</h3>
<ul>
<li><pkg>osmar</pkg>: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).</li>
<li><pkg>ggmap</pkg>: ggmap allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.</li>
</ul>
<h3>Social media</h3>
<ul>
<li><pkg>streamR</pkg>: This package provides a series of functions that allow R users to access Twitter&#39;s filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.</li>
<li><pkg>twitteR</pkg>: Provides an interface to the Twitter web API</li>
</ul>
<h3>Government</h3>
<ul>
<li><pkg>wethepeople</pkg>: An R client for interacting with the White House&#39;s &ldquo;We The People&rdquo; petition API</li>
<li><pkg>govdat</pkg>: Interface to various APIs for government data, including New York Times congress API, and the Sunlight Foundation set of APIs.</li>
</ul>
<h3>Other</h3>
<ul>
<li><pkg>dvn</pkg>: Provides access to The Dataverse Network API. <a href="http://thedata.org/">more</a></li>
<li>[sos4R][sos4R]: R client for the OGC Sensor Observation Service. <a href="http://www.nordholmen.net/sos4r">more</a></li>
<li><pkg>datamart</pkg>: Unified access to various data sources.</li>
<li><pkg>rDrop</pkg>: Dropbox interface.</li>
<li><pkg>zendeskR</pkg>: This package provides an R wrapper for the Zendesk API</li>
</ul>
</links>
</CRANTaskView>
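The two `[sos4R][sos4R]` entries in the output above were not converted to `<pkg>` tags. One likely cause is that the `[[:alpha:]]+` pattern in add_pkg_markup cannot match the digit in the package name. The following is a minimal sketch of a more permissive pattern, using base R only; the function name `add_pkg_markup2` is hypothetical and not part of the gist.

```r
# Hypothetical variant of add_pkg_markup(), written with base R's gsub().
# [[:alnum:].] allows letters, digits and dots in the package name, and the
# \\1 backreference in the pattern requires both bracketed names to match.
add_pkg_markup2 <- function(x) {
  gsub("\\[([[:alnum:].]+)\\]\\[\\1\\]", "<pkg>\\1</pkg>", x, perl = TRUE)
}

converted <- add_pkg_markup2("<li>[sos4R][sos4R]: R client for the OGC SOS.</li>")
print(converted)
```

This still relies on the `[pkg][pkg]` convention, so ordinary reference-style links whose text equals their label would also be tagged as packages.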
name: Working with data on the web
maintainer: Scott Chamberlain, Karthik Ram, Christopher Gandrud
email: scott at ropensci.org
version: 2013-09-17

Introduction

This Task View contains information about using R to obtain and parse data from the web.

The base version of R does not ship with many tools for interacting with the web. Thankfully, a growing number of packages fill this gap.

If you have any comments or suggestions for additions or improvements for this task view, go to GitHub and submit an issue, or make some changes and submit a pull request. If you have an issue with one of the packages, please contact the maintainer of the package.

A list of available packages and functions is presented below, grouped by the type of activity.

Tools for working with the web from R

curl/http/ftp

  • RCurl: a low level curl wrapper for R.
  • httr: a light wrapper around RCurl that makes many things easier, but still allows you to access the lower level functionality of RCurl.

httr has convenient HTTP verbs: GET(), POST(), PUT(), DELETE(), PATCH(), HEAD(), BROWSE(). These wrap functions in RCurl, making them more convenient to use, though less configurable than their RCurl counterparts. Note, however, that you can pass additional curl options to the config parameter of these calls. The equivalent of httr's GET() in RCurl is getForm(). Likewise, the equivalent of httr's POST() in RCurl is postForm().

HTTP status codes are helpful for debugging HTTP calls. The httr package makes this easier: for example, stop_for_status() gets the HTTP status code from a response object and stops the function if the call was not successful. See also warn_for_status().
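
A minimal sketch of the pattern just described, assuming the httr package is installed. The helper name `fetch_text` and the URL in the usage comment are hypothetical and only serve as illustration.

```r
library(httr)

# Hypothetical helper: GET a URL, stop on an HTTP error, return the body as text.
fetch_text <- function(url, ...) {
  resp <- GET(url, ...)   # extra arguments (e.g. config options) are passed on to GET()
  stop_for_status(resp)   # raises an R error if the status code signals a failed call
  content(resp, as = "text")
}

# Usage (needs network access; the URL is a placeholder):
# txt <- fetch_text("http://httpbin.org/get")
```

Swapping stop_for_status() for warn_for_status() would turn the error into a warning, which can be preferable when looping over many URLs.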

Authentication

Using web resources can require authentication, via API keys, OAuth, a username:password combination, or other means. Additionally, some web resources require that the authentication details be sent in the header of an HTTP call, which requires a little extra work. API keys and username:password combos can be included in the URL of a call to a web resource (API key: http://api.foo.org/?key=yourkey; user/pass: http://username:password@api.foo.org), or can be specified via commands in RCurl or httr. OAuth is the most complicated authentication process, and is most easily done using httr. See the 6 demos within httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, github, google). ROAuth is a package that provides a separate R interface to OAuth. OAuth is easier to do in httr, so start there.
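
The non-OAuth options above can be sketched with httr as follows. This is an illustration under assumptions: the host is the placeholder from the text, the credentials are dummies, and the `Authorization` header with a bearer token is just one possible header scheme, not something a particular API is known to require.

```r
library(httr)

base_url <- "http://api.foo.org/"   # placeholder host from the text above

# 1. API key as a query parameter: adds key=yourkey to the query string
key_url <- modify_url(base_url, query = list(key = "yourkey"))

# 2. Username/password via HTTP basic auth, built as a config object
basic <- authenticate("username", "password")

# 3. Credentials sent in a request header (header name and scheme are assumptions)
hdr <- add_headers(Authorization = "Bearer yourtoken")

# Each object is then handed to an HTTP verb, e.g. (needs network access):
# GET(key_url)
# GET(base_url, basic)
# GET(base_url, hdr)
```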

Web frameworks

RStudio recently created Shiny, which combines R, html, css, and javascript to make web applications. Related tools are available, including openCPU (on CRAN) and Rook. However, Shiny is the most promising of these.

Parsing data from the web

  • txt, csv, etc.: you can use read.csv() after acquiring the csv file from the web via e.g., getURL() from RCurl. read.csv() works with http but not https, i.e.: read.csv("http://..."), but not read.csv("https://..."). The repmis package contains a source_data() command to simplify this process, while also assigning SHA-1 hashes to uniquely identify file versions.
  • xml/html: the package XML by Duncan Temple Lang contains functions for parsing xml and html, and supports xpath for searching xml (think regex for strings). scrapeR provides additional tools for scraping data from html and xml documents.
  • json/json-ld: RJSONIO by Duncan Temple Lang. Another package, rjson, does many of the same tasks which RJSONIO does.
  • custom formats: Some web APIs provide custom data formats (e.g., X), which are usually modified xml or json, and handled by XML and RJSONIO, respectively.
  • An alternative to the XML package is selectr, which parses CSS3 Selectors and translates them to XPath 1.0 expressions. The XML package is often used for xml and html, but with selectr you can use CSS selectors instead of XPath. The selectorgadget browser extension can be used to identify page elements.
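
The formats listed above can be tried without a network connection by parsing inline strings. The sketch below assumes the XML and RJSONIO packages are installed; the sample strings are made up for illustration and stand in for text already fetched from the web, e.g. with getURL() from RCurl.

```r
library(XML)
library(RJSONIO)

# csv: read.csv() accepts text that has already been downloaded
csv_txt <- "pkg,on_cran\nhttr,TRUE\nrCharts,FALSE"
pkgs <- read.csv(text = csv_txt, stringsAsFactors = FALSE)

# html: parse a string and pull attribute values with an XPath query
html_txt <- '<ul><li><a href="http://example.org/a">a</a></li></ul>'
doc <- htmlParse(html_txt, asText = TRUE)
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")

# json: convert a JSON string into an R object
json_txt <- '{"name": "rplos", "on_cran": true}'
pkg <- fromJSON(json_txt)
pkg_name <- pkg[["name"]]
```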

Javascript

Javascript provides many libraries to make interactive visualizations for the browser, either locally or on the web. An increasing number of R packages are providing the ability to make visualizations using various javascript libraries. Some of them include:

  • ggvis: ggvis makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and shiny, rendering graphics on the web with vega.
  • rCharts: Interactive javascript charts from R (not on CRAN)
  • rVega: An R wrapper for Vega (not on CRAN)
  • clickme: An R package to create interactive plots (not on CRAN)

Data sources on the web available via R

Ecological and evolutionary biology

  • rvertnet: A wrapper to the VertNet collections database API.
  • rgbif: Interface to the Global Biodiversity Information Facility API methods
  • rfishbase: A programmatic interface to fishbase.org.
  • rtreebase: An R package for discovery, access and manipulation of online phylogenies
  • taxize: Taxonomic information from around the web
  • dismo: Species distribution modeling, with wrappers to some APIs. vignette
  • rnbn: Access to the UK National Biodiversity Network data (not on CRAN)
  • rWBclimate: R interface for the World Bank climate data (not on CRAN)
  • rbison: Wrapper to the USGS Bison API (not on CRAN)
  • neotoma: Programmatic R interface to the Neotoma Paleoecological Database (not on CRAN)
  • rnoaa: R interface to NOAA Climate data API (not on CRAN)
  • rnpn: Wrapper to the National Phenology Network database API
  • rfisheries: Package for interacting with fisheries databases at openfisheries.org more
  • rebird: A programmatic interface to the eBird database
  • flora: Retrieve taxonomical information of botanical names from the Flora do Brasil website
  • Rcolombos: This package provides programmatic access to Colombos, a web based interface for exploring and analyzing comprehensive organism-specific cross-platform expression compendia of bacterial organisms.
  • Reol: An R interface to the EOL API. Includes functions for downloading and extracting information off the EOL pages.
  • rPlant: rPlant is an R interface to the many computational resources iPlant offers through their RESTful application programming interface. Currently, rPlant functions interact with the iPlant foundational API, the Taxonomic Name Resolution Service API, and the Phylotastic Taxosaurus API. Before using rPlant, users will have to register with the iPlant Collaborative. http://www.iplantcollaborative.org/discover/discovery-environment

Genes/genomes

  • cgdsr: R-Based API for accessing the MSKCC Cancer Genomics Data Server (CGDS). more
  • rsnps: Wrapper to the openSNP data API and the Broad Institute SNP Annotation and Proxy Search.
  • rentrez: Talk with NCBI entrez using R

Earth Science

  • RNCEP: Global weather and climate data at your fingertips. more
  • crn: The crn package provides the core functions required to download and format data from the Climate Reference Network. Both daily and hourly data are downloaded from the ftp, a consolidated file of all stations is created, and station metadata is extracted. In addition, functions for selecting individual variables and creating R friendly datasets for them are provided. more
  • BerkeleyEarth: Data Input for Berkeley Earth Surface Temperature. more
  • waterData: An R Package for Retrieval, Analysis, and Anomaly Calculation of Daily Hydrologic Time Series Data. more, vignette
  • CHCN: A compilation of historical through contemporary climate measurements scraped from the Environment Canada website, including tools for scraping data, creating metadata and formatting temperature files.
  • decctools: decctools provides functions for retrieving energy statistics from the United Kingdom Department of Energy and Climate Change and related data sources. The current version focuses on total final energy consumption statistics at the local authority, MSOA, and LSOA geographies. Methods for calculating the generation mix of grid electricity and its associated carbon intensity are also provided.
  • Metadata: Collates Metadata for Climate Surface Stations
  • sos4R: sos4R is a client for Sensor Observation Services (SOS) as specified by the Open Geospatial Consortium (OGC). It allows users to retrieve metadata from SOS web services and to interactively create requests for near real-time observation data based on the available sensors, phenomena, observations et cetera using thematic, temporal and spatial filtering.

Economics

  • WDI: Search, extract and format data from the World Bank's World Development Indicators. more
  • FAOSTAT: The package hosts a list of functions to download, manipulate, construct and aggregate agricultural statistics provided by the FAOSTAT database of the Food and Agriculture Organization of the United Nations. more, vignette

Chemistry

  • rpubchem: Interface to the PubChem Collection.

Agriculture

  • cimis: R package for retrieving data from CIMIS, the California Irrigation Management Information System.

Literature, metadata, text, and altmetrics

  • rplos: A programmatic interface to the Web Service methods provided by the Public Library of Science journals for search.
  • rbhl: R interface to the Biodiversity Heritage Library (BHL) API (not on CRAN)
  • rmetadata: Get scholarly metadata from around the web (not on CRAN)
  • RMendeley: Implementation of the Mendeley API in R
  • rentrez: Talk with NCBI entrez using R
  • rorcid: A programmatic interface to the Orcid.org API (not on CRAN)
  • rpubmed: Tools for extracting and processing Pubmed and Pubmed Central records (not on CRAN)
  • rAltmetric: Query and visualize metrics from Altmetric.com (not on CRAN)
  • rImpactStory: Programmatic interface to the ImpactStory API
  • alm: R wrapper to the article-level metrics (ALM) API platform developed by PLoS (not on CRAN)
  • ngramr: Retrieve and plot word frequencies through time from the Google Ngram Viewer (development version)

Marketing

  • anametrix: Bidirectional connector to Anametrix API

Data depots

  • rfigshare: Programmatic interface for Figshare more
  • factualR: Thin wrapper for the Factual.com server API. more
  • dataone: A package that provides read/write access to data and metadata from the DataONE network of Member Node data repositories. more
  • yhatr: yhatr lets you deploy, maintain, and invoke models via the Yhat REST API.
  • RSocrata: Provided with a Socrata dataset resource URL, or a Socrata SoDA web API query, returns an R data frame. Converts dates to POSIX format. Supports CSV and JSON. Manages throttling by Socrata.

Machine learning as a service (MLaaS anyone?)

  • bigml: BigML, a machine learning web service more
  • MTurkR: Access to Amazon Mechanical Turk Requester API via R. more

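Web Analytics

  • rgauges: Interface to Gaug.es API more (not on CRAN)
  • RSiteCatalyst: Functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API
  • RGoogleAnalytics: Provides access to Google Analytics. tutorial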

News

  • GuardianR: Provides an interface to the Open Platform's Content API of the Guardian Media Group. It retrieves content from news outlets The Observer, The Guardian, and guardian.co.uk from 1999 to current day

Images/videos/music

  • imguR: A package to share plots using the image hosting service imgur.com
  • RLastFM: A package to interface to the last.fm API.

Sports

  • nhlscraper: Compiling the NHL Real Time Scoring System Database for easy use in R

Maps

  • osmar: This package provides infrastructure to access OpenStreetMap data from different sources, to work with the data in common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects).
  • ggmap: ggmap allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2.

Social media

  • streamR: This package provides a series of functions that allow R users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported.
  • twitteR: Provides an interface to the Twitter web API

Government

  • wethepeople: An R client for interacting with the White House's "We The People" petition API
  • govdat: Interface to various APIs for government data, including New York Times congress API, and the Sunlight Foundation set of APIs.

Other

  • dvn: Provides access to The Dataverse Network API. more
  • sos4R: R client for the OGC Sensor Observation Service. more
  • datamart: Unified access to various data sources.
  • rDrop: Dropbox interface.
  • zendeskR: This package provides an R wrapper for the Zendesk API
