Last active
August 29, 2015 14:06
enigma
=======

```{r, eval=TRUE, echo=FALSE}
knitr::opts_chunk$set(fig.width=8, fig.pos="h", fig.path="inst/assets/figure/")
```

[![Build Status](https://api.travis-ci.org/rOpenGov/enigma.png)](https://travis-ci.org/rOpenGov/enigma)

**An R client for [Enigma.io](https://app.enigma.io/)**

Enigma holds government data and provides a really nice set of APIs for data, metadata, and stats on each of the datasets. That is, you can request a dataset itself, metadata on the dataset, and summary statistics on the columns of each dataset.

## enigma info

+ [enigma home page](https://app.enigma.io/)
+ [API docs](https://app.enigma.io/api)

## LICENSE

MIT, see [LICENSE file](https://github.com/rOpenGov/enigma/blob/master/LICENSE) and [MIT text](http://opensource.org/licenses/MIT)

## Quick start

### Install

```{r eval=FALSE}
install.packages("devtools")
library("devtools")
install_github("ropengov/enigma")
```

```{r}
library("enigma")
```

### Get data

```{r}
out <- enigma_data(dataset='us.gov.whitehouse.visitor-list', select=c('namelast','visitee_namelast','last_updatedby'))
```

Some metadata on the results

```{r}
out$info
```

Look at the data, first 6 rows for readme brevity

```{r}
head(out$result)
```

### Statistics on dataset columns

```{r}
out <- enigma_stats(dataset='us.gov.whitehouse.visitor-list', select='total_people')
```

Some summary stats

```{r}
out$result[c('sum','avg','stddev','variance','min','max')]
```

Frequency details

```{r}
head(out$result$frequency)
```

### Metadata on datasets

```{r}
out <- enigma_metadata(dataset='us.gov.whitehouse')
```

Paths

```{r}
out$info$paths
```

Immediate nodes

```{r}
out$info$immediate_nodes
```

Children tables

```{r}
out$info$children_tables[[1]]
```

### Use case: Plot frequency of flight distances

First, get columns for the air carrier dataset

```{r}
dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
head(enigma_metadata(dset)$columns$table[,c(1:4)])
```

Looks like there's a column called _distance_ that we can search on. For `varchar` type columns, only `frequency` is returned by default.

```{r}
out <- enigma_stats(dset, select='distance')
head(out$result$frequency)
```

Then we can do a bit of tidying and make a plot

```{r warning=FALSE, message=FALSE, tidy=FALSE}
library("ggplot2")
library("ggthemes")
df <- out$result$frequency
df <- data.frame(distance=as.numeric(df$distance), count=as.numeric(df$count))
ggplot(df, aes(distance, count)) +
  geom_bar(stat="identity") +
  geom_point() +
  theme_grey(base_size = 18) +
  labs(y="flights", x="distance (miles)")
```
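
Since the frequency table is all that comes back for this column, numeric summaries such as the mean distance have to be derived from it on the client side. The following is a minimal sketch of that idea, not something the package does for you. It uses made-up values and assumes the same shape the tidying step above implies (character `distance` and `count` columns).

```r
# Made-up frequency table; real values would come from out$result$frequency
freq <- data.frame(
  distance = c("100", "250", "500", "1000"),
  count    = c("4", "10", "5", "1"),
  stringsAsFactors = FALSE
)

# Convert the character columns to numeric, as in the tidying step
freq$distance <- as.numeric(freq$distance)
freq$count <- as.numeric(freq$count)

# Order rows by distance so they read left to right like the plot
freq <- freq[order(freq$distance), ]

# Count-weighted mean distance and the most frequent distance
mean_distance <- weighted.mean(freq$distance, w = freq$count)
modal_distance <- freq$distance[which.max(freq$count)]
```

Weighting by `count` matters here: a plain `mean(freq$distance)` would treat each distinct distance as equally common.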

### Direct dataset download

Enigma provides an endpoint `.../export/<datasetid>` to download a zipped csv file of the entire dataset.

`enigma_fetch()` gives you an easy way to download these to a specific place on your machine. A message tells you when the file has been written to disk.

```r
enigma_fetch(dataset='com.crunchbase.info.companies.acquisition')
```
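
Once the zip file is on disk, the csv inside can be read without extracting it by hand. Below is a minimal sketch in base R, assuming the archive contains at least one csv file. `read_zipped_csv()` and its `zip_path` argument are hypothetical names for illustration, not part of enigma.

```r
# Hypothetical helper, not part of the enigma package
read_zipped_csv <- function(zip_path) {
  # List the archive contents without extracting anything
  contents <- utils::unzip(zip_path, list = TRUE)
  csv_files <- contents$Name[grepl("\\.csv$", contents$Name, ignore.case = TRUE)]
  if (length(csv_files) == 0) {
    stop("No csv file found in ", zip_path)
  }
  # Read the first csv straight from the archive
  utils::read.csv(unz(zip_path, csv_files[1]), stringsAsFactors = FALSE)
}
```

Call it with the location the download was written to, for example `read_zipped_csv("path/to/download.zip")`, where the path is a placeholder for wherever your file was saved.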