@pchaigno
Last active August 29, 2015 14:06
enigma
=======
```{r, eval=TRUE, echo=FALSE}
knitr::opts_chunk$set(fig.width=8, fig.pos="h", fig.path="inst/assets/figure/")
```
[![Build Status](https://api.travis-ci.org/rOpenGov/enigma.png)](https://travis-ci.org/rOpenGov/enigma)
**An R client for [Enigma.io](https://app.enigma.io/)**

Enigma holds government data and provides a set of APIs for data, metadata, and statistics on each of its datasets. That is, you can request a dataset itself, metadata about the dataset, and summary statistics on each of its columns.
## enigma info
+ [enigma home page](https://app.enigma.io/)
+ [API docs](https://app.enigma.io/api)
## LICENSE
MIT; see the [LICENSE file](https://github.com/rOpenGov/enigma/blob/master/LICENSE) and the [MIT license text](http://opensource.org/licenses/MIT)
## Quick start
### Install
```{r eval=FALSE}
install.packages("devtools")
library("devtools")
install_github("ropengov/enigma")
```
```{r}
library("enigma")
```
### Get data
```{r}
out <- enigma_data(dataset='us.gov.whitehouse.visitor-list', select=c('namelast','visitee_namelast','last_updatedby'))
```
Some metadata on the results
```{r}
out$info
```
Look at the first six rows of the data (truncated for README brevity)
```{r}
head(out$result)
```
### Statistics on dataset columns
```{r}
out <- enigma_stats(dataset='us.gov.whitehouse.visitor-list', select='total_people')
```
Some summary stats
```{r}
out$result[c('sum','avg','stddev','variance','min','max')]
```
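To see what these six fields mean, or to sanity-check them against a local copy of a column, the same quantities can be computed in base R. This is a minimal sketch: `local_column_stats()` is a hypothetical helper, not part of the package, and it assumes `stddev` and `variance` are sample statistics (denominator n - 1, as in R's `sd()` and `var()`), which the Enigma API may define differently.

```r
# Hypothetical helper: compute the same summary fields that enigma_stats()
# reports for a numeric column, for local comparison.
# Assumption: stddev/variance use the sample (n - 1) definition, like sd()/var().
local_column_stats <- function(x) {
  x <- as.numeric(x)
  x <- x[!is.na(x)]  # drop missing values before summarising
  c(
    sum      = sum(x),
    avg      = mean(x),
    stddev   = sd(x),
    variance = var(x),
    min      = min(x),
    max      = max(x)
  )
}

local_column_stats(c(1, 2, 3, 4, NA))
# sum = 10, avg = 2.5, variance = 5/3, min = 1, max = 4
```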
Frequency details
```{r}
head(out$result$frequency)
```
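Each row of the frequency table pairs a distinct value of the column with how often it occurs. A comparable table can be built locally with base R's `table()`. In this sketch the helper `local_frequency()` is hypothetical, and its layout (a value column named after the field plus a `count` column, most frequent first) is an assumption modelled on how `out$result$frequency` is used in the flight-distance example later in this README.

```r
# Hypothetical helper: build a value/count frequency table, most frequent first.
# The column layout mimics (by assumption) the shape of out$result$frequency.
local_frequency <- function(x, name = "value") {
  tab <- table(x, useNA = "no")  # counts per distinct value, NAs excluded
  df <- data.frame(names(tab), as.integer(tab), stringsAsFactors = FALSE)
  names(df) <- c(name, "count")
  df <- df[order(-df$count), , drop = FALSE]  # order() is stable for ties
  rownames(df) <- NULL
  df
}

local_frequency(c("a", "b", "a", "c", "a", "b"), name = "letter")
# letter = "a", "b", "c"; count = 3, 2, 1
```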
### Metadata on datasets
```{r}
out <- enigma_metadata(dataset='us.gov.whitehouse')
```
Paths
```{r}
out$info$paths
```
Immediate nodes
```{r}
out$info$immediate_nodes
```
Child tables
```{r}
out$info$children_tables[[1]]
```
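The dataset IDs used in this README are dot-separated and appear to be hierarchical: metadata is requested for `us.gov.whitehouse`, while `us.gov.whitehouse.visitor-list` is used as a dataset. The sketch below does not call the API; it only splits an ID into its ancestor prefixes. The helper `dataset_ancestors()` is hypothetical, and whether its output lines up with `out$info$paths` is an assumption.

```r
# Hypothetical helper: list the ancestor prefixes of a dot-separated dataset ID.
# Assumes IDs form a hierarchy by prefix, as the IDs in this README suggest.
dataset_ancestors <- function(id) {
  parts <- strsplit(id, ".", fixed = TRUE)[[1]]  # fixed: "." is not a regex here
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "."),
         character(1))
}

dataset_ancestors("us.gov.whitehouse.visitor-list")
# returns c("us", "us.gov", "us.gov.whitehouse", "us.gov.whitehouse.visitor-list")
```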
### Use case: Plot frequency of flight distances
First, get columns for the air carrier dataset
```{r}
dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
head(enigma_metadata(dset)$columns$table[,c(1:4)])
```
Looks like there's a column called _distance_ that we can search on. By default, only `frequency` is returned for `varchar`-type columns.
```{r}
out <- enigma_stats(dset, select='distance')
head(out$result$frequency)
```
Then we can do a bit of tidying and make a plot
```{r warning=FALSE, message=FALSE, tidy=FALSE}
library("ggplot2")
# Frequency values come back as text; convert them to numbers
# (via as.character() in case the columns are factors)
df <- out$result$frequency
df <- data.frame(distance = as.numeric(as.character(df$distance)),
                 count = as.numeric(as.character(df$count)))
ggplot(df, aes(distance, count)) +
  geom_col() +
  geom_point() +
  theme_grey(base_size = 18) +
  labs(y = "flights", x = "distance (miles)")
```
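Why convert through `as.character()`: if the frequency columns ever arrive as factors rather than character vectors (for example, from `data.frame()` with `stringsAsFactors = TRUE`, the default before R 4.0), `as.numeric()` returns the factor's internal level codes instead of the values. The helper `to_number()` below is hypothetical and only demonstrates the safe pattern.

```r
# Hypothetical helper: convert a character or factor vector of numbers to numeric.
# For factors, as.character() recovers the labels, so as.numeric() parses the
# actual values; for character input it changes nothing.
to_number <- function(x) as.numeric(as.character(x))

distances <- factor(c("2475", "733", "2475"))  # levels sort as "2475", "733"
as.numeric(distances)  # level codes 1 2 1, not the distances
to_number(distances)   # 2475 733 2475
```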
### Direct dataset download
Enigma provides an endpoint `.../export/<datasetid>` for downloading a zipped CSV file of an entire dataset.
`enigma_fetch()` gives you an easy way to download these files to a specific place on your machine, and prints a message telling you that the file has been written to disk.
```r
enigma_fetch(dataset='com.crunchbase.info.companies.acquisition')
```
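Once the file is on disk, it can be read back into R with base functions. This is a sketch under stated assumptions: `read_enigma_export()` is a hypothetical helper, you supply the path yourself (for example, from the message `enigma_fetch()` prints), and because the exact compression format is not specified here, it handles both a `.zip` archive containing a CSV and a gzip-compressed CSV, which `read.csv()` decompresses transparently.

```r
# Hypothetical helper: read a downloaded export into a data frame.
# Assumption: the file is either a .zip archive containing a CSV, or a
# gzip-compressed CSV (read.csv() detects and decompresses gzip itself).
read_enigma_export <- function(path) {
  if (grepl("\\.zip$", path, ignore.case = TRUE)) {
    # List the archive contents and pick the first CSV inside it
    members <- utils::unzip(path, list = TRUE)$Name
    csv_name <- members[grepl("\\.csv$", members, ignore.case = TRUE)][1]
    if (is.na(csv_name)) stop("no CSV file found in ", path)
    # unz() exposes a single archive member as a readable connection
    return(utils::read.csv(unz(path, csv_name), stringsAsFactors = FALSE))
  }
  utils::read.csv(path, stringsAsFactors = FALSE)
}

# Illustrative path only; use the location reported by enigma_fetch()
# acquisitions <- read_enigma_export("path/to/downloaded-file.csv.gz")
```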