@pchaigno
Last active August 29, 2015 14:06

enigma

opts_chunk$set(fig.width=8, fig.pos="h", fig.path="inst/assets/figure/")

An R client for Enigma.io

Enigma holds government data and provides a set of APIs for data, metadata, and stats on each of its datasets. That is, you can request a dataset itself, metadata on the dataset, and summary statistics on the columns of each dataset.

enigma info

LICENSE

MIT, see LICENSE file and MIT text

Quick start

Install

install.packages("devtools")
library("devtools")
install_github("ropengov/enigma")
library("enigma")

Get data

out <- enigma_data(dataset='us.gov.whitehouse.visitor-list', select=c('namelast','visitee_namelast','last_updatedby'))

Some metadata on the results

out$info

Look at the data, first 6 rows for readme brevity

head(out$result)
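
Once the result is in hand, ordinary data frame tools apply. A minimal sketch, assuming out$result is a data frame with one column per selected field; the rows below are invented placeholders, not real Enigma output:

```r
# Placeholder rows standing in for out$result; all values are invented.
visits <- data.frame(
  namelast         = c("Smith", "Jones", "Smith", "Lee"),
  visitee_namelast = c("Office", "Office", "Potus", "Office"),
  last_updatedby   = c("A1", "B2", "A1", "C3"),
  stringsAsFactors = FALSE
)

# Count rows per visitee and sort from most to least frequent.
visits_per_visitee <- sort(table(visits$visitee_namelast), decreasing = TRUE)
print(visits_per_visitee)
```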

Statistics on dataset columns

out <- enigma_stats(dataset='us.gov.whitehouse.visitor-list', select='total_people')

Some summary stats

out$result[c('sum','avg','stddev','variance','min','max')]
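
If the summary values are easier to work with as numbers in a single row, they can be flattened. This is only a sketch: the list below is a hypothetical stand-in for out$result, and it assumes the values may arrive as strings, as the frequency columns do later in this README.

```r
# Hypothetical stand-in for out$result; the numbers are invented.
stats <- list(sum = "10", avg = "2.5", stddev = "1.118034",
              variance = "1.25", min = "1", max = "4")

wanted <- c("sum", "avg", "stddev", "variance", "min", "max")

# Convert each selected entry to numeric, keeping the names.
stats_num <- vapply(stats[wanted], as.numeric, numeric(1))

# One-row data frame, convenient for binding several columns' stats together.
stats_df <- as.data.frame(as.list(stats_num))
print(stats_df)
```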

Frequency details

head(out$result$frequency)
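
The frequency table can be turned into shares of the total with base R. A sketch under the assumption that the table has one column named after the selected field plus a count column; the rows are invented placeholders:

```r
# Placeholder for out$result$frequency; values are invented.
freq <- data.frame(
  total_people = c("1", "2", "3"),
  count        = c("50", "30", "20"),
  stringsAsFactors = FALSE
)

# Convert to numeric before doing arithmetic.
freq$total_people <- as.numeric(freq$total_people)
freq$count <- as.numeric(freq$count)

# Share of all rows accounted for by each value, most common first.
freq$share <- freq$count / sum(freq$count)
freq <- freq[order(-freq$count), ]
print(freq)
```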

Metadata on datasets

out <- enigma_metadata(dataset='us.gov.whitehouse')

Paths

out$info$paths

Immediate nodes

out$info$immediate_nodes

Children tables

out$info$children_tables[[1]]

Use case: Plot frequency of flight distances

First, get columns for the air carrier dataset

dset <- 'us.gov.dot.rita.trans-stats.air-carrier-statistics.t100d-market-all-carrier'
head(enigma_metadata(dset)$columns$table[,c(1:4)])

Looks like there's a column called distance that we can get statistics on. By default, for varchar type columns, only the frequency is returned for the column.

out <- enigma_stats(dset, select='distance')
head(out$result$frequency)

Then we can do a bit of tidying and make a plot

library("ggplot2")
library("ggthemes")
df <- out$result$frequency
df <- data.frame(distance=as.numeric(df$distance), count=as.numeric(df$count))
ggplot(df, aes(distance, count)) +
  geom_bar(stat="identity") +
  geom_point() +
  theme_grey(base_size = 18) +
  labs(y="flights", x="distance (miles)")
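
With many distinct distances the bars can get very thin, so one option is to bin the distances before plotting. A rough sketch in base R, using an invented placeholder data frame and an arbitrary 500-mile bin width:

```r
# Placeholder for the tidied frequency table; values are invented.
df <- data.frame(
  distance = c(120, 480, 950, 1020, 2400),
  count    = c(10, 4, 7, 3, 1)
)

# Assign each distance to a 500-mile bin (the bin width is an arbitrary choice).
breaks <- seq(0, 2500, by = 500)
df$bin <- cut(df$distance, breaks = breaks, include.lowest = TRUE, dig.lab = 4)

# Sum the counts within each bin; bins with no rows are not returned.
binned <- aggregate(count ~ bin, data = df, FUN = sum)
print(binned)
```

The binned table can then be plotted the same way, for example with ggplot(binned, aes(bin, count)) + geom_bar(stat="identity").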

Direct dataset download

Enigma provides an endpoint .../export/<datasetid> to download a zipped csv file of the entire dataset. enigma_fetch() gives you an easy way to download these to a specific place on your machine; a message tells you when the file has been written to disk.

enigma_fetch(dataset='com.crunchbase.info.companies.acquisition')
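
Once the zip file is on disk, it can be read without unpacking it by hand. A minimal sketch, assuming the archive contains at least one csv file; read_zipped_csv is a hypothetical helper written for this example, not part of the enigma package:

```r
# Hypothetical helper (not part of enigma): read the first csv found in a zip.
read_zipped_csv <- function(zip_path, ...) {
  stopifnot(file.exists(zip_path))

  # List the archive contents and pick the first entry ending in .csv.
  listing <- utils::unzip(zip_path, list = TRUE)
  csv_name <- listing$Name[grepl("\\.csv$", listing$Name, ignore.case = TRUE)][1]
  if (is.na(csv_name)) stop("no csv file found in ", zip_path)

  # Read directly from the archive; read.csv opens and closes the connection.
  utils::read.csv(unz(zip_path, csv_name), stringsAsFactors = FALSE, ...)
}

# Usage, with the path replaced by wherever the download was written:
# companies <- read_zipped_csv("path/to/downloaded.zip")
# head(companies)
```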