Skip to content

Instantly share code, notes, and snippets.

@critmcdonald
Last active March 19, 2019 17:03
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save critmcdonald/151e1fa44c82ea5ca337b204cfc095d6 to your computer and use it in GitHub Desktop.
Save critmcdonald/151e1fa44c82ea5ca337b204cfc095d6 to your computer and use it in GitHub Desktop.
RNotebook code to download Mixed Beverage data
---
title: "Mixed beverage download"
output: html_notebook
---
## Setup
You might need to install the [RSocrata](https://cran.r-project.org/web/packages/RSocrata/index.html) package (run `install.packages("RSocrata")` in your R Console). [Documentation is here](https://github.com/Chicago/RSocrata).
```{r setup}
library(tidyverse)
library(RSocrata)
library(janitor)
```
## Understanding the data
Let's take a quick tour of the [Mixed Beverage Gross Receipts](https://data.texas.gov/Government-and-Taxes/Mixed-Beverage-Gross-Receipts/naix-2893) data on the state Socrata site.
## Download Version 1: Pull records with RSocrata
This version pulls directly from Socrata through their API. It creates a configurable URL to pull a date range of data based on the `obligation_end_date_yymmdd` field. Set the start and end date as the last day of the months you want to study. There is a "limit" for testing that is commented out.
```{r download}
start_date = '2016-01-31'
end_date = '2018-12-31'
county = '227'
mixbev_base_url = 'https://data.texas.gov/resource/fp9t-htqh.json?'
download_url = paste(
mixbev_base_url,
# "$limit=1000&",
"$where=obligation_end_date_yyyymmdd%20between%20",
"'", start_date, "'",
" and ",
"'", end_date, "'",
"&location_county=",
"'", county, "'",
sep = ""
)
download_url
receipts_api <- read.socrata(download_url)
```
Some notes about what is happening above:
- The first several lines are creating variables for dates and the base URL of the data. If we wanted data from a different date range or county, we could change those. Note that `227` is Travis County.
- The `download_url` variable pieces together the parts of the url that we need to download the data. This url is an "endpoint" for the Socrata API for this dataset. It uses [SoQL](https://dev.socrata.com/docs/queries/) queries to select only the data we want. The `paste()` function is just putting together the pieces of the URL endpoint based on our variables. I print it out at the end so you can see what the finished URL looks like. You could copy/paste that endpoint (between the "") into a browser to see what the data looks like.
- The last line uses the `read.socrata()` function from our RSocrata package to download the data and put it inside a dataframe called "receipts_api".
The example above could easily be repurposed to pull data from a different dataset on the Socrata platform.
### Fix columns than are not numbers
The API delivers the monetary amounts as `chr`, which means we can't do math on them. We have to change them to numbers using a `mutate()` function.
```{r fix_numbers}
receipts_api <- receipts_api %>%
mutate(
wine_receipts = as.integer(wine_receipts),
beer_receipts = as.integer(beer_receipts),
cover_charge_receipts = as.integer(cover_charge_receipts),
liquor_receipts = as.integer(liquor_receipts),
total_receipts = as.integer(total_receipts)
)
```
## Rearrange the columns
The order of the columns from the API kinda are in alphabetical order. Let's reorder them by using `select()` to put them in an order that makes more sense to us.
### Get the names first
```{r check_names}
names(receipts_api)
```
Now select them in a good order and reassign them.
```{r reorder_columns}
receipts_api <- receipts_api %>%
select(location_name, location_address, location_city, location_state, location_zip, location_county, location_number, total_receipts, beer_receipts, liquor_receipts, wine_receipts, cover_charge_receipts, inside_outside_city_limits_code_y_n, tabc_permit_number, taxpayer_address, taxpayer_name, taxpayer_city, taxpayer_state, taxpayer_zip, taxpayer_county, taxpayer_number, obligation_end_date_yyyymmdd, responsibility_begin_date_yyyymmdd, responsibility_end_date_yyyymmdd)
```
### Write to disc
```{r}
receipts_api %>% write_rds("data-raw/receipts_api.rds")
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment