Last active
March 19, 2019 17:03
-
-
Save critmcdonald/151e1fa44c82ea5ca337b204cfc095d6 to your computer and use it in GitHub Desktop.
RNotebook code to download Mixed Beverage data
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Mixed beverage download" | |
output: html_notebook | |
--- | |
## Setup | |
You might need to install the [RSocrata](https://cran.r-project.org/web/packages/RSocrata/index.html) package (run `install.packages("RSocrata")` in your R Console). [Documentation is here](https://github.com/Chicago/RSocrata). | |
```{r setup} | |
library(tidyverse) | |
library(RSocrata) | |
library(janitor) | |
``` | |
## Understanding the data | |
Let's take a quick tour of the [Mixed Beverage Gross Receipts](https://data.texas.gov/Government-and-Taxes/Mixed-Beverage-Gross-Receipts/naix-2893) data on the state Socrata site. | |
## Download Version 1: Pull records with RSocrata | |
This version pulls directly from Socrata through their API. It creates a configurable URL to pull a date range of data based on the `obligation_end_date_yymmdd` field. Set the start and end date as the last day of the months you want to study. There is a "limit" for testing that is commented out. | |
```{r download} | |
start_date = '2016-01-31' | |
end_date = '2018-12-31' | |
county = '227' | |
mixbev_base_url = 'https://data.texas.gov/resource/fp9t-htqh.json?' | |
download_url = paste( | |
mixbev_base_url, | |
# "$limit=1000&", | |
"$where=obligation_end_date_yyyymmdd%20between%20", | |
"'", start_date, "'", | |
" and ", | |
"'", end_date, "'", | |
"&location_county=", | |
"'", county, "'", | |
sep = "" | |
) | |
download_url | |
receipts_api <- read.socrata(download_url) | |
``` | |
Some notes about what is happening above: | |
- The first several lines are creating variables for dates and the base URL of the data. If we wanted data from a different date range or county, we could change those. Note that `227` is Travis County. | |
- The `download_url` variable pieces together the parts of the url that we need to download the data. This url is an "endpoint" for the Socrata API for this dataset. It uses [SoQL](https://dev.socrata.com/docs/queries/) queries to select only the data we want. The `paste()` function is just putting together the pieces of the URL endpoint based on our variables. I print it out at the end so you can see what the finished URL looks like. You could copy/paste that endpoint (between the "") into a browser to see what the data looks like. | |
- The last line uses the `read.socrata()` function from our RSocrata package to download the data and put it inside a dataframe called "receipts_api". | |
The example above could easily be repurposed to pull data from a different dataset on the Socrata platform. | |
### Fix columns than are not numbers | |
The API delivers the monetary amounts as `chr`, which means we can't do math on them. We have to change them to numbers using a `mutate()` function. | |
```{r fix_numbers} | |
receipts_api <- receipts_api %>% | |
mutate( | |
wine_receipts = as.integer(wine_receipts), | |
beer_receipts = as.integer(beer_receipts), | |
cover_charge_receipts = as.integer(cover_charge_receipts), | |
liquor_receipts = as.integer(liquor_receipts), | |
total_receipts = as.integer(total_receipts) | |
) | |
``` | |
## Rearrange the columns | |
The order of the columns from the API kinda are in alphabetical order. Let's reorder them by using `select()` to put them in an order that makes more sense to us. | |
### Get the names first | |
```{r check_names} | |
names(receipts_api) | |
``` | |
Now select them in a good order and reassign them. | |
```{r reorder_columns} | |
receipts_api <- receipts_api %>% | |
select(location_name, location_address, location_city, location_state, location_zip, location_county, location_number, total_receipts, beer_receipts, liquor_receipts, wine_receipts, cover_charge_receipts, inside_outside_city_limits_code_y_n, tabc_permit_number, taxpayer_address, taxpayer_name, taxpayer_city, taxpayer_state, taxpayer_zip, taxpayer_county, taxpayer_number, obligation_end_date_yyyymmdd, responsibility_begin_date_yyyymmdd, responsibility_end_date_yyyymmdd) | |
``` | |
### Write to disc | |
```{r} | |
receipts_api %>% write_rds("data-raw/receipts_api.rds") | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment