Skip to content

Instantly share code, notes, and snippets.

@rasmi
Created November 6, 2018 19:52
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rasmi/e124b9ee980e3bda2a50b8950c646e93 to your computer and use it in GitHub Desktop.
Save rasmi/e124b9ee980e3bda2a50b8950c646e93 to your computer and use it in GitHub Desktop.
Upload Vera Incarceration Trends Dataset to BigQuery
#!/bin/sh
# Copyright 2018 Google LLC.
# SPDX-License-Identifier: Apache-2.0
DATA_FILENAME="incarceration_trends.csv"
DATA_URL="https://github.com/vera-institute/incarceration_trends/raw/master/incarceration_trends.csv"
BQ_DATASET_NAME="vera_data"
BQ_DATASET_DESCRIPTION="County-level jail data (1970-2015) and prison data (1983-2015)"
BQ_SCHEMA_FILENAME="schema.json"
BQ_SCHEMA_URL="https://gist.githubusercontent.com/rasmi/b46101a1d20d3da46bafbd77299126c9/raw/c7f12bbdff7adc61e763ff1be22bc21215dc3ff5/schema.json"
BQ_TABLE_NAME="incarceration_trends"
# Download data file locally.
wget $DATA_URL
# Download schema file locally.
wget $BQ_SCHEMA_URL
# Create a BQ dataset to store the table in if it does not exist.
bq --location="US" mk --dataset \
--description "$BQ_DATASET_DESCRIPTION" \
$GOOGLE_CLOUD_PROJECT:$BQ_DATASET_NAME
#################################################
# MAKE DATASET PUBLICLY AVAILABLE AS READ-ONLY. #
#################################################
# Download the permissions file to edit read permissions.
bq show --format=prettyjson $BQ_DATASET_NAME > permissions.json
# Patch the permissions file to allow allAuthenticatedUsers to read the dataset.
sed -i 's/projectReaders/allAuthenticatedUsers/' permissions.json
# Uploaded patched permissions file to make changes take effect.
bq update --source permissions.json $BQ_DATASET_NAME
# Clean up.
rm permissions.json
#################################################
# Load the local CSV data into the BQ table.
bq load --source_format=CSV --skip_leading_rows=1 $BQ_DATASET_NAME.$BQ_TABLE_NAME $DATA_FILENAME $BQ_SCHEMA_FILENAME
# Clean up.
rm $DATA_FILENAME $BQ_SCHEMA_FILENAME
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment