Skip to content

Instantly share code, notes, and snippets.

@onyxfish
Created March 30, 2017 14:06
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save onyxfish/43b4455e905b1972155a2f9ee8f8438b to your computer and use it in GitHub Desktop.
Save onyxfish/43b4455e905b1972155a2f9ee8f8438b to your computer and use it in GitHub Desktop.
Import the entire Bureau of Labor Statistics (BLS) Quarterly Census of Wages (QCEW) dataset into a PostgreSQL database

QCEW Data Loader

These scripts import the entire Bureau of Labor Statistics Quarterly Census of Employement and Wages (from 1990 to latest) into one giant PostgreSQL database.

The database created by this process will use about 100GB of disk space. Make sure you have enough space available before you start!

Configuration

Database name, table name, and more can be configured via config.sh.

Usage

./download_data.sh
./create_db.sh
./import_data.sh

Lookups

If you want to map codes in the database to their string equivalents you'll need to download the lookup tables from the layout documentation, load them into tables, and join across.

#!/bin/bash
DATABASE="qcew"
TABLE="data"
PSQL="/usr/local/bin/psql"
PATH=$(pwd)
#!/bin/bash
source config.sh
CREATE_QUERY="
CREATE TABLE $TABLE (
area_fips varchar(5),
own_code varchar(1),
industry_code varchar(6),
agglvl_code varchar(2),
size_code varchar(1),
year varchar(4),
qtr varchar(1),
disclosure_code varchar(1),
qtrly_estabs integer,
month1_emplvl integer,
month2_emplvl integer,
month3_emplvl integer,
total_qtrly_wages bigint,
taxable_qtrly_wages bigint,
qtrly_contributions bigint,
avg_wkly_wage integer,
lq_disclosure_code varchar(1),
lq_qtrly_estabs real,
lq_month1_emplvl real,
lq_month2_emplvl real,
lq_month3_emplvl real,
lq_total_qtrly_wages real,
lq_taxable_qtrly_wages real,
lq_qtrly_contributions real,
lq_avg_wkly_wage real,
oty_disclosure_code varchar(1),
oty_qtrly_estabs_chg integer,
oty_qtrly_estabs_pct_chg real,
oty_month1_emplvl_chg integer,
oty_month1_emplvl_pct_chg real,
oty_month2_emplvl_chg integer,
oty_month2_emplvl_pct_chg real,
oty_month3_emplvl_chg integer,
oty_month3_emplvl_pct_chg real,
oty_total_qtrly_wages_chg bigint,
oty_total_qtrly_wages_pct_chg real,
oty_taxable_qtrly_wages_chg bigint,
oty_taxable_qtrly_wages_pct_chg real,
oty_qtrly_contributions_chg bigint,
oty_qtrly_contributions_pct_chg real,
oty_avg_wkly_wage_chg integer,
oty_avg_wkly_wage_pct_chg real
);
CREATE INDEX ${TABLE}_area_fips_index ON $TABLE (area_fips);
CREATE INDEX ${TABLE}_industry_code_index ON $TABLE (industry_code);
CREATE INDEX ${TABLE}_agglvl_code_index ON $TABLE (agglvl_code);
"
$PSQL -q "$DATABASE" -c "$CREATE_QUERY";
#!/bin/bash
# URLs for the QCEW single table zip files
urls=""
for year in `seq 1990 2016`; do
urls="${urls} http://www.bls.gov/cew/data/files/${year}/csv/${year}_qtrly_singlefile.zip"
done
# Download all of these zip files
curl -L --remote-name-all $urls
# Unzip all the zip files, each containing one csv
unzip "*_singlefile.zip"
mv *_singlefile.zip data
mv *_singlefile.csv data
#!/bin/bash
source config.sh
DATA_FILES=$PATH/data/*.singlefile.csv
for f in $DATA_FILES
do
/bin/date
echo $f
COPY_QUERY="
COPY $TABLE
FROM '$f' DELIMITER ',' CSV HEADER;
"
$PSQL -q "$DATABASE" -c "$COPY_QUERY";
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment