Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active December 14, 2021 16:33
Show Gist options
  • Save dannguyen/c6bcc9884c25cf68f3550560ccae5ca8 to your computer and use it in GitHub Desktop.
Save dannguyen/c6bcc9884c25cf68f3550560ccae5ca8 to your computer and use it in GitHub Desktop.
converting Florida prison Access database into separate CSV and SQLite files

This example shows basic usage of mdbtools to extract CSV/SQL from Microsoft databases, but also involves a fairly convoluted Bash loop. For a simpler example, go here: Using mdbtools to extract CSV data from the FAA Wildlife Strike Database

Working with Access files from the command line

Use mdbtools: https://github.com/brianb/mdbtools

If you're on a Mac, you can install mdbtools via homebrew:

  brew install mdbtools

Florida prison data

The landing page for the data file is here -- you'll be directed to a Google Drive link: http://www.dc.state.fl.us/pub/obis_request.html

Or you can download a copy of the data that I've stashed here:

http://stash.compjour.org/samples/prisons/florida-inmates--2015-10-26.mdb.zip

The zipped file is 164MB and uncompresses to 1.25GB

Convert to MDB

I stumbled upon this old but still useful blogpost about using mdbtools to convert to csv/mysql:

http://nialldonegan.me/2007/03/10/converting-microsoft-access-mdb-into-csv-or-mysql-in-linux/

All together

Here's the shell script I used to batch convert the Florida prison data, including downloading and unzipping the file via curl. Note that I took some time to make this a fire-and-forget script, hence the ugly use of pcregrep to extract tablenames from the mdb-schema command and use that to batch create and fill the data. You probably don't need to do this if you're trying to convert a MDB database that is less complicated than Florida's prison database.

MDBNAME=florida-inmates--2015-10-26.mdb

curl -O http://stash.compjour.org/samples/prisons/$MDBNAME.zip
unzip $MDBNAME.zip

while read table_name; do
  schemaname="$table_name.sqlite.schema.sql"
  echo "Creating $schemaname"
  mdb-schema -T "$table_name" $MDBNAME sqlite  2> /dev/null |
    pcregrep -oM 'CREATE TABLE(?:.|\n)+?\)\;\n' > "$schemaname" 
  
 # sqlite data
  sqlname="$table_name.sqlite.data.sql"
  echo "Creating $sqlname"
  mdb-export -I sqlite $MDBNAME  "$table_name" > "$sqlname"
  # CSV data
  csvname="$table_name.csv"
  echo "Creating $csvname"
  mdb-export $MDBNAME "$table_name" > "$csvname"
done < <(mdb-tables -1 $MDBNAME)  

For each Access table, three files are generated:

  • A schema file in SQLite format
  • A data file in SQLite format
  • The table as plaintext CSV

Here are the final contents of the folder, when all is said and done:

├── CONTENTS.csv
├── CONTENTS.sqlite.data.sql
├── CONTENTS.sqlite.schema.sql
├── INMATE_ACTIVE_ALIASES.csv
├── INMATE_ACTIVE_ALIASES.sqlite.data.sql
├── INMATE_ACTIVE_ALIASES.sqlite.schema.sql
├── INMATE_ACTIVE_DETAINERS.csv
├── INMATE_ACTIVE_DETAINERS.sqlite.data.sql
├── INMATE_ACTIVE_DETAINERS.sqlite.schema.sql
├── INMATE_ACTIVE_INCARHIST-cleaned.csv
├── INMATE_ACTIVE_INCARHIST.csv
├── INMATE_ACTIVE_INCARHIST.sqlite.data.sql
├── INMATE_ACTIVE_INCARHIST.sqlite.schema.sql
├── INMATE_ACTIVE_OFFENSES_CPS-cleaned.csv
├── INMATE_ACTIVE_OFFENSES_CPS.csv
├── INMATE_ACTIVE_OFFENSES_CPS.sqlite.data.sql
├── INMATE_ACTIVE_OFFENSES_CPS.sqlite.schema.sql
├── INMATE_ACTIVE_OFFENSES_prpr-cleaned.csv
├── INMATE_ACTIVE_OFFENSES_prpr.csv
├── INMATE_ACTIVE_OFFENSES_prpr.sqlite.data.sql
├── INMATE_ACTIVE_OFFENSES_prpr.sqlite.schema.sql
├── INMATE_ACTIVE_ROOT-cleaned.csv
├── INMATE_ACTIVE_ROOT.csv
├── INMATE_ACTIVE_ROOT.sqlite.data.sql
├── INMATE_ACTIVE_ROOT.sqlite.schema.sql
├── INMATE_ACTIVE_SCARSMARKS.csv
├── INMATE_ACTIVE_SCARSMARKS.sqlite.data.sql
├── INMATE_ACTIVE_SCARSMARKS.sqlite.schema.sql
├── INMATE_RELEASE_ALIASES.csv
├── INMATE_RELEASE_ALIASES.sqlite.data.sql
├── INMATE_RELEASE_ALIASES.sqlite.schema.sql
├── INMATE_RELEASE_DETAINERS.csv
├── INMATE_RELEASE_DETAINERS.sqlite.data.sql
├── INMATE_RELEASE_DETAINERS.sqlite.schema.sql
├── INMATE_RELEASE_INCARHIST.csv
├── INMATE_RELEASE_INCARHIST.sqlite.data.sql
├── INMATE_RELEASE_INCARHIST.sqlite.schema.sql
├── INMATE_RELEASE_OFFENSES_CPS.csv
├── INMATE_RELEASE_OFFENSES_CPS.sqlite.data.sql
├── INMATE_RELEASE_OFFENSES_CPS.sqlite.schema.sql
├── INMATE_RELEASE_OFFENSES_prpr.csv
├── INMATE_RELEASE_OFFENSES_prpr.sqlite.data.sql
├── INMATE_RELEASE_OFFENSES_prpr.sqlite.schema.sql
├── INMATE_RELEASE_RESIDENCE.csv
├── INMATE_RELEASE_RESIDENCE.sqlite.data.sql
├── INMATE_RELEASE_RESIDENCE.sqlite.schema.sql
├── INMATE_RELEASE_ROOT.csv
├── INMATE_RELEASE_ROOT.sqlite.data.sql
├── INMATE_RELEASE_ROOT.sqlite.schema.sql
├── INMATE_RELEASE_SCARSMARKS.csv
├── INMATE_RELEASE_SCARSMARKS.sqlite.data.sql
├── INMATE_RELEASE_SCARSMARKS.sqlite.schema.sql
├── OFFENDER_ALIASES.csv
├── OFFENDER_ALIASES.sqlite.data.sql
├── OFFENDER_ALIASES.sqlite.schema.sql
├── OFFENDER_OFFENSES_CCS.csv
├── OFFENDER_OFFENSES_CCS.sqlite.data.sql
├── OFFENDER_OFFENSES_CCS.sqlite.schema.sql
├── OFFENDER_RESIDENCE.csv
├── OFFENDER_RESIDENCE.sqlite.data.sql
├── OFFENDER_RESIDENCE.sqlite.schema.sql
├── OFFENDER_ROOT.csv
├── OFFENDER_ROOT.sqlite.data.sql
├── OFFENDER_ROOT.sqlite.schema.sql
├── florida-inmates--2015-10-26.mdb
└── florida-inmates--2015-10-26.mdb.zip
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment