The FBI's Uniform Crime Report (UCR) MASTER data is stored in a nonstandard fixed-width format. This parser converts it to CSV (tabular) data for easier analysis.
- Python 3 (tested with 3.6-3.8) with pandas installed in your environment
Once you have created BOTH a header.csv and a record.csv that identify the fields of each record type, the parser can run.
The following example is for the Arrest (ASR) MASTER data set, using the sample header and record CSVs in this GIST:
python parser.py --headercsv sample_header_columns.csv --detailcsv sample_detail_columns.csv --input 2018_ASR1MON_NATIONAL_MASTER_FILE.txt
Each UCR data set includes a .doc or .pdf file that describes the structure of the accompanying fixed-width txt source file. This parser expects you to create two CSVs that give each field a UNIQUE name along with its start and stop column positions. Because 0-based indexes are used, be sure to subtract 1 from the start position listed in the documentation. Create one CSV per record type, which the documentation labels 'HEADER' and 'DETAIL'. For an example, review the attached PDF and compare its 'position' and 'description' values with the values in the UCR zip.
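
To make the column-spec idea concrete, here is a minimal sketch of how a spec CSV could drive the slicing of one fixed-width line. It is not the code in parser.py. The column titles (name, start, stop), the field names, and the sample line are all hypothetical, and it assumes the stop value is used as an exclusive slice end, so a documented position of 1-2 becomes start 0, stop 2.

```python
import csv
import io

# Hypothetical column spec: a unique field name plus a 0-based start and an
# exclusive stop. The column titles and field names are assumptions made for
# this sketch, not taken from the sample CSVs in the gist.
SPEC_CSV = """name,start,stop
state_code,0,2
agency_id,2,9
year,9,13
"""


def load_spec(text):
    """Return a list of (name, start, stop) tuples from a spec CSV string."""
    rows = csv.DictReader(io.StringIO(text))
    return [(r["name"], int(r["start"]), int(r["stop"])) for r in rows]


def parse_line(line, spec):
    """Slice one fixed-width line into a dict keyed by field name."""
    return {name: line[start:stop].strip() for name, start, stop in spec}


spec = load_spec(SPEC_CSV)
# Made-up sample line, 13 characters wide, matching the spec above.
record = parse_line("01AL000012018", spec)
print(record)
```

Because Python slices exclude the end index, only the start position needs the minus-1 adjustment under this assumption; the documented stop position can be used unchanged.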