Skip to content

Instantly share code, notes, and snippets.

@dannguyen
Last active August 29, 2015 14:17
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dannguyen/0d19d59d6e80cd069c5e to your computer and use it in GitHub Desktop.
Save dannguyen/0d19d59d6e80cd069c5e to your computer and use it in GitHub Desktop.
NYPD Stop and Frisk 2014 update: Now there's official CSVs!

The NYPD just released their 2014 Stop and Frisk data, and in CSV format too! Here's some Bash commands to get the data and scope it out (see the previous guide here on how to graph it with R)

Gettin the data

Using bash:

Download the files

baseurl="http://www.nyc.gov/html/nypd/downloads/zip/analysis_and_planning/YYYY_sqf_csv.zip"
for year in $(seq 2003 2014); do
  url=$(echo $baseurl | sed "s/YYYY/$year/")
  echo $url
  curl -sO $url
done

Unzip the files

unzip -o "*.zip"

Do a quick count by year

wc -l *.csv

2014 comes in at 45,787 reports, which is nearly 5 times fewer than in 2013 (and more than 10 times fewer than 2012):

  160852 2003.csv
  313524 2004.csv
  398192 2005.csv
  506492 2006.csv
  472097 2007.csv
  540303 2008.csv
  581169 2009.csv
  601286 2010.csv
  685725 2011.csv
  532912 2012.csv
  191852 2013.csv
  45788 2014.csv
 5030192 total

Inspect the headers

Let's see if they made the headers all uniform, which would make this an easy import into a database:

for year in $(seq 2003 2014); do 
  fname="$year.csv"
  head -n 1 $fname | tr ',' $'\n' | tr '[:upper:]' '[:lower:]'
done | sort | uniq -c | sort -rn

BLAAARGH!!!!

  12 zip
  12 year
  12 ycoord
  12 xcoord
  12 weight
  12 typeofid
  12 trhsloc
  12 timestop
  12 sumoffen
  12 sumissue
  12 state
  12 sex
  12 ser_num
  12 sector
  12 searched
  12 sb_outln
  12 sb_other
  12 sb_hdobj
  12 sb_admis
  12 riflshot
  12 rf_verbl
  12 rf_vcrim
  12 rf_vcact
  12 rf_rfcmp
  12 rf_othsw
  12 rf_knowl
  12 rf_furt
  12 rf_bulg
  12 rf_attir
  12 revcmd
  12 repcmd
  12 recstat
  12 radio
  12 race
  12 post
  12 pistol
  12 pf_wall
  12 pf_ptwep
  12 pf_pepsp
  12 pf_other
  12 pf_hcuff
  12 pf_hands
  12 pf_grnd
  12 pf_drwep
  12 pf_baton
  12 perstop
  12 perobs
  12 pct
  12 othrweap
  12 othpers
  12 othfeatr
  12 offverb
  12 offunif
  12 offshld
  12 officrid
  12 machgun
  12 knifcuti
  12 inout
  12 ht_inch
  12 ht_feet
  12 haircolr
  12 frisked
  12 eyecolor
  12 explnstp
  12 dob
  12 datestop
  12 cs_vcrim
  12 cs_other
  12 cs_objcs
  12 cs_lkout
  12 cs_furtv
  12 cs_drgtr
  12 cs_descr
  12 cs_cloth
  12 cs_casng
  12 cs_bulge
  12 crimsusp
  12 contrabn
  12 compyear
  12 comppct
  12 city
  12 build
  12 beat
  12 asltweap
  12 arstoffn
  12 arstmade
  12 aptnum
  12 age
  12 adtlrept
  12 addrtyp
  12 ac_time
  12 ac_stsnd
  12 ac_rept
  12 ac_proxm
  12 ac_other
  12 ac_inves
  12 ac_incid
  12 ac_evasv
  12 ac_cgdir
  12 ac_assoc
  11 stname
  11 stinter
  11 rescode
  11 premtype
  11 premname
  11 linecm
  11 dettypcm
  11 detailcm
  11 crossst
  11 addrpct
  11 addrnum
   4 forceuse
   1 wepfound
   1 strname
   1 strintr
   1 rescod
   1 prenam
   1 premtyp
   1 dettyp_c
   1 details_
   1 detail1_
   1 crossst
   1 adrpct
   1 adrnum

I'll deal with this tomorrow. It'll probably use the same transformations as the ones I did on the previous files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment