@dsummersl
Last active March 9, 2021 20:06
Example: shapefile to CSV, and CSV back into a shapefile.

Shapefile to CSV

There are many ways to work with GIS data. I've used QGIS (or CartoDB and Mapbox online) when I need to explore shapefiles interactively, but when building a map, or an application on top of one, I like to use command line tools. The examples below use the GDAL command line tools, since many GIS libraries use GDAL under the hood. If you're somewhat familiar with SQL, these tools are a quick, scriptable way to explore the contents of a shapefile.

For the examples below I used a shapefile downloaded directly from the US Census Bureau: North Carolina Census Tracts

This shapefile contains a polygon for every census tract in NC. For clustering purposes we may only want the centroid of each tract. Creating a CSV of all the tracts takes only a couple of commands:

# compute each polygon's centroid and save it in a new shapefile called to_points.shp
# (the -dialect sqlite flag is what provides the ST_Centroid() function):
ogr2ogr -sql "select ST_Centroid(geometry), * from gz_2010_37_140_00_500k" -dialect sqlite to_points.shp gz_2010_37_140_00_500k.shp

# export the contents of to_points.shp as a CSV, writing each point's coordinates as X/Y columns:
ogr2ogr -f CSV data.csv to_points.shp -lco GEOMETRY=AS_XY

head data.csv
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
-78.9857582165518,34.9619059011642,1400000US37051003103,37,051,003103,31.03,Tract,3.562000000000000
-78.9922662719313,35.049602207779,1400000US37051003302,37,051,003302,33.02,Tract,2.074000000000000
-79.0053315052058,35.0615306202348,1400000US37051003311,37,051,003311,33.11,Tract,0.489000000000000
-79.0175261536099,35.141023301382,1400000US37051003403,37,051,003403,34.03,Tract,1.138000000000000
-78.9917427969515,35.1480636025726,1400000US37051003406,37,051,003406,34.06,Tract,2.289000000000000
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000
-75.6812540143835,36.0336919499072,1400000US37055970200,37,055,970200,9702,Tract,2.777000000000000
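Once the data is in CSV form, plain text tools go a long way for quick, scriptable exploration. As one example, counting tracts per county; a few rows of data.csv are inlined below so the snippet stands alone (against the real file you'd read data.csv directly):

```shell
# A few sample rows from data.csv, inlined for illustration:
cat > sample.csv <<'EOF'
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000
EOF

# COUNTY is the 5th field; skip the header row, then count occurrences:
tail -n +2 sample.csv | cut -d, -f5 | sort | uniq -c | sort -rn
```

On this sample the output is two lines: county 051 with 2 tracts, county 053 with 1.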

CSV into Shapefile

After performing some classification of the CSV data, you may want the reverse operation: joining the classification results back onto the original shapefile.

# fake a classification result by appending a new column filled with junk:
sed -e 's/$/,NEWCOL/' data.csv > data_with_new_column.csv

head data_with_new_column.csv
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA,NEWCOL
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000,NEWCOL
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000,NEWCOL
-78.9857582165518,34.9619059011642,1400000US37051003103,37,051,003103,31.03,Tract,3.562000000000000,NEWCOL
-78.9922662719313,35.049602207779,1400000US37051003302,37,051,003302,33.02,Tract,2.074000000000000,NEWCOL
-79.0053315052058,35.0615306202348,1400000US37051003311,37,051,003311,33.11,Tract,0.489000000000000,NEWCOL
-79.0175261536099,35.141023301382,1400000US37051003403,37,051,003403,34.03,Tract,1.138000000000000,NEWCOL
-78.9917427969515,35.1480636025726,1400000US37051003406,37,051,003406,34.06,Tract,2.289000000000000,NEWCOL
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000,NEWCOL
-75.6812540143835,36.0336919499072,1400000US37055970200,37,055,970200,9702,Tract,2.777000000000000,NEWCOL
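The sed line above just stamps a constant into every row. A real classification step would derive the value from the data; here is a minimal sketch in awk that buckets tracts by CENSUSAREA (the "large"/"small" labels, the 5-square-mile cutoff, and the file names are invented for illustration):

```shell
# Two sample rows from data.csv, inlined so the sketch stands alone:
cat > sample.csv <<'EOF'
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
EOF

# CENSUSAREA is the 9th field. Append a SIZE column: the header first,
# then "large" for tracts over 5 square miles, "small" otherwise.
awk -F, 'NR == 1 { print $0 ",SIZE"; next }
         { print $0 "," ($9 > 5 ? "large" : "small") }' sample.csv > classified.csv

cat classified.csv
```

The resulting classified.csv can then be joined back onto the shapefile exactly as shown with NEWCOL below.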

# Join the CSV file with the shapefile using a column that uniquely identifies each feature.
# Here I joined on the tract number (for a statewide join, GEO_ID is a safer key, since tract
# numbers are only unique within a county). Then export to a new shapefile, including the new
# column from the CSV:
ogr2ogr -sql "select shp.*, csv.NEWCOL from gz_2010_37_140_00_500k as shp left join 'data_with_new_column.csv'.data_with_new_column as csv on shp.TRACT = csv.TRACT" joined.shp gz_2010_37_140_00_500k.shp

At this point you have a new shapefile, joined.shp, that contains the original data plus your additional classification column.

If you're not comfortable using the command line, all of this can be done with desktop tools like QGIS as well.
