There are many ways to work with GIS data. I've used QGIS (or CartoDB and Mapbox online) when I need to explore shapefiles, but when building a map, or an application built on a map, I like to use command-line tools. The examples below use the GDAL command-line tools, since many GIS libraries use the GDAL libraries under the hood. If you're somewhat familiar with SQL, these tools offer a quick, scriptable way to explore the contents of shapefiles.
For the examples below I used shapefiles downloaded directly from the US Census: North Carolina Census Tracts
This shapefile contains polygon regions for every census tract in NC. For the purposes of clustering we may only want the centroid of each tract. Creating a CSV of all tracts from the shapefile takes only a couple of commands:
# compute the centroid, and save it in a new shapefile called to_points.shp
ogr2ogr -sql "select ST_Centroid(geometry), * from gz_2010_37_140_00_500k" -dialect sqlite to_points.shp gz_2010_37_140_00_500k.shp
# export the contents of to_points.shp as a CSV:
ogr2ogr -f CSV data.csv to_points.shp -lco GEOMETRY=AS_XY
head data.csv
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
-78.9857582165518,34.9619059011642,1400000US37051003103,37,051,003103,31.03,Tract,3.562000000000000
-78.9922662719313,35.049602207779,1400000US37051003302,37,051,003302,33.02,Tract,2.074000000000000
-79.0053315052058,35.0615306202348,1400000US37051003311,37,051,003311,33.11,Tract,0.489000000000000
-79.0175261536099,35.141023301382,1400000US37051003403,37,051,003403,34.03,Tract,1.138000000000000
-78.9917427969515,35.1480636025726,1400000US37051003406,37,051,003406,34.06,Tract,2.289000000000000
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000
-75.6812540143835,36.0336919499072,1400000US37055970200,37,055,970200,9702,Tract,2.777000000000000
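Once the data is in CSV form, ordinary Unix text tools work on it too. As a quick sanity check, you can count tracts per county with cut, sort, and uniq. The sketch below builds a tiny self-contained sample in the same shape as data.csv (in the real workflow you'd run the pipeline on data.csv itself; COUNTY is field 5):

```shell
# A few sample rows in the same shape as data.csv above.
cat > sample.csv <<'EOF'
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000
EOF

# Count tracts per COUNTY (field 5), skipping the header row.
tail -n +2 sample.csv | cut -d, -f5 | sort | uniq -c
```

On the full data.csv this gives a per-county tract count, which is an easy way to confirm the export looks complete before moving on.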
After performing some classification on the CSV data, you may then want to do the reverse operation: add the classification results back to the original shapefile.
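The classification step itself is up to you; as a stand-in, here is a hypothetical awk pass that bins tracts by CENSUSAREA (field 9). The 2.0 threshold and the AREA_CLASS column name are invented for illustration, and the sample file mirrors the shape of data.csv:

```shell
# Sample rows in the same shape as data.csv above.
cat > data_sample.csv <<'EOF'
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000
EOF

# Hypothetical classifier: label each tract small/large by CENSUSAREA
# (field 9); the 2.0 threshold and AREA_CLASS name are made up.
awk -F, 'NR==1 {print $0 ",AREA_CLASS"; next}
         {print $0 "," ($9+0 < 2.0 ? "small" : "large")}' data_sample.csv > classified.csv

head classified.csv
```

The simpler sed example below does the same thing structurally (append one column to every row), just with a constant junk value instead of a computed one.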
# here I'm making an example new column filled with a junk value
# (the header row gains the column name the same way):
sed -e 's/$/,NEWCOL/' data.csv > data_with_new_column.csv
head data_with_new_column.csv
X,Y,GEO_ID,STATE,COUNTY,TRACT,NAME,LSAD,CENSUSAREA,NEWCOL
-78.885756996617,35.1343343410603,1400000US37051002503,37,051,002503,25.03,Tract,1.692000000000000,NEWCOL
-78.8918525968448,35.1587000553965,1400000US37051002504,37,051,002504,25.04,Tract,6.152000000000000,NEWCOL
-78.9857582165518,34.9619059011642,1400000US37051003103,37,051,003103,31.03,Tract,3.562000000000000,NEWCOL
-78.9922662719313,35.049602207779,1400000US37051003302,37,051,003302,33.02,Tract,2.074000000000000,NEWCOL
-79.0053315052058,35.0615306202348,1400000US37051003311,37,051,003311,33.11,Tract,0.489000000000000,NEWCOL
-79.0175261536099,35.141023301382,1400000US37051003403,37,051,003403,34.03,Tract,1.138000000000000,NEWCOL
-78.9917427969515,35.1480636025726,1400000US37051003406,37,051,003406,34.06,Tract,2.289000000000000,NEWCOL
-75.9565520542183,36.521772565394,1400000US37053110102,37,053,110102,1101.02,Tract,17.227000000000000,NEWCOL
-75.6812540143835,36.0336919499072,1400000US37055970200,37,055,970200,9702,Tract,2.777000000000000,NEWCOL
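Before running the join below, it's worth confirming that the join column really is unique: census tract numbers are only guaranteed unique within a county, so two counties can share a TRACT value, and any duplicates would fan out the join. A quick check with uniq -d, shown here on a tiny made-up sample with one deliberate duplicate key (on the real file, point it at data_with_new_column.csv with TRACT in field 6):

```shell
# Tiny illustrative sample: two different counties sharing TRACT 002503.
cat > keys_sample.csv <<'EOF'
GEO_ID,STATE,COUNTY,TRACT
1400000US37051002503,37,051,002503
1400000US37053002503,37,053,002503
1400000US37051002504,37,051,002504
EOF

# uniq -d prints only repeated values; any output means the key is not
# unique and the join would match some rows more than once.
tail -n +2 keys_sample.csv | cut -d, -f4 | sort | uniq -d
```

If this prints anything, consider joining on GEO_ID instead, which is unique statewide.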
# Join the CSV with the shapefile on a column that uniquely identifies each
# feature in the dataset. For this data I joined on the tract number, then
# exported to a new shapefile that includes the new column from the CSV:
ogr2ogr -sql "select shp.*, csv.NEWCOL from gz_2010_37_140_00_500k as shp left join 'data_with_new_column.csv'.data_with_new_column as csv on shp.TRACT = csv.TRACT" joined.shp gz_2010_37_140_00_500k.shp
At this point you have a new shapefile, joined.shp, that contains the original data plus your additional classification information.
If you're not comfortable on the command line, the same workflow can be carried out with desktop tools like QGIS as well.