Skip to content

Instantly share code, notes, and snippets.

@swayson
Last active February 8, 2024 20:06
Show Gist options
  • Star 7 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save swayson/6aa54c9d7f01190292c0 to your computer and use it in GitHub Desktop.
Save swayson/6aa54c9d7f01190292c0 to your computer and use it in GitHub Desktop.
CSVKit Examples

1. Ditch Excel (for real)

    in2csv file1.xls > file1.csv

2. Conquer fixed-width formats

    in2csv -f fixed -s schema.csv data.fixed > data.csv

3. Find cells matching a regular expression

    csvgrep -c phone_number -r "\d{3}-123-\d{4}" data.csv > matching.csv

4. Turn your data into a JSON lookup table

    csvjson --key slug --indent 4 geo.csv

5. Turn a CSV with latitude and longitude columns into GeoJSON

    csvjson --lat latitude --lon longitude --key slug --crs EPSG:4269 --indent 4 geo.csv > geo.json

6. Generate summary statistics for any CSV file

    csvstat data.csv

7. Execute a SQL query directly on a CSV file

    csvsql --query "select name from data where age > 30" data.csv > old.csv
    
    csvsql --query "select m.usda_id, avg(i.sepal_length) as mean_sepal_length from iris as i join irismeta as m on (i.species = m.species) group by m.species" examples/iris.csv examples/irismeta.csv

8. Automatically create a SQL table and import a CSV into a database

    csvsql --db postgresql:///demographics --insert data.csv
    psql -q demographics -c "\d data"

9. Extract a table from a SQL database into a CSV

    sql2csv --db postgresql:///demographics --query "select * from data" > extract.csv

10. Turn your Github issues into a CSV

    curl https://api.github.com/repos/onyxfish/csvkit/issues?state=open | in2csv -f json > issues.csv

Handy commandline for examining a dataset.

number of lines

wc -l

See columns names

csvcut -n

Convert excel to csv

in2csv <.xlsx file> <output.csv>

cut csv

csvcut -c 2,5,6 <data.csv> csvcut -c county,item_name,quantity <data.csv>

Pipes

csvcut -c county,item_name,quantity data.csv | csvlook | head

Get basic stats

csvstat -c 2,5,6 <data.csv>

Filter on rows using regex

csvgrep -c user_time_zone -m <data.csv> | csvcut -c 2,5,6 | csvlook | head

Sort (r: reverse-descending)

csvcut -c 2,5,6 <data.csv> | csvsort -c total_cost -r

Combine subsets, stacking

csvstack data1.csv data2.csv > combined_data.csv

SQL

create table

csvsql --db sqlite:///leso.db --insert joined.csv

query

sql2csv --db sqlite:///leso.db --query "select * from joined" sql2csv --db sqlite:///leso.db --query "select * from joined where county='DOUGLAS';" > douglas.csv

create json

csvcut -c county,item_name data.csv | csvgrep -c county -m "GREELEY" | csvjson --indent 4 csvjson --indent 4 --key fips acs2012_5yr_population.csv | head

Specifying STDIN as a file

$ csvstat examples/dummy.csv $ cat examples/dummy.csv | csvstat $ cat examples/dummy.csv | csvstat -

Kill a process

ps aux | grep <process_name e.g. firefox> sudo kill [signal]

Kill multiple

sudo kill pid1 pid2 pid3

#kill all - kill by name killall <process_name e.g. firefox>

#$ cat ~/src/csvkit/examples/dummy.csv | csvstack ~/src/csvkit/examples/dummy3.csv -

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment