Skip to content

Instantly share code, notes, and snippets.

@fizz
Last active March 9, 2023 07:40
Show Gist options
  • Save fizz/59513c9e4e5f9f8b615c29c15c25cffb to your computer and use it in GitHub Desktop.
Save fizz/59513c9e4e5f9f8b615c29c15c25cffb to your computer and use it in GitHub Desktop.
convert csv to json the sqlite way

try sqlite3 for csv wrangling! it's a powerhouse for easily turning your csv files into a queryable in-memory database, and it's trivial to turn that into json, and to pipe a sql query's output into jq or into a .db file, whatever you want. here are some aliases I wrote, put them in your dotfiles somewhere:

alias csvq="sqlite3 :memory: -cmd '.mode csv' -cmd '.import /dev/stdin s3' '.mode json'"

You can see that I generically name the table "s3", cuz it's an alias I use with any csv file I'm streaming from an s3 bucket, so the table name doesn't need to be too specific. I use it like this:

aws s3 cp s3://$bucket/$key - | csvq "select * from s3" | jq '.[]' -c

"select * from table" means "give me a bunch of rows", so jq '.[]' -c turns those rows into pretty-printed compressed jsonlines. I also have an alias tsvq for tsv files. It's the same thing pretty much, except with .mode tabs instead of .mode csv alias tsvq="sqlite3 :memory: -cmd '.mode tabs' -cmd '.import /dev/stdin s3' '.mode json'"

tsvq usage example: aws s3 cp s3://"$bucket"/sync/somefile.tsv - | tsvq "select * from s3"| jq '.[]|select(.field=="Value")')

n.b. I snuck a field=="Value" jq filter in there for fun, you could also have that as part of your sqlite query instead of doing it in jq. the point is it's nice to have both syntaxes available so easily! the other way:

aws s3 cp s3://"$bucket"/sync/somefile.tsv - | tsvq "select * from s3 where field='Value'"| jq '.[]' -c)

the most important part of this is the sqlite capability of memory databases, handling stdin or reading csv files directly, and switching from .mode csv for reading and .mode json for output. simplified combo:

sqlite3 :memory: with -cmd '.mode csv' '.import file.csv-or-stdin table' '.mode json' with 'select [fields] from table where [predicate]'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment