@nbrendler
Created March 21, 2018 16:46
CSV to JSONL script using pandas & click
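#!/usr/bin/env python
"""Convert a CSV file (or stdin) to JSON Lines on stdout."""
import csv
import sys

import click
import pandas as pd

# Raise the csv module's per-field size limit so very large fields are accepted.
csv.field_size_limit(sys.maxsize)

stdout = click.get_text_stream('stdout')


@click.command()
@click.option('--chunksize', default=100, help='Rows to read per chunk.')
@click.argument('csv_file', type=click.File('rb'))  # '-' reads from stdin
def csv2jsonl(csv_file, chunksize):
    # Read in chunks so the whole file never has to fit in memory.
    for chunk in pd.read_csv(csv_file, chunksize=chunksize):
        # Strip any trailing newline, then add exactly one, so records from
        # consecutive chunks never end up on the same line.
        text = chunk.to_json(orient='records', lines=True).rstrip('\n')
        if text:
            stdout.write(text + '\n')


if __name__ == '__main__':
    csv2jsonl()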
@nbrendler

Drop it somewhere on your PATH (as an executable file named csv2jsonl) and you can run either

csv2jsonl my_csv.csv

or

csv2jsonl - < my_csv.csv
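
To show what the output looks like, here is a minimal sketch of the same CSV to JSONL idea on an in-memory sample. It is not the script above: it swaps pandas for the standard library's csv.DictReader, so every value stays a string instead of being type-inferred, and the csv_to_jsonl helper and the sample data are made up for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Return JSON Lines text, one object per CSV row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Unlike pandas, DictReader does no type inference: every value is a string.
    return "".join(json.dumps(row) + "\n" for row in reader)


# Made-up sample data: a header row followed by two records.
sample = "name,qty\napple,3\npear,5\n"
print(csv_to_jsonl(sample), end="")
```

Each input row becomes one JSON object on its own line, keyed by the header row. With pandas, a numeric column such as qty would typically come out as a number rather than a string.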
