Created
June 13, 2018 05:33
Convert Parquet file to gzipped JSON lines (JSONL) in 3 lines of code
# pip install pyarrow
# pip install pandas
import pyarrow.parquet as pq
# Pass columns=['col1', 'col2'] to restrict the loaded columns
# (use_threads replaces the nthreads argument of older pyarrow releases)
pds = pq.read_pandas('/path/to/file.parquet', columns=None, use_threads=True).to_pandas()
# Set path_or_buf='output.jsonl.gz' to write a gzipped file instead of printing to stdout.
# With path_or_buf=None the JSON lines come back as a plain string: compression is only applied when writing to a file path.
print(pds.to_json(path_or_buf=None, orient='records', lines=True, date_format='iso', date_unit='us', compression='gzip'))
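To sanity-check a file written with path_or_buf='output.jsonl.gz', the records can be read back using only the standard library. This is a minimal sketch, not part of the original snippet: the helper name read_jsonl_gz is hypothetical, and it assumes the file is UTF-8 encoded with one JSON object per line.

```python
import gzip
import json


def read_jsonl_gz(path):
    """Yield one parsed record per line from a gzipped JSON lines file.

    Hypothetical helper for illustration; assumes UTF-8 text and one
    JSON value per non-empty line.
    """
    # mode="rt" makes gzip decompress and decode to text transparently
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, `next(read_jsonl_gz('output.jsonl.gz'))` would return the first row as a dict, which is a quick way to confirm that column names and ISO-formatted dates came through as expected.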