@kjschiroo
Created June 7, 2019 03:58
Writing a Parquet file with pyarrow
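import pyarrow as pa
import pyarrow.parquet as pq

# Build one Arrow array per column, with an explicit type for each.
column_1 = pa.array([1, 4, 8], pa.int32())
column_2 = pa.array([True, False, True], pa.bool_())
data = [column_1, column_2]
names = ['my_int_column', 'my_bool_column']

# Bundle the arrays into a record batch, then wrap the batch in a table.
batch = pa.RecordBatch.from_arrays(data, names)
table = pa.Table.from_batches([batch])

# flavor='spark' applies Spark-compatible settings when writing the file.
pq.write_table(table, 'my-parquet-file.parquet', flavor='spark')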