@ianmcook
Created November 7, 2023 20:04
Write and read Parquet files, combine columns together into an Arrow table, and check if order was preserved
import pyarrow as pa
import pyarrow.parquet as pq
import random
import string

# write parquet files, one single-column file per iteration
original = []
for i in range(3):
    data = [[random.uniform(0, 1) for _ in range(1000000)]]
    original.extend(data)
    # pass names as a list, one entry per column
    table = pa.table(data, names=[string.ascii_letters[i]])
    pq.write_table(table, str(i) + '.pq')

# read parquet files
columns = []
names = []
for i in range(3):
    table = pq.read_table(str(i) + '.pq')
    columns.extend(table.columns)
    names.extend(table.schema.names)

# combine all columns into one table
table = pa.table(columns, names=names)

# check if order was preserved (prints True for each column that matches)
for i in range(3):
    print(original[i] == table.columns[i].to_pylist())