@ExpandingMan
Created March 19, 2019 00:49
writing some arrow test data
import pyarrow as pa

# Build a record batch with a single int64 column.
v = pa.array([1, 2, 3, 4])
batch = pa.RecordBatch.from_arrays([v], ["this_is_the_column_name"])

# Write the batch to an in-memory sink using the Arrow IPC streaming format.
sink = pa.BufferOutputStream()
writer = pa.RecordBatchStreamWriter(sink, batch.schema)
writer.write_batch(batch)
writer.close()

buf = sink.getvalue()
b = buf.to_pybytes()  # this is the buffer containing the full streaming format
# schema_buffer = batch.schema.serialize().to_pybytes()

# Save the stream to disk; the context manager closes the file even on error.
with open("testdata1.dat", "wb") as f:
    f.write(b)