@mewwts
Created May 22, 2017 11:19
Going from a Python data structure to its binary Avro representation, and from Avro back to a dict.
# Avro bytes -> Python dict
from io import BytesIO

from avro.datafile import DataFileReader
from avro.io import DatumReader

blob = ...    # bytes in the Avro container format written by DataFileWriter
schema = ...  # parsed Avro schema

byte_stream = BytesIO(blob)
reader = DataFileReader(byte_stream, DatumReader(schema))
value = next(iter(reader))  # first record in the blob
reader.close()
# value is now a Python dictionary representation of the Avro object.
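DataFileReader expects an Avro object container file, which per the Avro specification starts with the four magic bytes `Obj` followed by `0x01`. A blob from another source, such as a bare encoded datum with no header, will not be readable this way. The following is a minimal sketch of a pre-check using only the standard library; `looks_like_avro_container` and the sample byte strings are hypothetical names and values made up for illustration, not part of the avro package.

```python
# First four bytes of every Avro object container file, per the Avro spec.
AVRO_CONTAINER_MAGIC = b"Obj\x01"


def looks_like_avro_container(blob: bytes) -> bool:
    """Return True if blob starts with the Avro container-file magic bytes.

    Hypothetical helper: it only inspects the header prefix and does not
    validate the rest of the file.
    """
    return blob[:4] == AVRO_CONTAINER_MAGIC


# Made-up sample inputs for illustration.
container_like = AVRO_CONTAINER_MAGIC + b"...rest of file..."
raw_datum = b"\x06foo"  # a bare Avro-encoded string "foo", no container header

print(looks_like_avro_container(container_like))
print(looks_like_avro_container(raw_datum))
```

A check like this only tells you whether handing the blob to DataFileReader is worth attempting; it says nothing about whether the records inside match your schema.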
# Python data structure -> Avro bytes
from io import BytesIO

from avro.datafile import DataFileWriter
from avro.io import DatumWriter

data = ...    # data is whatever you want to encode
schema = ...  # parsed Avro schema that data conforms to

byte_stream = BytesIO()
file_writer = DataFileWriter(byte_stream, DatumWriter(), schema)
file_writer.append(data)
file_writer.flush()  # push the buffered record into byte_stream
binary_data = byte_stream.getvalue()  # read before close(), which also closes byte_stream
file_writer.close()
# binary_data now holds the Avro-encoded bytes, which you can ship to e.g. Kafka.
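The order of flush(), getvalue(), and close() in the snippet above matters. The sketch below illustrates the same pattern with the standard library only, using `io.BufferedWriter` as a stand-in for a writer that buffers internally. It is an analogy under that assumption, not the avro implementation, and the variable names are made up for illustration.

```python
from io import BufferedWriter, BytesIO

byte_stream = BytesIO()
buffered = BufferedWriter(byte_stream)  # stand-in for a writer with an internal buffer

buffered.write(b"payload")
before_flush = byte_stream.getvalue()  # data is still sitting in the buffer

buffered.flush()
after_flush = byte_stream.getvalue()   # data has reached the underlying stream

buffered.close()                       # closing the wrapper also closes byte_stream
try:
    byte_stream.getvalue()
    readable_after_close = True
except ValueError:
    # BytesIO refuses getvalue() once it has been closed.
    readable_after_close = False

print(before_flush, after_flush, readable_after_close)
```

This is why the bytes are collected after flush() but before close(): too early and the record has not been written out, too late and the stream can no longer be read.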
@asinitson

This is really useful! Thanks!

This is the part I was missing: file_writer.flush(). Without it, when you call byte_stream.getvalue(), the byte stream is empty.
