@ahoy-jon
Last active August 29, 2015 14:17
Avro better than X

For reference, I gave a talk about this: http://www.slideshare.net/jwinandy/data-encoding-and-metadata-for-streams/17

A few points:

  • A reference to a schema takes 64 bits if you use a hash of the schema, or 32 bits if you use a coordination store (as Kafka + Camus does).

    • That is not really wasted space, because one reference can serve multiple payloads.
  • Field renaming is well supported. In Avro you read your data with not one but two schemas:

    • the one that was used to encode the data (easy to get, since it travels with the data as metadata),
    • and the one you want to use to read your data.

    So a single read schema (thanks to unions and renaming) can cover several write schemas.

  • One of Avro's great features is its genericity. You don't have to generate code to parse a message, so you can build smart intermediaries, such as Hadoop jobs that do generic processing: https://github.com/viadeo/viadeo-avro-utils
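
To make the three points concrete, here is a minimal sketch in plain Python. It does not use the Avro library: the payload is JSON instead of Avro binary, BLAKE2b is a stand-in for whatever 64-bit schema hash you would really use, and the `SCHEMAS` dict is a hypothetical in-memory replacement for a schema store. The names `fingerprint64`, `register`, `encode`, `resolve` and `decode` are made up for this example, and `resolve` only handles field aliases and defaults, not unions or type promotion.

```python
import hashlib
import json

# Hypothetical in-memory schema store: 8-byte fingerprint -> writer schema.
SCHEMAS = {}


def fingerprint64(schema):
    """64-bit reference to a schema (BLAKE2b is a stand-in, not Avro's own hash)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).digest()


def register(schema):
    """Store the writer schema once and return its 8-byte reference."""
    fp = fingerprint64(schema)
    SCHEMAS[fp] = schema
    return fp


def encode(record, writer_schema):
    """Frame = 8-byte schema reference + payload (JSON stands in for Avro binary)."""
    return register(writer_schema) + json.dumps(record).encode("utf-8")


def resolve(record, writer_schema, reader_schema):
    """Map a record written with writer_schema onto reader_schema.

    Simplified: matches fields by name or reader-side alias, then falls
    back to the reader's default value.
    """
    writer_names = {field["name"] for field in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        candidates = [field["name"]] + field.get("aliases", [])
        source = next((name for name in candidates if name in writer_names), None)
        if source is not None:
            resolved[field["name"]] = record[source]
        elif "default" in field:
            resolved[field["name"]] = field["default"]
        else:
            raise ValueError("no writer field or default for " + field["name"])
    return resolved


def decode(frame, reader_schema):
    """Generic read: look up the writer schema from the header, then resolve."""
    writer_schema = SCHEMAS[frame[:8]]
    record = json.loads(frame[8:].decode("utf-8"))
    return resolve(record, writer_schema, reader_schema)


# Two write schemas: the field "login" was later renamed to "username".
WRITER_V1 = {"type": "record", "name": "User",
             "fields": [{"name": "login", "type": "string"}]}
WRITER_V2 = {"type": "record", "name": "User",
             "fields": [{"name": "username", "type": "string"},
                        {"name": "country", "type": "string"}]}

# One common read schema covering both, via an alias and a default.
READER = {"type": "record", "name": "User",
          "fields": [{"name": "username", "type": "string", "aliases": ["login"]},
                     {"name": "country", "type": "string", "default": "unknown"}]}

frames = [encode({"login": "jon"}, WRITER_V1),
          encode({"username": "ahoy", "country": "FR"}, WRITER_V2)]
decoded = [decode(frame, READER) for frame in frames]
print(decoded)
```

Each frame costs 8 bytes of schema reference, both records come back in the shape of `READER`, and nothing in `decode` knows about the `User` type at compile time, which is what lets a generic intermediary handle any schema it can look up.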
