@rand
Last active April 14, 2018 06:24

Streaming Systems:

Streaming More Generally:

Data More Generally:

Join Semi-Lattices:

Metadata:

Columnar Storage + Data Frames + In Memory:

  • Arrow - Powering Columnar In-Memory Analytics.
  • Parquet - Columnar storage format available to any project in the Hadoop ecosystem.
  • Alluxio - Formerly known as Tachyon. Memory-centric virtual distributed storage system.
  • Filo - Fast, memory-efficient, minimal-serialization, binary data vectors.
  • Feather - Fast, interoperable binary data frame storage for Python, R, etc., powered by Apache Arrow.
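
For readers new to the idea shared by the projects above, here is a rough sketch of what "columnar" means, using only the Python standard library. It is not how Arrow, Parquet, or any listed project is implemented; the sample records and the `to_columns` helper are hypothetical and exist only for illustration.

```python
from array import array

# Hypothetical sample records in row-oriented form: one tuple per record.
rows = [
    (1, 20.5),
    (2, 21.0),
    (3, 18.5),
]

def to_columns(records):
    """Pivot row tuples into one typed, contiguous array per column."""
    ids = array("q", (r[0] for r in records))    # signed 64-bit integers
    temps = array("d", (r[1] for r in records))  # 64-bit floats
    return {"id": ids, "temp": temps}

columns = to_columns(rows)

# An aggregate only has to scan the single column it needs.
mean_temp = sum(columns["temp"]) / len(columns["temp"])

# Each column is a flat buffer of fixed-width values with no per-row framing.
temp_bytes = columns["temp"].tobytes()
```

Storing each column contiguously is what lets analytic scans skip unrelated fields and lets buffers be handed between tools with little or no serialization, which is roughly the property the formats above are built around.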