@ewhauser
Created June 7, 2013 05:31
Notes on Facebook's Presto
These notes are from Facebook's Analytics @ Scale conference. I didn't take notes during the presentation or the discussions with developers, so feel free to correct any inaccuracies:
- Presto is an ANSI SQL engine built on top of HDFS
- Similar functionality to Cloudera's Impala
- Facebook started developing this project before the Impala announcement and made some different design choices
- Implemented in Java
- Queries execute around 10x faster than Hive; aggregation-based queries can be 100x faster
- Bytecode generation is used for efficient predicate processing
- Efficient fixed-memory data structures are used (very low GC overhead)
- Presto daemons do not have to run on data nodes
- Facebook nevertheless runs the daemons directly on the data nodes to avoid saturating the network, and allocates about 8GB of RAM per daemon
- Looking to open source in Fall 2013
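
To illustrate the bytecode generation point, here is a minimal sketch of the general idea, not Presto's implementation. Presto is written in Java and would generate JVM bytecode; this sketch uses Python's built-in `compile()` as an analogy. The expression tree format, the column names, and the helpers `interpret`, `to_source`, and `compile_predicate` are all hypothetical. The point is that a predicate is translated into bytecode once per query instead of being walked as a tree for every row.

```python
import operator

# Hypothetical expression tree for: price > 100 AND region = 'US'
TREE = ("and",
        ("gt", ("col", "price"), ("lit", 100)),
        ("eq", ("col", "region"), ("lit", "US")))

OPS = {"gt": operator.gt, "eq": operator.eq, "and": lambda a, b: a and b}
SYMBOLS = {"gt": ">", "eq": "==", "and": "and"}


def interpret(node, row):
    """Baseline: walk the expression tree again for every row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "lit":
        return node[1]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))


def to_source(node):
    """Translate the tree into a Python expression string."""
    kind = node[0]
    if kind == "col":
        return f"row[{node[1]!r}]"
    if kind == "lit":
        return repr(node[1])
    return f"({to_source(node[1])} {SYMBOLS[kind]} {to_source(node[2])})"


def compile_predicate(node):
    """Compile the tree to bytecode once; the result is a plain function."""
    code = compile(f"lambda row: {to_source(node)}", "<predicate>", "eval")
    return eval(code)


# Made-up sample rows, for illustration only.
ROWS = [
    {"price": 150, "region": "US"},
    {"price": 50, "region": "US"},
    {"price": 200, "region": "EU"},
]

predicate = compile_predicate(TREE)
matches = [row for row in ROWS if predicate(row)]
```

The compiled function returns the same result as `interpret` for each row, but the per-row cost no longer includes dispatching on node types. The generated source comes only from a trusted, internally built tree; compiling untrusted input this way would be unsafe.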