@adrianp
Last active December 15, 2015 20:19
Lean list of various BigData/NoSQL related projects
This work-in-progress summarizes the way-too-many BigData(tm) technologies.
This is by no means an in-depth description, but a very short summary so that
I know where to look.
1. Databases:
* DynamoDB - aws.amazon.com/dynamodb/ - Amazon AWS integration, MapReduce
* MongoDB - mongodb.org/ - JSON-style document database, SQL-like queries + MapReduce
* Riak - basho.com/riak/ - Key-Value storage, MapReduce
* CouchDB - couchdb.apache.org/ - JSON document storage, JavaScript Queries + MapReduce
* Redis - redis.io/ - Key-Value storage, Pub/Sub messaging
* HBase - hbase.apache.org/ - Bigtable-like capabilities on top of Hadoop and HDFS
* Cassandra - cassandra.apache.org/ - Bigtable-like, SQL-like queries + MapReduce
* Hypertable - hypertable.org/ - Bigtable-like, SQL-like queries + MapReduce, strong commercial support
* Accumulo - accumulo.apache.org/ - Key-Value storage, Bigtable+Hadoop+HDFS
* Neo4j - neo4j.org/ - Graph database
* Couchbase - couchbase.com/ - Document-oriented, querying + MapReduce
* VoltDB - voltdb.com/ - OLTP/real-time processing database by Stonebraker, proprietary
* Scalaris - code.google.com/p/scalaris/ - Key-Value storage
* Voldemort - project-voldemort.com/ - Key-Value storage, used at LinkedIn
* MemcacheDB - memcachedb.org/ - Key-Value storage based on Memcached
* VelocityDB - velocitydb.com/ - Object and Graph DB, Key-Value support
* ElephantDB - github.com/nathanmarz/elephantdb/ - Database specialized in exporting key-value data from Hadoop
Questions: Why does Apache have so many identical projects?
2. Data analysis:
* elasticsearch - elasticsearch.org/ - Distributed RESTful search and analytics on top of Lucene, Memcached, JSON
* Hadoop + HDFS - hadoop.apache.org/ - MapReduce implementation
* Hive - hive.apache.org/ - Data warehouse over Hadoop
* Mahout - mahout.apache.org/ - Scalable ML
* Pig - pig.apache.org/ - Uses Pig Latin to produce sequences of MapReduce jobs (for Hadoop)
* D3.js - d3js.org/ - JavaScript library for visualizing data
* R - r-project.org/ - Statistics
* Julia - julialang.org/ - Potential replacement for R
* Drill - incubator.apache.org/drill/ - Big data analysis based on Google Dremel
* Gremlin - github.com/tinkerpop/gremlin/ - Graph analysis
* Giraph - giraph.apache.org/ - Graph analysis
* InfiniteGraph - objectivity.com/infinitegraph/ - Graph analysis, commercial
* Golden Orb - goldenorbos.org/ - Graph analysis using Google Pregel on top of Hadoop
* JethroData - jethrodata.com/ - Data analysis on top of Hadoop, commercial
* Spark - spark-project.org/ - Project that aims to extend/improve Hadoop and move beyond MapReduce
* HStreaming - hstreaming.com/ - Real time and batch processing workflow over Hadoop and HDFS, commercial
3. Real time processing:
* DBToaster - dbtoaster.org/ - Creates processing engines from SQL queries
* Storm - storm-project.net/ - MapReduce over real time data
* Trident - engineering.twitter.com/2012/08/trident-high-level-abstraction-for.html/ - Elegant abstraction for defining Storm topologies
* Squall - github.com/epfldata/squall/ - SQL over Storm
* SAP Hana - http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx/ - In-memory DB and stream processing, commercial
* Esper - esper.codehaus.org/ - CEP, Java and .NET, commercial
4. Infrastructure:
* ZooKeeper - zookeeper.apache.org/ - Distributed coordination
* ZeroMQ - zeromq.org/ - Message transport layer
* RabbitMQ - rabbitmq.com/ - Message transport layer
* Kafka - kafka.apache.org/ - Publish/Subscribe messaging system
* S4 - incubator.apache.org/s4/ - Real time processing infrastructure
* Kestrel - github.com/robey/kestrel/ - Message transport layer
* Ganglia - ganglia.sourceforge.net/ - Monitoring
* OpenStack - openstack.org/ - Open source software for building clouds
* Cloud Foundry - cloudfoundry.com/ - Deployment solution
5. Resources:
* Database comparison - http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis/
* More comprehensive NoSQL list - http://nosql-database.org/
* Big Data Right Now: Five Trendy Open Source Technologies (10.2012) - http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/?goback=%2Egde_4332669_member_225815227/
* SQL is what’s next for Hadoop: Here’s who’s doing it (02.2013) - http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/
* Wikipedia, ofc: http://en.wikipedia.org/wiki/NoSQL
* Nathan Marz (Storm developer) on beating the CAP theorem (as this is controversial, make sure to read the comments also): http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html