@VincentRoma
Created November 9, 2016 12:52

HBase

HBase and Accumulo are both implementations of Google's Bigtable design that use HDFS as their storage layer. They are virtually identical in most regards from both an operational and an architectural standpoint. They have different APIs and use different nomenclature, but conceptually they do the same thing, and their performance is close enough that neither has a significant advantage over the other.
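To make the shared Bigtable model concrete, here is a minimal in-memory sketch. It assumes the simplified view of a table as a sorted map from (row, column, timestamp) to value. The class `SortedTable` and its methods are hypothetical: neither HBase nor Accumulo exposes this API, and both persist the map as sorted files on HDFS rather than in a Python list.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical sketch of the Bigtable data model (not the HBase or Accumulo API).
# Each cell version is stored under the key (row, column, -timestamp); keeping the
# key list sorted groups a row's cells together and puts the newest version first.
Key = Tuple[str, str, int]


class SortedTable:
    def __init__(self) -> None:
        self._keys: List[Key] = []  # sorted list of (row, column, -timestamp)
        self._values: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        """Write one version of a cell; columns are named 'family:qualifier'."""
        key = (row, column, -ts)
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it was never written."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield the newest version of every cell with start_row <= row < stop_row."""
        i = bisect_left(self._keys, (start_row,))
        last_cell = None
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            if (row, column) != last_cell:  # first entry of a cell is its newest version
                yield row, column, -neg_ts, self._values[self._keys[i]]
                last_cell = (row, column)
            i += 1


if __name__ == "__main__":
    table = SortedTable()
    table.put("user#001", "info:name", 1, b"Ada")
    table.put("user#001", "info:name", 2, b"Ada L.")
    table.put("user#002", "info:name", 1, b"Alan")
    table.put("user#003", "info:name", 1, b"Grace")
    print(table.get("user#001", "info:name"))
    print(list(table.scan("user#001", "user#003")))
```

Keeping keys sorted by row is what makes contiguous row-range scans cheap. Storing the negated timestamp lets the newest version of a cell sort first, mirroring Bigtable's decreasing-timestamp ordering.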

HBase has a much larger user base, better support, better integration with the rest of the Hadoop ecosystem, and a better set of peripheral tools.

Accumulo's point of differentiation is security. It offers fine-grained access controls down to the individual cell, which HBase lacked by design (HBase only added cell-level visibility labels in version 0.98), plus many other security-related enhancements. That is really its only advantage, but if you are dealing with sensitive data, not having that level of security can be a showstopper.
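Accumulo attaches a visibility expression such as `admin|(audit&finance)` to each key-value pair and returns the pair only to users whose authorizations satisfy it. The following is a minimal evaluator sketch under a simplified grammar: unquoted labels, `&`, `|`, and parentheses, rejecting a mix of `&` and `|` without parentheses as Accumulo does. The function `can_see` is hypothetical and is not Accumulo's own `ColumnVisibility` code.

```python
import re
from typing import AbstractSet, List

# Hypothetical evaluator for Accumulo-style visibility expressions (simplified
# grammar: unquoted labels, '&', '|', parentheses). Not Accumulo's own code.
_TOKEN = re.compile(r"[&|()]|[A-Za-z0-9_.:/-]+")


def _tokenize(expr: str) -> List[str]:
    tokens = _TOKEN.findall(expr)
    if "".join(tokens) != expr:  # any character outside the grammar (e.g. a space)
        raise ValueError(f"invalid characters in {expr!r}")
    return tokens


def can_see(expr: str, auths: AbstractSet[str]) -> bool:
    """Return True if a user holding `auths` may read a cell labelled `expr`."""
    if expr == "":
        return True  # an empty visibility expression is readable by everyone
    tokens = _tokenize(expr)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing '&' and '|' requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse the right side so its tokens are consumed
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("&", "|", ")"):
            raise ValueError(f"expected a label or '(' in {expr!r}")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"unbalanced parentheses in {expr!r}")
            pos += 1
            return value
        return tok in auths  # a bare label is satisfied if the user holds it

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens in {expr!r}")
    return result


if __name__ == "__main__":
    print(can_see("admin|(audit&finance)", {"audit", "finance"}))
    print(can_see("admin&audit", {"admin"}))
```

Because the check runs per key-value pair, two users scanning the same row can receive different subsets of its cells.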

HBase versus Cassandra versus Accumulo

Clones of Google's Bigtable

Historically, HBase and Cassandra have a lot in common. HBase was created in 2007 at Powerset (later acquired by Microsoft), started out as part of Hadoop, and then became an Apache top-level project. Cassandra originated at Facebook in 2007, was open-sourced and incubated at Apache, and is now also a top-level project. Both are wide-column key-value datastores that excel at ingesting and serving huge volumes of data while being horizontally scalable, robust, and elastic.

There are philosophical differences in the architectures: Cassandra borrows many design elements from Amazon's Dynamo system, uses an eventual (per-request tunable) consistency model, and is write-optimized, while HBase is a Google Bigtable clone that is read-optimized and strongly consistent. A frequently cited point in HBase's favor is that Facebook, the creator of Cassandra, chose HBase rather than Cassandra for its Messages platform.
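The consistency difference comes down to how many replicas take part in each operation. In the Dynamo design that Cassandra follows, a read contacting R of N replicas is guaranteed to include at least one replica holding a write acknowledged by W replicas only when R + W > N. Cassandra exposes this as a per-request consistency level such as ONE, QUORUM, or ALL. The brute-force check below is a minimal sketch of that overlap rule under idealized assumptions (no failures, no concurrent writes); the function names and the replication factor are hypothetical.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas, as used by a QUORUM-level request."""
    return n // 2 + 1


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-replica read set share a replica with every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


if __name__ == "__main__":
    n = 3  # hypothetical replication factor
    for label, r, w in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", quorum(n), quorum(n)), ("ONE/ALL", 1, n)]:
        print(f"{label}: R + W > N is {r + w > n}; overlap guaranteed: {always_overlaps(n, r, w)}")
```

With N = 3, QUORUM reads and writes (R = W = 2) always overlap, while ONE/ONE does not, which is why reads at ONE can return stale data. HBase sidesteps the question by serving each region from a single RegionServer.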

According to proponents, Accumulo can maintain consistency even as it scales to thousands of nodes and petabytes of data; it can both read and write data in near real time; and, most importantly, it was built from the ground up with cell-level security.

It's the third feature, cell-level security, that has the Big Data community most excited. Accumulo is being positioned as an all-purpose Hadoop database and a competitor to HBase. HBase, like Accumulo, can scale to thousands of machines while maintaining strong consistency, but it was not designed with security in mind, let alone cell-level security.
