@brijesh-deb
Last active June 12, 2018 18:05
#NoSQL #Notes
  • Challenges with traditional databases
    • Not a good fit for large data volumes (petabytes of data) with varying data types (images, video, text, etc.)
    • Cannot scale easily to large data volumes
    • Scaling up is limited by the memory and processing (CPU) capacity of a single machine
    • Scaling out is difficult to achieve
    • Sharding causes operational problems, e.g. managing shard failures
    • Strong consistency is a bottleneck for scalability in an RDBMS
  • NoSQL systems relax consistency in exchange for scalability.
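
To make the sharding pain point concrete, here is a minimal sketch of hash-based shard routing. The function names (`shard_for`, `moved_keys`) and the `user:<n>` key format are illustrative assumptions, not taken from any particular database. It shows why simply adding a shard is operationally expensive with naive modulo routing: most keys end up mapped to a different shard and would have to be migrated.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    md5 is used only because it is deterministic across runs,
    unlike Python's built-in hash(), which is salted per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def moved_keys(keys, old_shards: int, new_shards: int) -> int:
    """Count keys whose shard assignment changes when the shard count changes."""
    return sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )

# Hypothetical key set, for illustration only.
keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

With modulo routing, a key keeps its shard only when its hash leaves the same remainder under both shard counts, so roughly four out of five keys move in this case. Schemes such as consistent hashing exist to reduce that movement.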

Types of NoSQL

  • Key/Value store: Redis, Couchbase
  • Column store
  • Graph store
  • Document store
  • Multi-model databases
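
As a rough illustration of how these types differ, the sketch below models the same small piece of data (a user who follows another user) in the shape each kind of store typically uses. Plain Python structures stand in for the databases; the record contents and the helper `followed_by` are made up for the example and do not reflect any product's actual API.

```python
import json

# Key/value store: the value is an opaque blob, found only by its key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the value is a structured document whose fields can be queried.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "follows": [7]}

# Column store: row key -> column family -> columns; rows may have different columns.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "London"},
        "social": {"follows:7": ""},
    }
}

# Graph store: nodes plus explicit, typed relationships between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 7)]

def followed_by(user_id):
    """Return the names of the users that user_id follows, by walking the edges."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == user_id and rel == "FOLLOWS"
    ]

print(followed_by(42))
```

A multi-model database would expose more than one of these shapes over the same underlying data.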

CAP Theorem

  • The CAP theorem describes 3 properties that a distributed architecture would ideally provide
    • Consistency: Data in the database should remain consistent after an operation executes. For example, after an update operation all clients should see the same data.
    • Availability: Every request to a non-failing node receives a response; the system is always on, with no downtime.
    • Partition Tolerance: The system continues to function and upholds its guarantees even if communication among servers is unreliable, i.e. in spite of network partitions.
  • The theorem states that it is not possible to guarantee all 3 properties at the same time; when a network partition occurs, the system must give up either consistency or availability.
  • The following combinations are possible
    • CA: Single-site cluster where all nodes are always in contact. Example: traditional RDBMS
    • CP: Systems where availability is sacrificed in the case of a network partition. Examples: MongoDB, Redis, HBase
    • AP: Systems that are available and partition tolerant but cannot guarantee consistency. Examples: Cassandra, DynamoDB, CouchDB
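
The CP versus AP trade-off can be sketched with a toy two-replica cluster. This is a simplified model under stated assumptions, not how any of the listed databases is implemented: the `Cluster` and `Replica` classes, the `partitioned` flag, and the `demo` helper are all hypothetical names introduced for illustration.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Two replicas that replicate every write to each other while connected."""
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True simulates a network partition

    def write(self, replica, key, value):
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let the replicas disagree.
                raise RuntimeError("unavailable: cannot reach peer replica")
            # AP: accept the write locally; the replicas now diverge.
            replica.data[key] = value
            return
        replica.data[key] = value
        peer.data[key] = value

def demo(mode):
    """Write once while healthy, once during a partition; report the outcome."""
    cluster = Cluster(mode)
    cluster.write(cluster.a, "x", 1)
    cluster.partitioned = True
    try:
        cluster.write(cluster.a, "x", 2)
        accepted = True
    except RuntimeError:
        accepted = False
    return accepted, cluster.a.data["x"], cluster.b.data["x"]

print("CP:", demo("CP"))  # write rejected, both replicas still agree
print("AP:", demo("AP"))  # write accepted, replicas disagree until they sync
```

In CP mode the write during the partition is rejected and both replicas keep the old value; in AP mode it is accepted and the two replicas hold different values until they can reconcile.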

NoSQL to the rescue

  • Features of NoSQL
    • Scale-out, shared-nothing architecture, capable of running on a large number of nodes
    • Non-locking concurrency control, so real-time reads do not conflict with writes
    • Data is distributed, so a cluster can scale across thousands of nodes
    • Schema-less data model
    • Suited to workloads that are mostly queries with few updates
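
One common way to get non-locking reads is to keep multiple versions of each value, so that a reader works from a snapshot while writers append new versions. The sketch below assumes this multi-version approach; the notes do not say which mechanism a given NoSQL store uses, and `VersionedStore` with its methods is a hypothetical class written for this example.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads never block."""
    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Append a new version instead of overwriting in place."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a timestamp that identifies the current state."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = VersionedStore()
store.write("x", "v1")
snap = store.snapshot()      # a reader starts here
store.write("x", "v2")       # a concurrent writer commits a new version
print(store.read("x", snap))              # the reader still sees its snapshot
print(store.read("x", store.snapshot()))  # a new reader sees the latest value
```

The reader that took its snapshot before the second write keeps seeing the first version, so neither side has to wait for a lock. A real store would also need to garbage-collect old versions.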

BASE, not ACID

  • Basically Available, Soft State, Eventual Consistency
  • Basically Available means the system guarantees availability in the CAP sense: every request gets a response, although the data returned may be stale.
  • Soft State means that the system state may change over time, even without input. This is a consequence of eventual consistency, as replicas keep exchanging updates in the background.
  • Eventual Consistency means the system will eventually become consistent, provided that it doesn't receive new input during that time.
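
A minimal sketch of eventual consistency, assuming a last-write-wins rule keyed on a timestamp: each replica accepts writes on its own, and a background synchronization pass later brings all replicas to the same value. The `Replica` class and the `anti_entropy` function are illustrative names, and last-write-wins is only one of several possible conflict-resolution strategies.

```python
class Replica:
    """A replica that stores, per key, the value with the highest timestamp."""
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, ts):
        """Apply a write only if it is newer than what is already stored."""
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def get(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

def anti_entropy(replicas):
    """Push every replica's entries to every other replica (last write wins)."""
    for src in replicas:
        for key, (ts, value) in list(src.store.items()):
            for dst in replicas:
                dst.put(key, value, ts)

r1, r2 = Replica(), Replica()
r1.put("x", "old", ts=1)   # write accepted by replica 1
r2.put("x", "new", ts=2)   # later write accepted by replica 2
print(r1.get("x"), r2.get("x"))  # replicas disagree: this is the soft state

anti_entropy([r1, r2])           # no new writes arrive; replicas synchronize
print(r1.get("x"), r2.get("x"))  # both replicas now return the same value
```

Between the two writes and the synchronization pass a client could read either value depending on which replica it reaches; once updates stop and the pass completes, every replica returns the newest one.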