@sachinlala
Last active May 10, 2018 10:34
Notes about Couchbase

Origin

Couchbase is the result of merging two popular NoSQL technologies:

  • Membase, which adds persistence, replication, and sharding to the high-performance memcached technology
  • CouchDB, which pioneered the document-oriented model based on JSON

Read path

  1. Key-based lookup: the client provides the key, and only the server hosting the document with that key is contacted.
  2. Query: the client provides a query (for example, a range over some secondary key) together with the view to run it against (essentially the index). The query is broadcast to all servers in the cluster, and the partial results are merged and sent back to the client.
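The two read paths above can be contrasted with a small in-memory sketch. This is a toy model, not Couchbase client code: `Node`, `Cluster`, the CRC32-based routing rule, and the `age` field are all assumptions made for illustration.

```python
import heapq
import zlib
from typing import Dict, List, Optional, Tuple


class Node:
    """Toy stand-in for one server; it hosts only part of the documents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, dict] = {}

    def query_age_range(self, low: int, high: int) -> List[Tuple[int, str]]:
        # Each node answers only for its own documents, sorted by the secondary key.
        return sorted(
            (doc["age"], key) for key, doc in self.docs.items() if low <= doc["age"] <= high
        )


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.nodes = [Node(name) for name in names]

    def node_for(self, key: str) -> Node:
        # Hypothetical routing rule: hash the key and pick one node.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def set(self, key: str, doc: dict) -> None:
        self.node_for(key).docs[key] = doc

    def get(self, key: str) -> Optional[dict]:
        # Key-based lookup: exactly one node is contacted.
        return self.node_for(key).docs.get(key)

    def query_age_range(self, low: int, high: int) -> List[Tuple[int, str]]:
        # Query: broadcast to every node, then merge the sorted partial results.
        partials = [node.query_age_range(low, high) for node in self.nodes]
        return list(heapq.merge(*partials))
```

With three nodes and documents `u1` (age 30), `u2` (age 41) and `u3` (age 25), `get("u2")` touches a single node, while `query_age_range(26, 50)` asks all three and returns the merged list `[(30, "u1"), (41, "u2")]`.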

Write path

Writes use a key-based update mechanism: the client sends the updated document together with its key (the doc ID). The server acknowledges the write as soon as the data is stored in RAM on the active server, which gives the lowest latency for write requests.
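A minimal sketch of this write path, assuming a toy `ActiveServer` class with a hypothetical disk queue that is drained later. The class and method names are illustrative and not part of any Couchbase API.

```python
from collections import deque
from typing import Deque, Dict


class ActiveServer:
    """Toy model: acknowledge once the document is in RAM, persist later."""

    def __init__(self) -> None:
        self.ram: Dict[str, dict] = {}
        self.disk: Dict[str, dict] = {}
        self.disk_queue: Deque[str] = deque()  # keys waiting to be persisted

    def set(self, key: str, doc: dict) -> str:
        # The write is acknowledged as soon as the document is stored in RAM.
        self.ram[key] = doc
        self.disk_queue.append(key)
        return "ack"

    def flush(self) -> int:
        # Runs in the background in this model; returns how many keys were persisted.
        flushed = 0
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.ram[key]
            flushed += 1
        return flushed
```

In this model, the client already has its acknowledgement while the document still exists only in `ram`; it reaches `disk` only after `flush()` has run. That window is the price paid for the low write latency.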

Transaction Model

  1. In Couchbase, the document is the unit of manipulation, i.e. atomicity is guaranteed at the level of a single document, and transactions that span updates to multiple documents are unsupported.
  2. To modify a document, the client needs to retrieve it by its key, make the modification locally, and then send the whole (modified) document back to the server. This design trades network bandwidth (more data is transferred across the network) for server CPU (the CPU load shifts to the client).
  3. Concurrent Access:
    1. CAS: To provide the necessary isolation for concurrent access, Couchbase provides a CAS (compare and swap) mechanism, which works as follows:
      1. When the client retrieves a document, a CAS ID (equivalent to a revision number) is attached to it.
      2. While the client is manipulating the retrieved document locally, another client may modify this document. When this happens, the CAS ID of the document at the server will be incremented.
      3. When the original client submits its modification to the server, it can attach the original CAS ID to its request. The server compares this ID with the current ID on the server. If they differ, the document has been updated in the meantime and the server will not apply the update.
      4. The original client then re-reads the document (which now has a newer ID) and re-submits its modification.
    2. LOCK: Couchbase also provides a locking mechanism that clients can use to coordinate their access to documents. A client can request a LOCK on the document it intends to modify, update the document, and then release the LOCK. To prevent deadlocks, each LOCK grant has a timeout, so it is automatically released after a period of time.
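The CAS retry loop and the timed lock described above can be sketched against a toy in-memory store. `ToyStore`, `CasMismatch` and `update_with_retry` are hypothetical names; the integer counter used as the CAS ID and the explicit `now` argument (which keeps the timeout deterministic) are simplifying assumptions, not Couchbase SDK behaviour.

```python
from typing import Callable, Dict, Tuple


class CasMismatch(Exception):
    """Raised when the supplied CAS ID no longer matches the stored one."""


class ToyStore:
    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[dict, int]] = {}  # key -> (document, CAS ID)
        self._lock_expiry: Dict[str, float] = {}      # key -> time the lock expires

    def insert(self, key: str, doc: dict) -> None:
        self._docs[key] = (dict(doc), 1)

    def get(self, key: str) -> Tuple[dict, int]:
        doc, cas = self._docs[key]
        return dict(doc), cas  # hand out a copy so edits stay local to the client

    def replace(self, key: str, doc: dict, cas: int) -> int:
        _, current = self._docs[key]
        if cas != current:
            raise CasMismatch(key)  # someone else updated the document in between
        self._docs[key] = (dict(doc), current + 1)
        return current + 1

    def lock(self, key: str, timeout: float, now: float) -> bool:
        expiry = self._lock_expiry.get(key)
        if expiry is not None and now < expiry:
            return False  # still held by another client
        self._lock_expiry[key] = now + timeout  # expires on its own, so no deadlock
        return True

    def unlock(self, key: str) -> None:
        self._lock_expiry.pop(key, None)


def update_with_retry(
    store: ToyStore, key: str, mutate: Callable[[dict], None], max_attempts: int = 5
) -> dict:
    """Read, modify locally, write back with the CAS ID; re-read and retry on conflict."""
    for _ in range(max_attempts):
        doc, cas = store.get(key)
        mutate(doc)
        try:
            store.replace(key, doc, cas)
            return doc
        except CasMismatch:
            continue
    raise RuntimeError("too many concurrent updates to " + key)
```

A client that holds a stale CAS ID gets `CasMismatch` from `replace` and has to go through the read-modify-write cycle again, which is what `update_with_retry` automates. The lock is the pessimistic alternative: a second `lock` call fails until the first is released or its timeout has passed.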

Deployment

Couchbase is deployed in clustered mode. Each node in the cluster runs the following processes:

  1. Data Server: written in C++, this process is responsible for handling the get/set/delete requests from the clients. It also maintains the in-memory hashtable for the JSON documents. A document not found in the Data Server hashtable is fetched from the file-based Document Store (typically CouchDB).
  2. Management Server: written in Erlang, this process is responsible for handling queries from the clients, managing configuration, and communicating with the other nodes in the cluster.
  3. Document Store: CouchDB or SQLite.

Virtual Buckets

  1. The basic unit of data storage in Couchbase is a JSON document (or a primitive data type such as an int or a byte array) associated with a key. The overall key space is partitioned into 1024 logical storage units called "virtual buckets" (or vBuckets).
  2. vBuckets are distributed across the machines in the cluster via a map that is shared among the servers in the cluster as well as the client library.
  3. High availability is achieved through data replication at the vBucket level.
  4. Under the CouchDB structure, there is one file per vBucket.

Multiple documents are hence logically grouped under one vBucket, and each data server hosts multiple vBuckets.
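The key-to-vBucket-to-server lookup can be sketched as below. The choice of plain CRC32 modulo 1024 and the round-robin initial map are assumptions made for illustration; actual client libraries use their own hash and receive the map from the cluster.

```python
import zlib
from typing import List

NUM_VBUCKETS = 1024  # the key space is split into 1024 vBuckets


def vbucket_for(key: str) -> int:
    # Assumption: CRC32 of the key, reduced modulo the number of vBuckets.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers: List[str]) -> List[str]:
    # Hypothetical initial layout: assign vBuckets to servers round-robin.
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(key: str, vbucket_map: List[str]) -> str:
    # The client holds the same map as the servers, so it can route locally.
    return vbucket_map[vbucket_for(key)]
```

Because the map is indexed by vBucket rather than by key, moving data to another machine only requires changing entries in a 1024-element list; the hash of each key never changes.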

Load Balancing

  1. Keys are uniformly distributed based on a hash function.
  2. When machines are added to or removed from the cluster, the administrator can request a redistribution of vBuckets so that data is evenly spread across the physical machines.
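One way to picture such a redistribution is the sketch below, which evens out vBucket ownership while moving as few vBuckets as it can. The `rebalance` function and its quota rule are assumptions for illustration only and do not describe Couchbase's actual rebalance algorithm.

```python
from collections import Counter
from typing import List


def rebalance(vbucket_map: List[str], servers: List[str]) -> List[str]:
    """Return a new map in which every server owns an (almost) equal share."""
    base, extra = divmod(len(vbucket_map), len(servers))
    # Target share per server; the first `extra` servers take one more vBucket.
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    new_map = list(vbucket_map)
    owned: Counter = Counter()
    to_move: List[int] = []
    for vb, owner in enumerate(vbucket_map):
        if owner in quota and owned[owner] < quota[owner]:
            owned[owner] += 1   # stays where it is
        else:
            to_move.append(vb)  # owner was removed or is over its quota

    # Servers below quota receive the vBuckets that have to move.
    receivers = [s for s in servers for _ in range(quota[s] - owned[s])]
    for vb, server in zip(to_move, receivers):
        new_map[vb] = server
    return new_map
```

Starting from 1024 vBuckets split evenly over two servers, adding a third moves 341 vBuckets and leaves the servers with 342, 341 and 341. Keys never need to be rehashed, since only the vBucket-to-server map changes.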

Boot Up

docker run -t --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase/sandbox:5.0.0-beta
