sachinlala/CouchBase101.md

## CouchBase101.md

      
Origin

Couchbase is the result of merging two popular NoSQL technologies:

Membase, which adds persistence, replication, and sharding to the high-performance memcached technology
CouchDB, which pioneered the document-oriented model based on JSON

Read path


Key-based lookup: the client provides the key, and only the server hosting the data for that key is contacted.
Query: the client provides a query (for example, a range over some secondary key) together with the view (essentially the index) to run it against. The query is broadcast to all servers in the cluster, and the results are merged and sent back to the client.
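
The difference between the two read paths can be sketched with a toy in-memory cluster. This is not the Couchbase client API: the `Node` class, the `owner` hash routing, and the helper names are all hypothetical, and the query here scans documents directly instead of using a view.

```python
import zlib


class Node:
    """Hypothetical stand-in for one server in the cluster."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.requests = 0  # how many times this node was contacted

    def get(self, key):
        self.requests += 1
        return self.docs.get(key)

    def query(self, predicate):
        self.requests += 1
        return [doc for doc in self.docs.values() if predicate(doc)]


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]


def owner(key):
    # Assumption: a plain hash of the key picks the owning node.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def put(key, doc):
    owner(key).docs[key] = doc


def get_by_key(key):
    # Key-based lookup: only the node hosting the key is contacted.
    return owner(key).get(key)


def query_all(predicate):
    # Query: broadcast to every node, then merge the partial results.
    merged = []
    for node in nodes:
        merged.extend(node.query(predicate))
    return sorted(merged, key=lambda d: d["age"])


for i, age in enumerate([31, 45, 27, 52]):
    put(f"user:{i}", {"id": i, "age": age})

doc = get_by_key("user:2")
contacted_for_get = sum(n.requests for n in nodes)
over_30 = query_all(lambda d: d["age"] > 30)
contacted_total = sum(n.requests for n in nodes)
```

The lookup touches one node, while the query touches all three before the merge.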

Write path

A key-based update mechanism where the client sends an updated document along with its key (the doc ID). The server acknowledges the write request as soon as the data is stored in RAM on the active server, which gives the lowest latency for write requests.
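
A minimal sketch of that acknowledgement order, using a toy server object. The `ActiveServer` class and its `flush` step are assumptions made for illustration; the text above only states that the acknowledgement happens once the data is in RAM, so the later write to disk is modelled here as a separate, explicit call.

```python
from collections import deque


class ActiveServer:
    """Hypothetical model of the active server for one key."""

    def __init__(self):
        self.ram = {}           # in-memory copy, written first
        self.disk = {}          # stands in for persistent storage
        self.pending = deque()  # keys waiting to be persisted

    def set(self, key, doc):
        self.ram[key] = doc
        self.pending.append(key)
        return "ok"  # acknowledged before anything reaches disk

    def flush(self):
        # Assumption: persistence happens later, outside the request path.
        while self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.ram[key]


server = ActiveServer()
ack = server.set("order:1", {"total": 42})
on_disk_at_ack = "order:1" in server.disk  # False: only RAM holds it so far
server.flush()
```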

Transaction Model


In Couchbase, the document is the unit of manipulation: atomicity is guaranteed for a single document, and transactions that span updates to multiple documents are unsupported.
To modify a document, the client needs to retrieve it by its key, apply the modification locally, and then send the whole modified document back to the server. This design trades network bandwidth for server CPU: more data is transferred across the network, but the CPU load of the modification shifts to the client.

Concurrent Access:

CAS: To provide the necessary isolation for concurrent access, Couchbase provides a CAS (compare and swap) mechanism, which works as follows:

When the client retrieves a document, a CAS ID (equivalent to a revision number) is attached to it.
While the client is manipulating the retrieved document locally, another client may modify this document. When this happens, the CAS ID of the document on the server is incremented.
When the original client submits its modification to the server, it can attach the original CAS ID to its request. The server compares this ID with the document's current ID on the server. If they differ, the document has been updated in the meantime and the server does not apply the update.
The original client then re-reads the document (which now has a newer ID) and resubmits its modification.
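
The CAS steps above can be sketched as a read-modify-write retry loop against a toy in-memory store. `Store`, `CasMismatch`, and `update_with_retry` are hypothetical names, not the Couchbase SDK; the `before_write` hook exists only to simulate a second client sneaking in an update.

```python
class CasMismatch(Exception):
    pass


class Store:
    """Toy server: each key maps to (cas_id, document)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        cas, doc = self._data[key]
        return cas, dict(doc)  # copy, so local edits stay local

    def set(self, key, doc, cas=None):
        current_cas = self._data.get(key, (0, None))[0]
        if cas is not None and cas != current_cas:
            raise CasMismatch(key)  # someone updated the doc in between
        self._data[key] = (current_cas + 1, dict(doc))
        return current_cas + 1


def update_with_retry(store, key, mutate, before_write=None, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        cas, doc = store.get(key)   # read the document and its CAS ID
        mutate(doc)                 # modify the local copy
        if before_write and attempt == 1:
            before_write()          # simulate a competing writer
        try:
            store.set(key, doc, cas=cas)  # submit with the original CAS ID
            return attempt
        except CasMismatch:
            continue                # re-read and try again
    raise RuntimeError("too many CAS conflicts")


store = Store()
store.set("counter", {"n": 0})


def other_client():
    cas, doc = store.get("counter")
    doc["n"] += 10
    store.set("counter", doc, cas=cas)


def add_one(doc):
    doc["n"] += 1


attempts = update_with_retry(store, "counter", add_one, before_write=other_client)
final_cas, final_doc = store.get("counter")
```

The first attempt is rejected because the other client changed the CAS ID; the second attempt works on the fresh copy, so neither update is lost.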


LOCK: Couchbase also provides a locking mechanism for clients to coordinate their access to documents. A client can request a LOCK on the document it intends to modify, update the document, and then release the LOCK. To prevent deadlock, each LOCK grant has a timeout, so it is released automatically after a period of time.
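
A rough sketch of a lock with a timeout, again on a toy store and not the real client API. `LockingStore`, its method names, and the injected fake clock are assumptions chosen so the expiry can be shown without actually waiting.

```python
import time


class LockedError(Exception):
    pass


class LockingStore:
    """Toy store where a lock expires on its own after a timeout."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._docs = {}
        self._locks = {}  # key -> (owner, expires_at)

    def _holder(self, key):
        owner, expires_at = self._locks.get(key, (None, 0.0))
        if owner is not None and self._clock() >= expires_at:
            del self._locks[key]  # the grant timed out
            return None
        return owner

    def lock(self, key, owner, timeout):
        if self._holder(key) not in (None, owner):
            raise LockedError(key)
        self._locks[key] = (owner, self._clock() + timeout)

    def set(self, key, doc, owner):
        if self._holder(key) not in (None, owner):
            raise LockedError(key)
        self._docs[key] = doc

    def unlock(self, key, owner):
        if self._holder(key) == owner:
            del self._locks[key]


now = [0.0]  # fake clock, advanced by hand
store = LockingStore(clock=lambda: now[0])
store.lock("doc", "client-1", timeout=15)

try:
    store.set("doc", {"v": 2}, owner="client-2")
    blocked = False
except LockedError:
    blocked = True

now[0] = 20.0  # client-1 never unlocked; the lock has expired by now
store.set("doc", {"v": 2}, owner="client-2")
```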


Deployment

Couchbase is deployed in clustered mode. Each node in the cluster runs the following processes:

Data Server: written in C++, this process is responsible for handling the get/set/delete requests from clients. It also maintains the in-memory hashtable of JSON documents. Anything not found in the Data Server hashtable is fetched from the file-based Document Store (typically CouchDB).
Management Server: written in Erlang, this process is responsible for handling queries from clients, managing configuration, and communicating with the other nodes in the cluster.
Document Store: CouchDB or an SQLite database
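
The Data Server's fallback from the in-memory hashtable to the Document Store might look roughly like this. The `DataServer` class is hypothetical, a plain dict stands in for the file-based store, and keeping a fetched document in memory afterwards is an assumption of this sketch.

```python
class DataServer:
    """Toy Data Server: checks memory first, then the document store."""

    def __init__(self, document_store):
        self.hashtable = {}                   # in-memory documents
        self.document_store = document_store  # stand-in for the file-based store
        self.disk_fetches = 0

    def get(self, key):
        if key in self.hashtable:
            return self.hashtable[key]
        self.disk_fetches += 1
        doc = self.document_store.get(key)
        if doc is not None:
            # Assumption: a fetched document is kept in memory afterwards.
            self.hashtable[key] = doc
        return doc


server = DataServer({"user:1": {"name": "Ann"}})
first = server.get("user:1")   # miss in memory, fetched from the store
second = server.get("user:1")  # served from the hashtable
```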

Virtual Buckets


The basic unit of data storage in Couchbase is a JSON document (or a primitive data type such as an int or a byte array) associated with a key. The overall key space is partitioned into 1024 logical storage units called "virtual buckets" (or vBuckets).
vBuckets are distributed across the machines in the cluster via a map that is shared among the servers in the cluster as well as the client library.
High availability is achieved through data replication at the vBucket level.
Under the CouchDB structure, there is one file per vBucket.
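
A small sketch of how a key could be routed through the vBucket map. The CRC32-modulo hash, the server names, and the round-robin layout of the active and replica maps are assumptions for illustration; the text above only says that keys fall into 1024 vBuckets and that a shared map assigns vBuckets to machines.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024
SERVERS = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster

# Shared map: list index = vBucket id, value = server holding the active copy.
vbucket_map = [SERVERS[i % len(SERVERS)] for i in range(NUM_VBUCKETS)]
# Replica map: the copy of each vBucket lives on a different server.
replica_map = [SERVERS[(i + 1) % len(SERVERS)] for i in range(NUM_VBUCKETS)]


def vbucket_for(key):
    # Assumption: CRC32 of the key, reduced to the vBucket range.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def server_for(key):
    # The client library can resolve this locally, with no extra hop.
    return vbucket_map[vbucket_for(key)]


vb = vbucket_for("user:42")
active = server_for("user:42")
replica = replica_map[vb]
per_server = Counter(vbucket_map)
```

Because the client holds the same map as the servers, it can send each request straight to the server that owns the key's vBucket.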


Multiple documents are thus logically grouped under one vBucket, and each data server hosts multiple vBuckets.

Load Balancing


Keys are distributed uniformly based on a hash function.
When machines are added to or removed from the cluster, the administrator can request a redistribution of vBuckets so that data is spread evenly across the physical machines.
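
One way such a redistribution could work is sketched below. The `rebalance` function is a hypothetical illustration, not Couchbase's actual rebalance algorithm: it gives every server an equal quota and leaves a vBucket where it is whenever its current server still has quota.

```python
from collections import Counter

NUM_VBUCKETS = 1024


def rebalance(vbucket_map, servers):
    """Return a new map with vBuckets spread evenly across `servers`."""
    base, extra = divmod(len(vbucket_map), len(servers))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    new_map = [None] * len(vbucket_map)

    # First pass: keep a vBucket in place while its server has quota left.
    for vb, owner in enumerate(vbucket_map):
        if quota.get(owner, 0) > 0:
            new_map[vb] = owner
            quota[owner] -= 1

    # Second pass: hand the remaining vBuckets to servers with spare quota.
    spare = [s for s in servers for _ in range(quota[s])]
    for vb, owner in enumerate(new_map):
        if owner is None:
            new_map[vb] = spare.pop()
    return new_map


old_servers = ["node-a", "node-b", "node-c"]
old_map = [old_servers[i % 3] for i in range(NUM_VBUCKETS)]

new_servers = old_servers + ["node-d"]  # a machine is added
new_map = rebalance(old_map, new_servers)

counts = Counter(new_map)
moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
```

After adding a fourth machine, each server holds 256 vBuckets and only the vBuckets handed to the new machine have moved.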

References


DZone/RickyHo
Getting Started
Installation of Couch Server w/ Docker

Boot Up


docker run -t --name db -p 8091-8094:8091-8094 -p 11210:11210 couchbase/sandbox:5.0.0-beta