@vmx
Last active December 22, 2015 05:29
Couchbase view engine index updater

This is a blog post about the internals of the Couchbase view engine index updater.

When the index updater is triggered

The updater is triggered whenever a view query is performed and there is data available that isn't already indexed. Just imagine that you've inserted some data and now query a view with stale=false.

Updater pipeline

The updater has an internal pipeline to process the documents/updates. This way one part can catch up whenever another part slows down.

There are two queues, the "MapQueue" and the "WriterQueue". A queue can be "closed" so that the consumers of the queue know that no new items will be added to it.
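To make the idea of a "closed" queue concrete, here is a minimal sketch in Python. It is not the view engine's actual code; the class name `ClosableQueue` and its methods are made up for illustration.

```python
import threading
from collections import deque


class ClosableQueue:
    """Hypothetical closable FIFO queue (illustration only)."""

    def __init__(self):
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            if self._closed:
                raise ValueError("queue is closed")
            self._items.append(item)
            self._cond.notify()

    def close(self):
        # Wake up all waiting consumers so they can see the queue is closed.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Return (True, item), or (False, None) once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return True, self._items.popleft()
            return False, None
```

A consumer keeps calling `get()` until it receives `(False, None)`. At that point it knows the producer is done and no more items will arrive.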

MapQueue

The MapQueue contains the documents that were inserted/updated since the last time the updater ran.

In Couchbase the documents are spread across several so-called partitions. Each such partition has its own sequence number. The updater stores the highest sequence number it has already indexed for every partition. This way it's easy to find out whether an update is needed or not. The updater loops through all partitions and requests all the updates that happened since the last time the updater ran.
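The sequence number bookkeeping could look roughly like the following Python sketch. The function names and the data layout (plain dicts and lists) are assumptions for illustration, as is the convention that a partition that was never indexed counts as sequence number 0.

```python
def partitions_needing_update(indexed_seqs, current_seqs):
    """Return ids of partitions that have changes the index has not seen.

    indexed_seqs: partition id -> highest sequence number already indexed
    current_seqs: partition id -> current sequence number of the partition
    """
    return [
        part
        for part, seq in sorted(current_seqs.items())
        # Assumption: a partition that was never indexed counts as seq 0.
        if seq > indexed_seqs.get(part, 0)
    ]


def changes_since(partition_log, since_seq):
    """Return the (seq, doc) entries of one partition newer than since_seq."""
    return [(seq, doc) for seq, doc in partition_log if seq > since_seq]
```

With `indexed_seqs = {0: 5, 1: 3}` and `current_seqs = {0: 5, 1: 7, 2: 1}`, only partitions 1 and 2 need to be read. Partition 0 is already up to date.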

These documents are stored in the MapQueue. When all updates are stored in the queue it gets closed.

The next step is to run the map part of the mapreduce. The items of the MapQueue get dequeued one by one. Every map function of the design document is then applied to every item. The result for every document is then stored in the WriterQueue.

When all items are processed, the WriterQueue gets closed as well.
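The map step can be sketched like this in Python. In the real view engine the map functions come from the design document. Here they are ordinary Python generators that yield `(key, value)` pairs, and `run_map_stage`, `by_type` and `by_name` are hypothetical names.

```python
def run_map_stage(map_queue_items, map_functions):
    """Apply every map function to every document.

    Returns one (doc_id, results) entry per document, where results holds
    one list of emitted (key, value) pairs per map function.
    """
    writer_items = []
    for doc_id, doc in map_queue_items:
        results = [list(fn(doc)) for fn in map_functions]
        writer_items.append((doc_id, results))
    return writer_items


# Two example map functions (hypothetical views).
def by_type(doc):
    if "type" in doc:
        yield doc["type"], None


def by_name(doc):
    if "name" in doc:
        yield doc["name"], doc.get("age")
```

A document that emits nothing for a view still gets an (empty) entry for that view, so the position in the result list always tells you which map function produced it.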

WriterQueue

The WriterQueue contains one item per updated document (if there were no errors). The value of each item is a list with the results of every map function of the design document.

When the WriterQueue reaches a certain size (in bytes), all items get dequeued and written to disk, together with the reduce part of the mapreduce if there is one. The same happens when the end of the queue is reached, i.e. once it has been closed.
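The batching behaviour can be sketched as follows in Python. How the real updater measures the size of an item isn't covered here, so using the length of the JSON encoding is purely an assumption, and `write_batches` and `flush` are hypothetical names. The reduce step is left out.

```python
import json


def write_batches(writer_items, max_batch_bytes, flush):
    """Collect items and call flush(batch) whenever the batch is big enough.

    The final flush corresponds to reaching the end of the closed queue.
    """
    batch, batch_bytes = [], 0
    for item in writer_items:
        batch.append(item)
        # Assumption: item size is approximated by its JSON encoding.
        batch_bytes += len(json.dumps(item).encode("utf-8"))
        if batch_bytes >= max_batch_bytes:
            flush(batch)
            batch, batch_bytes = [], 0
    if batch:
        flush(batch)
```

Writing in batches rather than item by item means fewer, larger writes, while the byte limit keeps the amount of data held in memory bounded.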

The full pipeline

Loading documents -> MapQueue -> Apply map function -> WriterQueue -> Reduce + store on disk.
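Put together, the stages can run concurrently. The following Python sketch uses the standard library `queue.Queue` with a sentinel object in place of "closing" a queue, and a dict in place of the on-disk index. All names are made up for illustration, and the reduce step is omitted.

```python
import queue
import threading

CLOSED = object()  # sentinel standing in for a "closed" queue


def loader(docs, map_queue):
    for doc in docs:
        map_queue.put(doc)
    map_queue.put(CLOSED)


def mapper(map_queue, writer_queue, map_functions):
    while True:
        doc = map_queue.get()
        if doc is CLOSED:
            # All documents are mapped, so "close" the WriterQueue as well.
            writer_queue.put(CLOSED)
            return
        results = [list(fn(doc)) for fn in map_functions]
        writer_queue.put((doc["id"], results))


def writer(writer_queue, index):
    while True:
        item = writer_queue.get()
        if item is CLOSED:
            return
        doc_id, results = item
        index[doc_id] = results  # stands in for writing to disk


def run_pipeline(docs, map_functions):
    map_queue, writer_queue, index = queue.Queue(), queue.Queue(), {}
    stages = [
        threading.Thread(target=loader, args=(docs, map_queue)),
        threading.Thread(target=mapper, args=(map_queue, writer_queue, map_functions)),
        threading.Thread(target=writer, args=(writer_queue, index)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return index
```

`run_pipeline` only returns after the writer has seen the sentinel, i.e. after everything has been stored.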

Returning the result

In case a query with stale=false triggered the updater, the response will be sent once the WriterQueue is closed and everything has been written to disk.
