This is a blog post about the internals of the Couchbase view engine index updater.
The updater is triggered whenever a view query is performed and there is data available that isn't already indexed. Just imagine that you've inserted some data and now query a view with stale=false.
The updater has an internal pipeline to process the documents/updates. This way one part can catch up whenever another part slows down.
There are two queues, the "MapQueue" and the "WriterQueue". A queue can be "closed" so that the consumers of the queue know that no new items will be added to it.
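To make the idea of a "closable" queue concrete, here is a minimal sketch in Python. It is not the view engine's actual code; the class name `ClosableQueue` and its methods are made up for illustration. The point is only that a consumer can tell "empty for now" apart from "closed, nothing more will come".

```python
import threading
from collections import deque


class ClosableQueue:
    """Hypothetical FIFO queue that a producer can close when it is done."""

    def __init__(self):
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            if self._closed:
                raise ValueError("queue is closed")
            self._items.append(item)
            self._cond.notify()

    def close(self):
        # Wake up every waiting consumer so it can notice the close.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Return (True, item), or (False, None) once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return True, self._items.popleft()
            return False, None


# Usage: the producer adds two items and closes, the consumer drains.
q = ClosableQueue()
q.put("doc1")
q.put("doc2")
q.close()

drained = []
while True:
    ok, item = q.get()
    if not ok:
        break
    drained.append(item)
# drained == ["doc1", "doc2"]
```

Items that were added before the close are still delivered; only after the queue is drained does the consumer see the "closed" signal.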
The MapQueue contains the documents that were inserted/updated since the last time the updater ran.
In Couchbase the documents are spread across several so-called partitions. Each such partition has its own sequence number. The updater stores the highest sequence number for every partition it has already indexed. This way it's easy to find out whether an update is needed or not. The updater loops through all partitions and requests all the updates that happened since the last time the updater ran.
These documents are stored in the MapQueue. When all updates are stored in the queue it gets closed.
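The loading step can be sketched roughly like this. The data layout is an assumption made for the example: each partition is a list of `(seq, doc_id, doc)` changes, and `indexed_seqs` holds the highest sequence number already indexed per partition. The function name `load_updates` is hypothetical, and a plain list stands in for the MapQueue.

```python
def load_updates(partitions, indexed_seqs):
    """Collect all changes that are newer than what was already indexed.

    partitions:   {partition_id: [(seq, doc_id, doc), ...]}
    indexed_seqs: {partition_id: highest sequence number already indexed}
    """
    map_queue = []
    for part_id, changes in partitions.items():
        since = indexed_seqs.get(part_id, 0)
        for seq, doc_id, doc in changes:
            if seq > since:
                map_queue.append((part_id, seq, doc_id, doc))
    return map_queue


partitions = {
    0: [(1, "a", {"n": 1}), (2, "b", {"n": 2})],
    1: [(1, "c", {"n": 3})],
}
indexed_seqs = {0: 1, 1: 1}  # partition 1 is already up to date

pending = load_updates(partitions, indexed_seqs)
# pending == [(0, 2, "b", {"n": 2})]
```

Only document "b" ends up in the queue, because its sequence number is higher than the one stored for partition 0.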
The next step is to run the map part of the mapreduce. The items of the MapQueue get dequeued one by one. Every map function of the design document is then applied to every item. The result for every document is then stored in the WriterQueue.
When all items are processed, the WriterQueue gets closed as well.
The WriterQueue contains one item per updated document (if there were no errors). The value of each item is a list of the results from every map function of the design document.
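Here is a small sketch of what the map step produces. Real map functions live in the design document; the two Python generators below (`map_by_type`, `map_by_name`) are hypothetical stand-ins, and `run_map_step` is an illustrative name. Plain lists stand in for both queues.

```python
def map_by_type(doc_id, doc):
    # Stand-in for a map function that emits (key, value) pairs.
    yield (doc["type"], None)


def map_by_name(doc_id, doc):
    # Emits nothing for documents without a "name" field.
    if "name" in doc:
        yield (doc["name"], doc_id)


def run_map_step(map_queue, map_functions):
    """Apply every map function to every document.

    Returns one item per document; its value is a list with one
    entry (the emitted pairs) per map function.
    """
    writer_queue = []
    for doc_id, doc in map_queue:
        results = [list(fn(doc_id, doc)) for fn in map_functions]
        writer_queue.append((doc_id, results))
    return writer_queue


docs = [
    ("a", {"type": "beer", "name": "IPA"}),
    ("b", {"type": "brewery"}),
]
mapped = run_map_step(docs, [map_by_type, map_by_name])
# mapped == [
#     ("a", [[("beer", None)], [("IPA", "a")]]),
#     ("b", [[("brewery", None)], []]),
# ]
```

Note that document "b" still gets an entry for the second map function, it is just empty.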
When the WriterQueue reaches a certain size (in bytes), all items get dequeued and written to disk. If the view has a reduce function, the reduce part of the mapreduce runs as part of this step. The same happens when the end of the queue is reached (when it got closed).
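The size-based flushing can be sketched as follows. This is only an illustration: the function name `write_batches`, the `max_bytes` parameter and the use of the JSON encoding to measure an item's size are all assumptions, and "writing to disk" is replaced by collecting the batches in a list.

```python
import json


def write_batches(writer_items, max_bytes):
    """Flush whenever the buffered size reaches max_bytes, and once at the end."""
    batches = []
    buffer, buffered = [], 0
    for item in writer_items:
        buffer.append(item)
        buffered += len(json.dumps(item).encode("utf-8"))
        if buffered >= max_bytes:
            batches.append(buffer)  # stand-in for reduce + write to disk
            buffer, buffered = [], 0
    if buffer:
        # End of the (closed) queue: flush whatever is left.
        batches.append(buffer)
    return batches


# Each string costs its length plus two quote characters when JSON-encoded.
batches = write_batches(["aaaa", "bbbb", "cc"], max_bytes=10)
# batches == [["aaaa", "bbbb"], ["cc"]]
```

The first two items together cross the threshold and are flushed as one batch; the last item is flushed because the end of the queue was reached.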
Loading documents -> MapQueue -> Apply map function -> WriterQueue -> Reduce + store on disk.
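To show how the stages can run side by side, here is a rough sketch of the whole pipeline with one thread per stage. It uses Python's standard `queue.Queue`, with a sentinel object (`CLOSED`) standing in for closing a queue. All function names are hypothetical, the reduce step is left out, and a dictionary stands in for the index on disk.

```python
import queue
import threading

CLOSED = object()  # sentinel standing in for "this queue is closed"


def loader(docs, map_queue):
    for doc in docs:
        map_queue.put(doc)
    map_queue.put(CLOSED)


def mapper(map_queue, writer_queue, map_functions):
    while True:
        doc = map_queue.get()
        if doc is CLOSED:
            writer_queue.put(CLOSED)  # pass the close on to the next stage
            return
        results = [list(fn(doc)) for fn in map_functions]
        writer_queue.put((doc["id"], results))


def writer(writer_queue, index):
    while True:
        item = writer_queue.get()
        if item is CLOSED:
            return
        doc_id, results = item
        index[doc_id] = results  # stand-in for writing to disk


def run_pipeline(docs, map_functions):
    map_queue, writer_queue, index = queue.Queue(), queue.Queue(), {}
    stages = [
        threading.Thread(target=loader, args=(docs, map_queue)),
        threading.Thread(target=mapper, args=(map_queue, writer_queue, map_functions)),
        threading.Thread(target=writer, args=(writer_queue, index)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return index


index = run_pipeline(
    [{"id": "a", "n": 1}, {"id": "b", "n": 2}],
    [lambda doc: [(doc["n"], None)]],
)
# index == {"a": [[(1, None)]], "b": [[(2, None)]]}
```

Because every stage only talks to its neighbours through a queue, a slow writer does not stop the mapper from working through the documents that are already loaded.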
In case a query with stale=false triggered the updater, the response will be sent once the WriterQueue is closed and everything was written to disk.