@janl
Created August 21, 2013 10:01
What’s new in Apache CouchDB 0.11 — Part Two: Views; JOINs Redux, Raw Collation for Speed
Hi again! It’s Jan again. Thanks for coming back. If you missed Part One, here’s your chance to catch up.
CouchDB JOINs Redux
When I started out talking about CouchDB (back in 2006), people were rarely aware of any databases that didn’t use SQL for querying. A frequent question was “How do I do JOINs?” The short answer is “You don’t”. People worried about retrieving “related data” from a non-relational database.
Turns out “related data” and “relation” have very little in common (the first is groups of data, the second is a mathematical term that refers to a multivalued mapping more commonly called a “table”). Long story short, there was (and still is) confusion.
Of course CouchDB lets you retrieve related data in any shape or form you like. Christopher Lenz did a great write-up on “CouchDB ‘JOINs’” back in 2007, and it is still very applicable.
Since then, though, CouchDB has gained a few new features to tackle the same problem: fetching related data. These aren’t new in 0.11, but they did get refined, so it makes sense to revisit them here. Since 0.10, you can query a view with the query parameter include_docs=true. When it is specified, CouchDB fetches, for each row in the view result, the corresponding document from the database. This lets you trade smaller view indexes (and hence shorter view indexing times) for slower view queries (for each row, CouchDB makes one additional lookup in the database).
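Here is a minimal sketch of building such a query URL. The viewUrl helper and the database, design document and view names ("mydb", "app", "by_name") are made up for illustration; the only CouchDB specifics it relies on are the /db/_design/ddoc/_view/name path, the default port 5984, and the fact that key, startkey and endkey must be JSON-encoded.

```javascript
// Hypothetical helper (not part of CouchDB): build a view query URL.
// CouchDB expects key, startkey and endkey to be JSON-encoded values.
function viewUrl(base, db, ddoc, view, params) {
  const query = new URLSearchParams();
  Object.entries(params || {}).forEach(function ([name, value]) {
    const isKey = ["key", "startkey", "endkey"].includes(name);
    query.set(name, isKey ? JSON.stringify(value) : String(value));
  });
  const path = [db, "_design", ddoc, "_view", view]
    .map(function (part) { return encodeURIComponent(part); })
    .join("/");
  const qs = query.toString();
  return base + "/" + path + (qs ? "?" + qs : "");
}

// Assumed names: database "mydb", design document "app", view "by_name".
const url = viewUrl("http://127.0.0.1:5984", "mydb", "app", "by_name", {
  include_docs: true,
  key: "Claire"
});
console.log(url);
```

A GET request to that URL returns the view rows with a doc member attached to each one.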
With 0.11, you can include an _id member in the value of a view row and, when querying with include_docs=true, have CouchDB fetch the document with that id instead of the one that produced the row.
As an example, consider these four documents:
{
  "_id": "Claire",
  "title": "VP of Official Attitude"
}
{
  "_id": "Mikeal",
  "title": "VP of Pastries and Automating Stuff"
}
{
  "_id": "Jason",
  "title": "VP of Hosting and Lightning"
}
{
  "_id": "team",
  "members": ["Claire", "Mikeal", "Jason"]
}
And this map function:
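function(doc) {
  // Only the team document has a members list.
  if (doc.members) {
    doc.members.forEach(function(member) {
      // The _id in the value tells include_docs which document to fetch.
      emit(member, {_id: member});
    });
  }
}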
The regular result looks like this:
{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {"key":"Claire", "value":{"_id":"Claire"}},
    {"key":"Jason", "value":{"_id":"Jason"}},
    {"key":"Mikeal", "value":{"_id":"Mikeal"}}
  ]
}
If you query the view with include_docs=true, the result looks like this:
{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {
      "key":"Claire",
      "value":{"_id":"Claire"},
      "doc": {"_id":"Claire","title":"VP of Official Attitude"}
    },
    {
      "key":"Jason",
      "value":{"_id":"Jason"},
      "doc": {"_id":"Jason","title":"VP of Hosting and Lightning"}
    },
    {
      "key":"Mikeal",
      "value":{"_id":"Mikeal"},
      "doc": {"_id":"Mikeal","title":"VP of Pastries and Automating Stuff"}
    }
  ]
}
Pretty slick, don’t you think?
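If you want to poke at the mechanics without a running CouchDB, the lookup can be sketched in plain JavaScript. This is an in-memory simulation, not CouchDB's implementation: the docs object stands in for the database, queryView is a made-up helper, and emit is passed to the map function as an argument instead of being a global.

```javascript
// In-memory stand-in for the database: document id -> document.
const docs = {
  Claire: { _id: "Claire", title: "VP of Official Attitude" },
  Mikeal: { _id: "Mikeal", title: "VP of Pastries and Automating Stuff" },
  Jason: { _id: "Jason", title: "VP of Hosting and Lightning" },
  team: { _id: "team", members: ["Claire", "Mikeal", "Jason"] }
};

// The map function from above, with emit passed in explicitly.
function map(doc, emit) {
  if (doc.members) {
    doc.members.forEach(function (member) {
      emit(member, { _id: member });
    });
  }
}

// Hypothetical helper: build the view rows, then optionally attach docs.
function queryView(includeDocs) {
  const rows = [];
  Object.values(docs).forEach(function (doc) {
    map(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // Simple code-unit sort; enough for these ASCII keys.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  if (includeDocs) {
    rows.forEach(function (row) {
      // Prefer the _id in the emitted value; fall back to the emitting doc.
      const linkedId = row.value && row.value._id;
      row.doc = docs[linkedId || row.id] || null;
    });
  }
  return { rows: rows };
}

const result = queryView(true);
console.log(JSON.stringify(result, null, 2));
```

Every row here was emitted by the team document, yet each doc member holds the person document named in the value.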
Raw Collation
This one is a quickie for speed freaks.
By default, all views are sorted in a locale-dependent Unicode collation order. This ensures that text in any language sorts naturally rather than in an artificial byte order.
This is great, but sometimes, you don’t need unicode-aware sorting. CouchDB 0.11 allows you to specify a view definition option to enable raw collation for a view.
{
  "_id": "_design/app",
  "views": {
    "faster": {
      "map": "function(doc) {emit(doc.field, 1);}",
      "options": {
        "collation": "raw"
      }
    }
  }
}
Views that are built with this option avoid calling out to the ICU (International Components for Unicode) library to sort all rows. Hence the speed-up. How much faster depends on your data and hardware, but the difference can be significant.
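To get a feel for how the two orderings differ, here is a rough sketch in plain JavaScript. It only approximates what CouchDB does: rawCompare is a made-up helper that compares string keys by their UTF-8 bytes, and Intl.Collator stands in for ICU collation (the "en" locale is an assumption). Non-string keys are left out entirely.

```javascript
const encoder = new TextEncoder();

// Hypothetical stand-in for raw collation: compare strings byte by byte.
function rawCompare(a, b) {
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  // Equal prefix: the shorter string sorts first.
  return x.length - y.length;
}

// Stand-in for Unicode collation, using the runtime's collator.
const unicodeCompare = new Intl.Collator("en").compare;

const keys = ["b", "A", "a", "B"];
const rawOrder = keys.slice().sort(rawCompare);
const unicodeOrder = keys.slice().sort(unicodeCompare);

console.log("raw:    ", rawOrder);     // uppercase letters come before lowercase
console.log("unicode:", unicodeOrder); // letters are grouped regardless of case
```

The takeaway: with raw collation, "B" sorts before "a", so only switch it on for views where that ordering is acceptable.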
If you feel like it, create a small benchmark, publish the numbers on your blog and let us know! We’ll post a follow-up and compare everybody’s results.
Next up in our series are the new features of the CouchDB Replicator. Stay tuned!