public
Last active

CouchDB and multiple tags

  • Download Gist
gistfile1.md
Markdown

CouchDB and multiple tags

The problem

Given a set of documents, each associated with multiple tags, how can I retrieve the documents tagged with an arbitrary set of tags?

My solution

In order to simplify the problem I have assumed that the array of tags in each document is ordered and that the key to search for documents will be an ordered array of tags as well.

Create the db:

$ DB="http://localhost:5984/db"
$ curl -XPUT $DB

Create some documents with tags:

$ curl -XPOST $DB -d '{"_id":"One", "tags":["A", "B", "C"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Two", "tags":["B", "C", "D"]}' -H "Content-Type: application/json"
$ curl -XPOST $DB -d '{"_id":"Six", "tags":["C", "D"]}' -H "Content-Type: application/json"

Now create the view:

$ echo '
{
  "_id":"_design/all",
  "language":"javascript",
  "views":{
    "by_tags":{
      "map":"function(doc) {\n
          if (doc.tags == []) return;\n\n

          var emit_sequence = function(base, disp) {\n
            if (disp.length > 1) {\n
              emit(base.concat(disp[0]), 1);\n
              emit_sequence(base.concat(disp[0]), disp.slice(1, disp.length));\n
              emit_sequence(base, disp.slice(1, disp.length));\n
            } else if (disp.length == 1) {\n
              emit(base.concat(disp[0]), 1);\n
            }\n
          }\n\n

          emit_sequence([], doc.tags);\n
        }"
    }
  }
}
' > view.json

$ curl -XPOST $DB -d @view.json -H "Content-Type: application/json"

This view use a recursive function so that will emit all the possible combination of tags required to identify the document with any of the possible ordered subset of tags.

So, if the only document "One" was inserted, the list of keys will be:

$ curl -XGET $DB/_design/all/_view/by_tags

{"total_rows":7,"offset":0,"rows":[
{"id":"One","key":["A"],"value":1},
{"id":"One","key":["A","B"],"value":1},
{"id":"One","key":["A","B","C"],"value":1},
{"id":"One","key":["A","C"],"value":1},
{"id":"One","key":["B"],"value":1},
{"id":"One","key":["B","C"],"value":1},
{"id":"One","key":["C"],"value":1}
]}

With the three given sample documents, if we want now retrieve the list of document tagged with "B" and "C", the request will be:

$ curl -XGET $DB/_design/all/_view/by_tags?key='\["B","C"\]'

{"total_rows":17,"offset":6,"rows":[
{"id":"One","key":["B","C"],"value":1},
{"id":"Two","key":["B","C"],"value":1}
]}

Or, if we are looking for all the documents with tag "C":

$ curl -XGET $DB/_design/all/_view/by_tags?key='\["C"\]'

{"total_rows":17,"offset":10,"rows":[
{"id":"One","key":["C"],"value":1},
{"id":"Six","key":["C"],"value":1},
{"id":"Two","key":["C"],"value":1}
]}

It is worth to note that the view computed in this way will be quite big, the number of rows for each document is related to the number of tags. Given t as the tags size for a document, the rows count can be computed with the following formula:

rows = 2^t - 1

I am sure this is not the most efficient way to solve the original problem, but it works for me. Thanks to Claudio for his help.

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.