Skip to content

Instantly share code, notes, and snippets.

@nolanlawson
Last active August 29, 2015 14:03
Show Gist options
  • Save nolanlawson/10452539f07b12a73d37 to your computer and use it in GitHub Desktop.
Save nolanlawson/10452539f07b12a73d37 to your computer and use it in GitHub Desktop.
Cloudant Query API - first impressions

The new Cloudant query API is pretty awesome. It basically looks like Mongo, which makes for much more readable and user-friendly queries than what you can get with standard map/reduce.

Index the field foo:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_index -d '{
>     "index": {
>         "fields": ["foo"]
>     },
>     "name" : "foo-index",
>     "type" : "json"
> }'
{"result":"created"}

It creates a design document (sigh), but notice that the structure is different from usual:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_all_docs?include_docs=true | json
{
  "total_rows": 1,
  "offset": 0,
  "rows": [
    {
      "id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "key": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "value": {
        "rev": "1-17ec75b65cb1ca7d012cd3fcad323848"
      },
      "doc": {
        "_id": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
        "_rev": "1-17ec75b65cb1ca7d012cd3fcad323848",
        "language": "query",
        "views": {
          "foo-index": {
            "map": {
              "fields": {
                "foo": "asc"
              }
            },
            "reduce": "_count",
            "options": {
              "def": {
                "fields": [
                  "foo"
                ]
              },
              "w": 2
            }
          }
        }
      }
    }
  ]
}

Instead of building up a map function, "map" instead maps to an object describing the index structure. Which is great, because as far as I can tell, it'd be impossible to build a map function that covers all the possible selectors ($in, $nin, $gt, $lt, etc.). What options does, I'm not sure.

Now, let's put a doc with a foo field:

cinnabar:~ nolan$ acurl -X PUT https://pouch.cloudant.com/mydb/doc -d '{"foo": "bar"}'
{"ok":true,"id":"doc","rev":"1-4c6114c65e295552ab1019e2b046b10e"}

Check the list of indexes to make sure it's still there:

cinnabar:~ nolan$ acurl -X GET https://pouch.cloudant.com/mydb/_index | json
{
  "indexes": [
    {
      "ddoc": null,
      "name": "_all_docs",
      "type": "special",
      "def": {
        "fields": [
          {
            "_id": "asc"
          }
        ]
      }
    },
    {
      "ddoc": "_design/a5f4711fc9448864a13c81dc71e660b524d7410c",
      "name": "foo-index",
      "type": "json",
      "def": {
        "fields": [
          {
            "foo": "asc"
          }
        ]
      }
    }
  ]
}

Find all docs where doc.foo is'bar':

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"foo": "bar"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Edge case time! What happens if I ask for a field that's not indexed?

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"baz": "bar"}}'| json
{
  "error": "no_usable_index",
  "reason": "No index exists for this selector, try indexing one of: baz"
}

Good choice, Cloudant. By explicitly returning an error when a field isn't indexed, you protect the developer from accidentally writing unperformant queries. No need to do an EXPLAIN or EXPLAIN QUERY PLAN; the database will tell you if you're being a dumbass.

How about the _id field, though? It is technically indexed:

cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "bar"}}'| json
{
  "docs": []
}
cinnabar:~ nolan$ acurl -X POST https://pouch.cloudant.com/mydb/_find -d '{"selector": {"_id": "doc"}}'| json
{
  "docs": [
    {
      "_id": "doc",
      "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
      "foo": "bar"
    }
  ]
}

Awesome. Developers don't want to have to learn a separate API to query by _id, and it's great that they don't have to. Well done.

Other thoughts:

  • Kinda bizarre that we don't get total_rows or offset when using _find. offset isn't a big loss IMO (we don't even fully implement it for Pouch), but total_rows can be really convenient when doing pagination (e.g. "2 pages of results remaining").
  • Adam Kocoloski said on IRC that they're open to porting this feature to CouchDB, but it won't be in until post-2.0.
  • I'm interested in writing a Pouch plugin for this even before it lands in Couch, but it'll require some modifications to core map/reduce. I've already had to fork map/reduce for the quick-search plugin; not too keen to do it again for this.
  • That being said, this is going to be a beast to implement in PouchDB. Queries like { "selector": {"Actor_name": "Robert De Niro"}, "sort": [{"Movie_year": "asc", "Movie_runtime": "desc"}]} have a lot going on at once. Our toIndexableString idea was pretty clever, but it's not enough to cover every possible combination. In-memory sorting would be unperformant on larger databases, but it may be the only quick and dirty adapter-agnostic solution.
  • I am not a big fan of design documents, but the fact that this allows you to query by _id obviates my biggest complaint about design docs, which is that they muddy the _all_docs results.
  • Cloudant released this in the nick of time. I was worried that Lucene would become the de-facto query language, since it was the only one that allowed you to write complex queries that human beings could actually understand.
  • One view per design doc is a good default.
@nolanlawson
Copy link
Author

Ah, good point. In that case, it might be useful to invent something new for the case where you do a query with limit and want to know how many results are left. It'd be a shame to have to depend on Lucene for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment