@glynnbird
Last active May 31, 2022 15:00
Mango filter syntax

Currently Mango indexes are defined by feeding a list of fields to the /db/_index endpoint.

{
  "fields": ["address.city", "timestamp", "email"],
  "type": "json"
}

Cloudant also allows Lucene-powered indexes to be created in a similar way, this time providing an array of objects defining the attributes to index and their data types:

{
  "fields": [
    {
      "name": "address.city",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": "number"
    },
    {
      "name": "email",
      "type": "string"
    }
  ]
}

Jan has proposed adding syntactic sugar to the index definition so that data can be pre-processed in Erlang-land before being written to the index. MapReduce allows JavaScript to be used for simple data manipulations, but Mango has no such mechanism.

What follows is my proposed syntax, as an alternative to the one Jan proposed. I propose we use the "array of objects" syntax shown in the snippet above, as it is already in use in the CouchDB universe and can easily be extended to add optional filters. It should be very easy for a user reading an index definition to understand what's going on.

This is best demonstrated by example.

No pre-processing:

The existing syntax of an array of fields or an array of objects (with name/type) would result in document data being transferred into the index as it is.

{
  "fields": [
    {
      "name": "address.city",
      "type": "string"
    }
  ]
}
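
As a rough illustration of how the two existing syntaxes could be treated uniformly, here is a minimal JavaScript sketch. The helper name `normalizeFields` and the normalized shape are assumptions made for illustration only; a real implementation would live in Erlang inside Mango.

```javascript
// Hypothetical helper (not part of CouchDB): accepts either the
// "array of field names" or the "array of objects" syntax and returns
// a uniform array of { name, type, filter, arguments } objects.
function normalizeFields(indexDef) {
  return indexDef.fields.map((field) => {
    if (typeof field === 'string') {
      // Plain field name: no type, no filter, data is indexed as it is.
      return { name: field, type: null, filter: null, arguments: [] };
    }
    return {
      name: field.name,
      type: field.type ?? null,
      // "filter" and "arguments" are the optional keys from this proposal.
      filter: field.filter ?? null,
      arguments: field.arguments ?? [],
    };
  });
}

const normalized = normalizeFields({
  fields: ['email', { name: 'address.city', type: 'string' }],
});
console.log(normalized);
```

Normalizing up front means the rest of the indexing code only has to handle one shape, whichever syntax the user chose.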

Type coercion

The presence of a type indicates to Mango that the data should be coerced to that data type before it arrives in the index:

e.g.

{
  "fields": [
    {
      "name": "timestamp",
      "type": "string"
    }
  ]
}

If timestamp is a number in the document, it will be converted to a string before indexing.
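
The coercion step could look something like the following JavaScript sketch. The function name `coerceValue` and the rule for unparseable numbers are assumptions; the proposal does not spell out exact coercion rules.

```javascript
// Hypothetical coercion rules, for illustration only.
function coerceValue(value, type) {
  switch (type) {
    case 'string':
      return String(value);
    case 'number': {
      const n = Number(value);
      // Assumption: values that cannot be parsed as a number are
      // skipped (returned as undefined) rather than indexed as NaN.
      return Number.isNaN(n) ? undefined : n;
    }
    default:
      // No type, or an unrecognized one: leave the value untouched.
      return value;
  }
}

console.log(coerceValue(1653998400000, 'string'));
console.log(coerceValue('42', 'number'));
```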

Optional "filter"

A filter function can be selected from a number of boilerplate function names aping JavaScript equivalents:

e.g.

{
  "fields": [
    {
      "name": "address.city",
      "filter": "toLowerCase",
      "type": "string"
    }
  ]
}

Example filters

  • toUpperCase
  • toLowerCase
  • trim
  • trimStart
  • trimEnd
  • Math.floor
  • Math.ceil
  • Math.round
  • Math.abs
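
To make the intended behaviour concrete, here is a minimal JavaScript sketch of a filter lookup table applied to a dotted field path. The names `simpleFilters`, `getPath` and `applySimpleFilter` are hypothetical, and throwing on an unknown filter name is an assumption rather than part of the proposal.

```javascript
// Hypothetical table mapping filter names to zero-argument transforms.
const simpleFilters = {
  toUpperCase: (v) => String(v).toUpperCase(),
  toLowerCase: (v) => String(v).toLowerCase(),
  trim: (v) => String(v).trim(),
  trimStart: (v) => String(v).trimStart(),
  trimEnd: (v) => String(v).trimEnd(),
  'Math.floor': (v) => Math.floor(v),
  'Math.ceil': (v) => Math.ceil(v),
  'Math.round': (v) => Math.round(v),
  'Math.abs': (v) => Math.abs(v),
};

// Resolve a dotted path such as "address.city" against a document.
function getPath(doc, path) {
  return path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

// Look up the field's value and run the named filter over it, if any.
function applySimpleFilter(doc, field) {
  const value = getPath(doc, field.name);
  if (value === undefined || !field.filter) return value;
  if (!Object.prototype.hasOwnProperty.call(simpleFilters, field.filter)) {
    // Assumption: an unknown filter name is rejected outright.
    throw new Error(`Unknown filter: ${field.filter}`);
  }
  return simpleFilters[field.filter](value);
}

const doc = { address: { city: 'London' }, score: 3.7 };
console.log(applySimpleFilter(doc, { name: 'address.city', filter: 'toLowerCase' }));
console.log(applySimpleFilter(doc, { name: 'score', filter: 'Math.floor' }));
```

Restricting filters to a fixed table of names keeps index definitions declarative: no user-supplied code runs at index time.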

Parameterized functions

Not all filter functions take zero parameters, as Jan discussed in his Gist. Other functions would be available that require additional arguments to be passed:

e.g.

{
  "fields": [
    {
      "name": "address.city",
      "filter": "split",
      "arguments": [" "],
      "type": "array"
    }
  ]
}

Filters requiring one or more arguments could be:

  • split
  • join
  • replace
  • replaceAll
  • substring
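
A sketch of how the "arguments" array might be passed through to a filter, again in JavaScript. The names `parameterizedFilters` and `applyFilterWithArguments` are hypothetical; the filters simply delegate to the JavaScript methods of the same name.

```javascript
// Hypothetical table of filters that take extra arguments. The first
// parameter is always the document value; the rest come from the
// "arguments" array in the index definition.
const parameterizedFilters = {
  split: (v, separator) => String(v).split(separator),
  join: (v, separator) => v.join(separator),
  replace: (v, from, to) => String(v).replace(from, to),
  replaceAll: (v, from, to) => String(v).replaceAll(from, to),
  substring: (v, start, end) => String(v).substring(start, end),
};

function applyFilterWithArguments(value, field) {
  if (!Object.prototype.hasOwnProperty.call(parameterizedFilters, field.filter)) {
    // Assumption: an unknown filter name is rejected outright.
    throw new Error(`Unknown filter: ${field.filter}`);
  }
  // Spread the configured arguments after the document value.
  return parameterizedFilters[field.filter](value, ...(field.arguments || []));
}

const words = applyFilterWithArguments('New York', {
  name: 'address.city',
  filter: 'split',
  arguments: [' '],
});
console.log(words);
```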

Dates

CouchDB, unlike other databases, has no built-in support for dates, and without JavaScript it's impossible to split an ISO-8601 string into date components prior to indexing. I propose we add some filter functions specifically for this purpose. This is predicated on

{
  "fields": [
    {
      "name": "registrationDate",
      "filter": "getFullYear",
      "type": "number"
    }
  ]
}

Date functions ape the JavaScript Date object with all of its idiosyncrasies:

  • getFullYear
  • getDay
  • getDate
  • getHours
  • getMinutes
  • getSeconds
  • etc.

The date functions should be able to cope with document attributes that are ISO-8601 strings or numeric timestamps representing UTC milliseconds since 1970.
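
The following JavaScript sketch shows one way a date filter could accept both input forms. The names `dateFilters` and `applyDateFilter` are hypothetical, and evaluating every component in UTC is an assumption made so that results do not depend on the server's timezone; JavaScript's own getFullYear and friends use local time.

```javascript
// Hypothetical date filters. Assumption: components are read in UTC.
const dateFilters = {
  getFullYear: (d) => d.getUTCFullYear(),
  getDay: (d) => d.getUTCDay(), // day of week, 0 = Sunday
  getDate: (d) => d.getUTCDate(), // day of month, 1-31
  getHours: (d) => d.getUTCHours(),
  getMinutes: (d) => d.getUTCMinutes(),
  getSeconds: (d) => d.getUTCSeconds(),
};

function applyDateFilter(value, filterName) {
  // Accept only ISO-8601 strings or numeric millisecond timestamps.
  if (typeof value !== 'string' && typeof value !== 'number') return undefined;
  // Note: JavaScript parses date-time strings without a timezone offset
  // as local time.
  const date = new Date(value);
  // Assumption: unparseable values are skipped rather than indexed.
  if (Number.isNaN(date.getTime())) return undefined;
  if (!Object.prototype.hasOwnProperty.call(dateFilters, filterName)) {
    throw new Error(`Unknown date filter: ${filterName}`);
  }
  return dateFilters[filterName](date);
}

const fromString = applyDateFilter('2022-05-31T15:00:00Z', 'getFullYear');
const fromMillis = applyDateFilter(Date.UTC(2022, 4, 31, 15, 0, 0), 'getFullYear');
console.log(fromString, fromMillis);
```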
