glynnbird/mangofilters.md

## mangofilters.md

      
Currently, Mango indexes are defined by posting a list of fields to the /db/_index endpoint:
{
  "fields": ["address.city", "timestamp", "email"],
  "type": "json"
}
Cloudant also allows Lucene-powered indexes to be created in a similar way, this time providing an array of objects defining the attributes to index and their data types:
{
  "fields": [
    {
      "name": "address.city",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": "number"
    },
    {
      "name": "email",
      "type": "string"
    }
  ]
}
Jan has proposed adding syntactic sugar to the index definition so that data can be pre-processed in Erlang-land before being written to the index. MapReduce allows JavaScript to be used for simple data manipulations, but Mango has no such mechanism.
What follows is my proposed syntax, as an alternative to the one Jan proposed. I propose we use the "array of objects" syntax shown in the snippet above, because it is already in use in the CouchDB universe and can easily be extended with optional filters. It should be very easy for a user reading an index definition to understand what's going on.
This is best demonstrated by example.
No pre-processing:

The existing syntax of an array of fields or an array of objects (with name/type) would result in document data being transferred into the index as it is.
{
  "fields": [
    {
      "name": "address.city",
      "type": "string"
    }
  ]
}
Type coercion

The presence of a type indicates to Mango that the data should be coerced to that data type before it arrives in the index:
e.g.
{
  "fields": [
    {
      "name": "timestamp",
      "type": "string"
    }
  ]
}
If timestamp is a number in the document, it will be converted to a string before indexing.
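The coercion step might look something like the following Python sketch. The coerce function is hypothetical, it only covers scalar values and the "string" and "number" types used in the examples here, and the exact conversion rules (for instance, what to do with null) are assumptions that the real implementation would need to pin down.

```python
def coerce(value, target_type):
    """Coerce a scalar JSON value to the type named in the index definition."""
    if value is None:
        return None  # assumption: null is passed through untouched
    if target_type == "string":
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, float) and value.is_integer():
            return str(int(value))  # render 42.0 as "42", as JavaScript would
        return str(value)
    if target_type == "number":
        if isinstance(value, bool):  # checked first: bool is a subclass of int
            return int(value)
        if isinstance(value, (int, float)):
            return value
        number = float(value)  # raises ValueError for non-numeric strings
        return int(number) if number.is_integer() else number
    raise ValueError(f"unsupported type: {target_type}")


print(coerce(1614834367000, "string"))  # 1614834367000 (now a string)
print(coerce("42", "number"))           # 42
```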
Optional "filter"

A filter function can be selected from a number of boilerplate function names aping JavaScript equivalents:
e.g.
{
  "fields": [
    {
      "name": "address.city",
      "filter": "toLowerCase",
      "type": "string"
    }
  ]
}
Example filters

toUpperCase
toLowerCase
trim
trimStart
trimEnd
Math.floor
Math.ceil
Math.round
Math.abs
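
One way to picture the zero-argument filters is as a lookup table from filter name to function. The Python sketch below is only an illustration of that idea: the FILTERS table and apply_filter helper are hypothetical, and the Python string methods only approximate their JavaScript namesakes (whitespace and locale handling differ in corner cases).

```python
import math

# Hypothetical registry: filter name (as written in the index definition) -> function.
FILTERS = {
    "toUpperCase": str.upper,
    "toLowerCase": str.lower,
    "trim": str.strip,
    "trimStart": str.lstrip,
    "trimEnd": str.rstrip,
    "Math.floor": math.floor,
    "Math.ceil": math.ceil,
    # JavaScript's Math.round rounds halves towards +infinity,
    # unlike Python's round(), which rounds halves to the nearest even number.
    "Math.round": lambda x: math.floor(x + 0.5),
    "Math.abs": abs,
}


def apply_filter(field_def: dict, value):
    """Run the field's optional "filter" over a value; no filter means pass-through."""
    name = field_def.get("filter")
    if name is None:
        return value
    if name not in FILTERS:
        raise ValueError(f"unknown filter: {name}")
    return FILTERS[name](value)


field = {"name": "address.city", "filter": "toLowerCase", "type": "string"}
print(apply_filter(field, "New York"))  # new york
```

Rejecting unknown filter names at index-creation time, rather than silently ignoring them, would keep index definitions easy to read and debug.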

Parameterized functions

Not all filter functions take zero parameters, as Jan discussed in his Gist. Other functions would be available that require additional arguments to be passed:
e.g.
{
  "fields": [
    {
      "name": "address.city",
      "filter": "split",
      "arguments": [" "],
      "type": "array"
    }
  ]
}
Filters requiring one or more arguments could be:

split
join
replace
replaceAll
substring
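
To show how the "arguments" array could be threaded through to a filter, here is a hedged Python sketch. PARAM_FILTERS and apply_param_filter are hypothetical names, only plain-string patterns are handled (no regular expressions), and the functions approximate the JavaScript behaviour rather than reproduce every edge case (for example, splitting on an empty separator is not supported here).

```python
def _substring(s: str, start: int, end=None) -> str:
    """Mimic String.prototype.substring: clamp to [0, len], swap if start > end."""
    length = len(s)
    end = length if end is None else end
    start = min(max(start, 0), length)
    end = min(max(end, 0), length)
    if start > end:
        start, end = end, start
    return s[start:end]


# Hypothetical registry of filters that need extra arguments.
PARAM_FILTERS = {
    "split": lambda s, sep: s.split(sep),
    "join": lambda items, sep: sep.join(items),
    # With a string pattern, JavaScript's replace only touches the first match.
    "replace": lambda s, old, new: s.replace(old, new, 1),
    "replaceAll": lambda s, old, new: s.replace(old, new),
    "substring": _substring,
}


def apply_param_filter(field_def: dict, value):
    """Call the named filter with the value followed by the listed arguments."""
    func = PARAM_FILTERS[field_def["filter"]]
    return func(value, *field_def.get("arguments", []))


field = {"name": "address.city", "filter": "split", "arguments": [" "], "type": "array"}
print(apply_param_filter(field, "New York"))  # ['New', 'York']
```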

Dates

CouchDB, unlike other databases, has no built-in support for dates, and without JavaScript it's impossible to split an ISO-8601 string into date components prior to indexing. I propose we add some filter functions specifically for this purpose. This is predicated on
{
  "fields": [
    {
      "name": "registrationDate",
      "filter": "getFullYear",
      "type": "number"
    }
  ]
}
Date functions ape the JavaScript Date object with all of its idiosyncrasies:

getFullYear
getDay
getDate
getHours
getMinutes
getSeconds
etc etc

The date functions should be able to cope with document attributes that are ISO-8601 strings or numeric timestamps representing UTC milliseconds since 1970.
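
A minimal Python sketch of the date filters, accepting either input form, might look like this. The names to_datetime, DATE_FILTERS and apply_date_filter are hypothetical. The sketch evaluates everything in UTC and treats ISO-8601 strings without an offset as UTC; both are assumptions (the JavaScript getters named above use local time, so the real design would need to decide which behaviour it wants).

```python
from datetime import datetime, timezone


def to_datetime(value) -> datetime:
    """Accept an ISO-8601 string or a millisecond timestamp; return a UTC datetime."""
    if isinstance(value, bool):
        raise TypeError("booleans are not dates")
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions do not parse the "Z" suffix, so spell it out.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return parsed.astimezone(timezone.utc)


DATE_FILTERS = {
    "getFullYear": lambda d: d.year,
    # JavaScript counts Sunday as 0; Python's weekday() counts Monday as 0.
    "getDay": lambda d: (d.weekday() + 1) % 7,
    "getDate": lambda d: d.day,
    "getHours": lambda d: d.hour,
    "getMinutes": lambda d: d.minute,
    "getSeconds": lambda d: d.second,
}


def apply_date_filter(name: str, value) -> int:
    """Extract one date component from an ISO-8601 string or ms timestamp."""
    return DATE_FILTERS[name](to_datetime(value))


print(apply_date_filter("getFullYear", "2021-03-04T05:06:07Z"))  # 2021
print(apply_date_filter("getDay", 1614834367000))                # 4 (Thursday)
```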