bryaakov/flatten.json.md

## flatten.json.md

      
    Flatten JSON

// TODO: Rename feature. Suggestion: Dynamic Nested Indexing
Abstract


CM-Well will allow indexing not only by Linked Data predicates or plain field names (AKA meta/nn), but also by nested JSON attributes.
Those attributes will be supplied in JSON format (see APIs below) and will be searchable using a "path" into the JSON structure (see Example below).
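
To make the notion of a "path" concrete, here is a minimal Python sketch that flattens a nested JSON document into dotted paths. It is an illustration only, not CM-Well code: the `flatten` function is hypothetical, and treating array elements as multiple values of the same path is an assumption about how the index would see them.

```python
from typing import Any, Dict, List


def flatten(node: Any, prefix: str = "") -> Dict[str, List[Any]]:
    """Map each dotted JSON path to the list of leaf values found under it."""
    fields: Dict[str, List[Any]] = {}
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{prefix}.{key}" if prefix else key
            for path, values in flatten(child, child_path).items():
                fields.setdefault(path, []).extend(values)
    elif isinstance(node, list):
        # Assumption: array elements share the parent's path (multi-valued field).
        for child in node:
            for path, values in flatten(child, prefix).items():
                fields.setdefault(path, []).extend(values)
    else:
        # Leaf value (string, number, boolean or null).
        fields.setdefault(prefix, []).append(node)
    return fields


# The document used in the Example section.
zebra = {
    "pattern": {
        "stripes": {"black": [1, 3, 5, 7], "white": [0, 2, 4, 6]},
        "tail": True,
    }
}

for path, values in flatten(zebra).items():
    print(path, values)
```

Under these assumptions the zebra document yields three searchable paths: `pattern.stripes.black`, `pattern.stripes.white` and `pattern.tail`.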

APIs


Indexing an Infoton with nested attributes: Upload a FileInfoton to the path of your document, with the Content-Type header set to "application/x-cm-well-json".
Searching by nested attributes: "...&qp=[NestedPath].json[FieldOperator][value]"
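
The search syntax can be sketched as a small Python helper that assembles a `qp` clause and a stream URL in the same shape as the ones in this document. The function names `nested_qp` and `stream_url` are hypothetical, and no URL encoding is applied, since the examples here pass the clause verbatim.

```python
def nested_qp(path: str, operator: str, value: object) -> str:
    """Build one clause of the form [NestedPath].json[FieldOperator][value]."""
    if isinstance(value, bool):
        # JSON booleans are written in lowercase.
        value = "true" if value else "false"
    return f"{path}.json{operator}{value}"


def stream_url(cluster: str, path: str, qp: str, fmt: str = "ntriples") -> str:
    """Build a recursive stream request for everything under `path`."""
    return f"{cluster}/{path}?op=stream&recursive&qp={qp}&format={fmt}"


print(stream_url("cm-well", "example.org", nested_qp("pattern.tail", "::", True)))
```

Remember to quote the resulting URL when passing it to a shell, because `>` and `&` are shell metacharacters.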

Example

Assuming "cm-well" is the cluster name.
Data ingest:
$ curl -X POST cm-well/_in?format=ntriples -H "X-CM-WELL-TOKEN:<WriterToken>" --data-binary '<http://example.org/zebra1> <http://example.org/zebra-ns/name> "Zebra" .'
{"success":true}
Dynamically indexing:
$ curl -X POST cm-well/example.org/zebra1 -H "X-CM-WELL-TYPE:File" -H "Content-Type:application/x-cm-well-json" -H "X-CM-WELL-TOKEN:<WriterToken>" --data-binary '{
  "pattern": {
    "stripes": {
      "black": [1,3,5,7],
      "white": [0,2,4,6]
    },
    "tail": true
  }
}'
{"success":true}
Searching by dynamic nested fields:
$ curl "cm-well/example.org?op=stream&recursive&qp=pattern.stripes.black.json>5&format=ntriples"
<http://example.org/zebra1> <http://example.org/zebra-ns/name> "Zebra" .
$
$ curl "cm-well/example.org?op=stream&recursive&qp=pattern.tail.json::true&format=ntriples"
<http://example.org/zebra1> <http://example.org/zebra-ns/name> "Zebra" .
Implementation


All search APIs (search, stream, consume, etc.) will support the ".json" virtual namespace. When it is supplied, a new FieldNameParser will be used, which passes the "first name" (i.e. the JSON path) as-is to FTS (FTS will also support such a method).
When uploading a JSON FileInfoton, if the content type is application/x-cm-well-json rather than application/json, FTS will index the JSON as-is.
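
The two decisions described above can be sketched in Python as follows. This is not the actual FieldNameParser or FTS code: `Clause`, `parse_clause` and `indexes_as_is` are hypothetical names, and only the two operators that appear in the Example section (`::` and `>`) are handled.

```python
import re
from typing import NamedTuple

JSON_NAMESPACE = ".json"
JSON_CONTENT_TYPE = "application/x-cm-well-json"


class Clause(NamedTuple):
    field: str      # field name, with the ".json" suffix stripped if present
    operator: str   # "::" or ">" in this sketch
    value: str
    nested: bool    # True when the ".json" virtual namespace was used


# Assumption: only the operators shown in the Example section are supported.
_CLAUSE = re.compile(r"^(?P<field>[^:><]+)(?P<op>::|>)(?P<value>.*)$")


def parse_clause(clause: str) -> Clause:
    """Split a qp clause and detect the ".json" virtual namespace."""
    match = _CLAUSE.match(clause)
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    field = match.group("field")
    nested = field.endswith(JSON_NAMESPACE)
    if nested:
        # The remaining JSON path is passed on unchanged.
        field = field[: -len(JSON_NAMESPACE)]
    return Clause(field, match.group("op"), match.group("value"), nested)


def indexes_as_is(content_type: str) -> bool:
    """True only for the dedicated content type, not for plain application/json."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == JSON_CONTENT_TYPE


print(parse_clause("pattern.stripes.black.json>5"))
print(indexes_as_is("application/json"))
```

Keeping the namespace check separate from operator parsing means existing field-name handling stays untouched for clauses that do not end in ".json".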

Documentation


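We should make it crystal clear that this feature assumes a bounded schema, i.e. a finite and known set of JSON paths. Otherwise every new path adds a field to the Elasticsearch index mapping, and the mapping will grow until Elasticsearch blows up.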
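
The documentation could illustrate the difference with a small Python sketch like the one below, which counts the distinct JSON paths across a set of documents. The helper names and the sample data are hypothetical; the point is only that data-dependent keys (such as user IDs used as attribute names) make the number of paths grow with the data.

```python
from typing import Any, Iterable, Set


def json_paths(node: Any, prefix: str = "") -> Set[str]:
    """Collect the dotted paths of all leaf values in one JSON document."""
    if isinstance(node, dict):
        paths: Set[str] = set()
        for key, child in node.items():
            paths |= json_paths(child, f"{prefix}.{key}" if prefix else key)
        return paths
    if isinstance(node, list):
        paths = set()
        for child in node:
            # Array elements share the parent's path.
            paths |= json_paths(child, prefix)
        return paths
    return {prefix}


def distinct_paths(docs: Iterable[Any]) -> int:
    """Number of distinct paths, i.e. fields the index would need to hold."""
    seen: Set[str] = set()
    for doc in docs:
        seen |= json_paths(doc)
    return len(seen)


# Bounded: every document uses the same attribute names.
bounded = [{"pattern": {"tail": True}}, {"pattern": {"tail": False}}]
# Unbounded: attribute names are derived from the data itself.
unbounded = [{"visits": {f"user-{i}": 1}} for i in range(1000)]

print(distinct_paths(bounded))    # 1
print(distinct_paths(unbounded))  # 1000
```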