jhollingworth/profiles.md

## profiles.md

      
# Why are profiles fast?
A lot of work goes into determining which profiles you're in. People often ask whether having a large number of profiles on the page will hurt performance. When designing profiles we were acutely aware of this problem, so we implemented a number of optimisations to ensure they perform well at scale.
To create a profile, you define a profile specification. Profile specifications are written in a Lisp-esque language that lets you define a predicate tree:
```json
[
  "and",
  [
    {
      "key": {
        "event": "ec.View",
        "field": {
          "path": "meta.appName",
          "type": "string"
        }
      },
      "op": "eq",
      "value": "Chrome"
    },
    {
      "key": {
        "event": "ec.Product",
        "field": {
          "path": "category",
          "type": "string"
        }
      },
      "op": "eq",
      "value": "shoe"
    }
  ]
]
```

We could have given this profile specification directly to the browser. However, we realised that there were a number of optimisation steps we could perform to make the specification faster to execute. So we decided to introduce a compile step to profiles, the result of which we call the profile index. The profile index is a JSON object that gets included in smartserve.js and represents all profiles for a property. An example profile index:
```json
{
  "events": {
    "ec.View": [
      "702834841",
      "-1318588253",
      "-1611364477"
    ],
    "ec.Product": [
      "702834841",
      "-1318588253",
      "-1611364477"
    ]
  },
  "profiles": {
    "PR-2185-SJ334": "-1611364477"
  },
  "criteria": {
    "702834841": {
      "id": "702834841",
      "key": {
        "event": "ec.View",
        "field": {
          "path": "meta.appName",
          "type": "string"
        }
      },
      "op": "eq",
      "value": "Chrome"
    },
    "-1318588253": {
      "id": "-1318588253",
      "key": {
        "event": "ec.Product",
        "field": {
          "path": "category",
          "type": "string"
        }
      },
      "op": "eq",
      "value": "shoe"
    },
    "-1611364477": {
      "id": "-1611364477",
      "op": "and",
      "dependencies": [
        "702834841",
        "-1318588253"
      ],
      "props": {}
    }
  },
  "criteriaCount": 3
}
```

So what happens in this compile step?
The first thing we do is work out the identity of each predicate. Taking a leaf out of Git's book, we convert each predicate to a string (`JSON.stringify`) and then take the hash code of that string. When a predicate has dependencies, we simply join the hashes of its dependencies into one long string and take the hash of that.
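The identity step can be sketched roughly as follows. This is only an illustration: the text says "hash code" without naming the function, so the Java-style 32-bit string hash here is an assumption, as are the helper names `hashCode` and `predicateId`. Prefixing the operator before hashing a composite predicate is a further assumption, added so that an `and` and an `or` over the same children do not collide.

```javascript
// ASSUMPTION: a Java-style 32-bit string hash, chosen only for illustration.
function hashCode(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    // Math.imul multiplies in 32-bit integer space; `| 0` wraps the sum to int32.
    hash = (Math.imul(31, hash) + str.charCodeAt(i)) | 0;
  }
  return String(hash);
}

// Hypothetical helper. A spec node is either a leaf object
// or a pair of the form ["and" | "or", [children]].
function predicateId(node) {
  if (Array.isArray(node)) {
    const [op, children] = node;
    // Composite: join the dependency hashes into one string and hash that.
    // The operator prefix is an assumption, not something the text specifies.
    return hashCode(op + ":" + children.map(predicateId).join(","));
  }
  // Leaf: hash its JSON text.
  return hashCode(JSON.stringify(node));
}
```

Because the ID depends only on the predicate's content, the same predicate used in two profiles gets the same ID. Note that `JSON.stringify` is sensitive to property order, so a scheme like this relies on specifications being serialised consistently.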
Once we have Ids, we can flatten the predicate tree. This means we can do fast predicate lookups rather than traversing the tree. `criteria` contains the flattened tree. Flattening the tree creates a new problem though: there's no way of knowing the root of the tree. `profiles` solves this by recording the root criterion for each profile.
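A minimal sketch of the flattening step, assuming a spec node is either a leaf object or `["and" | "or", [children]]`. The function names `flatten` and `compile`, the shape of the `specs` argument, and the hash function are all hypothetical; only the output fields `profiles`, `criteria` and `criteriaCount` come from the example index.

```javascript
// ASSUMPTION: any deterministic string hash would do; this one is Java-style.
function hashCode(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) h = (Math.imul(31, h) + str.charCodeAt(i)) | 0;
  return String(h);
}

// Hypothetical: adds one spec node (and its subtree) to `criteria`, returns its ID.
function flatten(node, criteria) {
  let entry;
  if (Array.isArray(node)) {
    const [op, children] = node;
    // Children are flattened first so the parent can refer to them by ID.
    const dependencies = children.map((child) => flatten(child, criteria));
    entry = { id: hashCode(op + ":" + dependencies.join(",")), op, dependencies };
  } else {
    entry = { id: hashCode(JSON.stringify(node)), ...node };
  }
  criteria[entry.id] = entry; // identical predicates collapse onto one entry
  return entry.id;
}

// Hypothetical: `specs` maps profile ID -> specification tree.
function compile(specs) {
  const criteria = {};
  const profiles = {};
  for (const [profileId, spec] of Object.entries(specs)) {
    profiles[profileId] = flatten(spec, criteria); // remember each tree's root
  }
  return { profiles, criteria, criteriaCount: Object.keys(criteria).length };
}
```

A side effect of keying `criteria` by content hash is that a predicate shared by several profiles is stored once.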
Now that we can reference predicates and easily look them up, we need to work out which predicates to execute and in what order. It turns out this is a rather tricky graph problem, which fortunately we can solve at compile time. The result of this computation is the `events` hash, which lists, for each event, which predicates should be executed and in what order.
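The text does not describe the ordering algorithm, so the following is only one plausible reading that happens to be consistent with the example index: for each event, list every criterion of any profile that touches that event, with dependencies placed before the criteria that need them (a post-order walk). The function name `buildEvents` is hypothetical, and the real compiler may well apply different or smarter rules.

```javascript
// Hypothetical: builds the `events` table from the flattened `profiles` and `criteria`.
function buildEvents(profiles, criteria) {
  const events = {};

  // Post-order walk: a criterion is emitted only after all of its dependencies.
  function walk(id, order, eventNames) {
    const criterion = criteria[id];
    for (const dep of criterion.dependencies || []) walk(dep, order, eventNames);
    if (criterion.key) eventNames.add(criterion.key.event); // leaves name an event
    if (!order.includes(id)) order.push(id);
  }

  for (const rootId of Object.values(profiles)) {
    const order = [];
    const eventNames = new Set();
    walk(rootId, order, eventNames);
    // Every event this profile touches gets the profile's full evaluation order.
    for (const name of eventNames) {
      const list = (events[name] = events[name] || []);
      for (const id of order) if (!list.includes(id)) list.push(id);
    }
  }
  return events;
}
```

Run against the `profiles` and `criteria` from the example index, this sketch yields the same two lists that appear under `events` there.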
Pre-computing all of these things means that we do the minimum amount of computation in the browser. If you look at what the membership engine actually does when an event is processed, you realise all it's doing is a few hash lookups (O(1) each) and then some simple logical computations.
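To make that concrete, here is a rough sketch of what such an engine could look like when driven by the profile index. It is not the real membership engine: the event shape `{ name, data }`, the `results` map that persists between events, the helper names, and the support for only `eq`, `and` and `or` are all assumptions made for illustration.

```javascript
// Hypothetical: read a dotted path such as "meta.appName" out of a nested payload.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((value, part) => (value == null ? undefined : value[part]), obj);
}

// Hypothetical engine step. `results` maps criterion ID -> boolean and is kept
// between events so that composite criteria can combine earlier outcomes.
function processEvent(index, results, event) {
  for (const id of index.events[event.name] || []) { // lookup: what to run, in order
    const criterion = index.criteria[id];            // lookup: the predicate itself
    if (criterion.key) {
      // Leaf: only re-evaluated when it is keyed on the current event.
      if (criterion.key.event === event.name && criterion.op === "eq") {
        results[id] = getPath(event.data, criterion.key.field.path) === criterion.value;
      }
    } else if (criterion.op === "and") {
      results[id] = criterion.dependencies.every((dep) => results[dep] === true);
    } else if (criterion.op === "or") {
      results[id] = criterion.dependencies.some((dep) => results[dep] === true);
    }
  }
  return results;
}

// Hypothetical: membership is just the stored result of the profile's root criterion.
function isMember(index, results, profileId) {
  return results[index.profiles[profileId]] === true;
}
```

With the example index, a visitor would become a member of `PR-2185-SJ334` only after both an `ec.View` event from Chrome and an `ec.Product` event with category `shoe` have been processed.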
This is the crux of why profiles are fast.
## Other optimisations
### Long Ids
A problem we found during development was that these hash code Ids can become pretty long. When persisting data against these predicates, we found the vast majority of memory was taken up by Ids. The only constraints on an Id were that it was a string and that you could compute it deterministically. To solve this, we introduced a further compilation step that creates an encoded version of the Id. To do this encoding we keep a table of long Ids to short Ids. If your long Id isn't in the table, we count the number of rows in the table (for that property) and then base64-encode that number. Implementing this optimisation reduced our storage size by 500%.
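The table lookup can be sketched as below. The text does not say exactly how the row count is base64-encoded, so this example assumes the number is written in radix 64 using the URL-safe base64 alphabet; the `Map` stands in for the real per-property table, and the names `toBase64Digits` and `shortIdFor` are hypothetical.

```javascript
// URL-safe base64 alphabet: 64 symbols, one per "digit".
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Write a non-negative integer using 64 symbols per digit (an assumed encoding).
function toBase64Digits(n) {
  let out = "";
  do {
    out = ALPHABET[n % 64] + out;
    n = Math.floor(n / 64);
  } while (n > 0);
  return out;
}

// Hypothetical in-memory stand-in for the per-property table of long ID -> short ID.
function shortIdFor(table, longId) {
  if (!table.has(longId)) {
    // Unknown long ID: the current row count, encoded, becomes its short ID.
    table.set(longId, toBase64Digits(table.size));
  }
  return table.get(longId);
}
```

Under this encoding the first 64 predicates of a property get one-character Ids and the first 4096 get at most two characters, compared with up to eleven characters for Ids like `-1611364477`.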
### AND/OR with one child
In profile manager we intentionally produce unoptimised profile specifications (it makes them easier to build). This means we end up with lots of `and`s and `or`s with just one element in them. These functions do nothing useful, so we remove them from the profile, replacing each with its inner criterion.
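A minimal sketch of this rewrite, assuming the specification format shown earlier (a leaf object, or `["and" | "or", [children]]`). The function name `simplify` is hypothetical, and where in the compile pipeline this pass runs is not stated in the text.

```javascript
// Hypothetical: returns an equivalent spec with single-child and/or nodes removed.
function simplify(node) {
  if (!Array.isArray(node)) return node; // leaves are left alone
  const [op, children] = node;
  // Simplify bottom-up so nested single-child wrappers collapse in one pass.
  const simplified = children.map(simplify);
  // An "and" or "or" over exactly one child is equivalent to that child.
  if ((op === "and" || op === "or") && simplified.length === 1) return simplified[0];
  return [op, simplified];
}
```

For example, `["and", [["or", [leaf]]]]` reduces to just `leaf`, while an `and` over two or more children is kept as it is.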