Last active September 23, 2016 18:22
Metrics System design

UI Design

The ability to choose from a multitude of related entities by domain space. For example:

  • In the last-mile space, routes consist of drivers and specific packages.
  • In middle mile and long haul, specific packages are not nearly as important as capacity utilization.
  • In sortation, what counts is knowing when you had a bad sort or an unsortable, as well as who did it. In this model, a route is tangential.
  • To support this, the service must provide a catalog of data schemas representing each of the domain spaces, e.g.:

  • GET /schemas/ would return all schemata and can be presented as a pull-down menu.
  • GET /schemas/sortation would return a JSON-schema defining the data payload presented for sortation data. This data can be presented as a tree view of selectable fields. Related fields could also be selected and joined appropriately; for example, a reference to the last-mile schema could be included and joined.
  • GET /schemas/last-mile would return a JSON-schema defining the data payload presented for last-mile-pertinent data.

The examples go on. This data can be selected and aggregated using a complex event processor (CEP), which allows for joins and aggregations by windows (time or count).
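A minimal sketch of the schema catalog described above, in Python. The endpoint paths (/schemas, /schemas/sortation) come from this design; the schema contents and field names are hypothetical placeholders.

```python
# In-memory stand-in for the schema catalog. Field names are illustrative,
# not a real sortation schema.
SCHEMAS = {
    "sortation": {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {
            "sorterId": {"type": "string"},
            "outcome": {"type": "string",
                        "enum": ["sorted", "bad-sort", "unsortable"]},
            "lastMile": {"$ref": "/schemas/last-mile"},  # cross-domain join point
        },
    },
}

def list_schemas():
    """GET /schemas/ -> schema names for a pull-down menu."""
    return sorted(SCHEMAS)

def field_paths(name):
    """Flatten a schema's properties into dotted paths for a tree view."""
    def walk(props, prefix=""):
        for key, spec in props.items():
            path = f"{prefix}{key}"
            yield path
            yield from walk(spec.get("properties", {}), path + ".")
    return list(walk(SCHEMAS[name]["properties"]))
```

A UI would render `list_schemas()` as the menu and `field_paths(...)` as the selectable tree, with `$ref` entries marking where a join to another schema is possible.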

Service design

  • Schemas can be designed in code and discovered, or implemented and supplied as a classic resource.
    • A code-based design means that the behavior of the system can never differ from its schema.
    • A classic resource is much more accessible to a business developer designing queries, but could drift out of sync with functionality. Read: it doesn't require a developer.

Creating queries

  • Once fields from various related schemas are selected, space in a query-specific data store is created.
  • For example, if the query does several aggregations, then a column family in Cassandra could be created specifically to house data related to this query.
  • If non-aggregate or flat data is required, a classic RDBMS table can be created with only the relevant fields.
  • A "query" is formed from the selections, persisted, and supplied to the processing engine, which is configured to output data for this query to the query-specific data store above. The data flow is both transformative and selective:

select ..., c.baz as BIGC from Alpha a, Beta b, Charlie c

  • As data passes into the system, it is matched against existing queries, which emit data to the pre-defined data store.
  • Queries have expiration dates and may be renewed so as to limit resource consumption for forgotten-about queries.

Collectively, these could be thought of as something akin to a bento box: tidy, complete, and just for you. Perhaps a "q-box".
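A hypothetical "q-box" definition, bundling what the bullets above describe: the selected fields, the persisted query, the target store, and an expiration date. All field and store names here are illustrative.

```python
# Sketch of a q-box record. Names and defaults are assumptions, not a spec.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class QBox:
    name: str
    fields: list        # selected schema fields, e.g. ["sortation.outcome"]
    query: str          # the persisted CEP/SQL query text
    store: str          # query-specific data store, e.g. a Cassandra column family
    expires: datetime   # forgotten q-boxes are reclaimed after this

    def renew(self, days=30):
        """Push the expiration out; called when the owner still cares."""
        self.expires = datetime.utcnow() + timedelta(days=days)

    def expired(self, now=None):
        return (now or datetime.utcnow()) >= self.expires
```

A reaper job could periodically drop the data store and unregister the query for any q-box where `expired()` is true, limiting resource consumption as described above.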

Service interaction

  • Any instance of a service may take any "q-box" definition, persist the query, create the data store and begin listening on all available data channels.
  • Other instances can be made aware of the new "q-box" either through a distributed cache such as Hazelcast or a queue-topic exchange arrangement.
  • As q-box definitions are, by their nature, straightforward, any distributed data store can work for persistence.
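The propagation step above can be sketched with an in-memory stand-in for Hazelcast or a queue/topic broker; each subscribed instance persists the definition and would then begin listening on its data channels. The instance and q-box names are hypothetical.

```python
# Toy topic exchange: publish a q-box definition to every subscribed instance.
class Topic:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, qbox_definition):
        for handler in self.subscribers:
            handler(qbox_definition)

# Two hypothetical service instances, each with its own local persistence.
instance_a, instance_b = {}, {}
topic = Topic()
topic.subscribe(lambda d: instance_a.update({d["name"]: d}))
topic.subscribe(lambda d: instance_b.update({d["name"]: d}))
topic.publish({"name": "late-sorts", "query": "...", "store": "late_sorts_cf"})
```

With a real broker the handlers would also create the data store and attach to the data channels, per the first bullet above.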

How to actually get data: Live data

  • All new services must register their schema at

POST /schemas/:subtype

  • For safety's sake, the system should discard messages for which it has no conforming schema registered.

  • Then, creates and updates of entities should be published into a stream of data (webhooks and/or queues) so the CEP engine can match them against queries. The use of queues makes this naturally scalable; however, depending on the solution chosen, this may be unnecessary.

Historical data

A dedicated output for all streams must be to archive data. Conjuring up data then becomes a matter of select * from EntityArchive where timestamp > '2016-09-22T18:21:21Z' and streaming it through the CEP processor. The archive source should be cheap and unstructured because queries will not be performed against it directly; a simple key-value store would be enough. However, some bet-hedging might be desired, and since all data will almost certainly be JSON, Elasticsearch can serve nicely.
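The replay step can be sketched as: pull archived entities newer than a timestamp and stream them through the same matcher the live path uses. The archive is an in-memory list here; the design above calls for a cheap key-value store, and the records are made up.

```python
# Replay archived records newer than `since` through a query matcher.
def replay(archive, since, matcher):
    for record in archive:
        if record["timestamp"] > since:
            yield matcher(record)

archive = [
    {"timestamp": "2016-09-21T00:00:00Z", "outcome": "sorted"},
    {"timestamp": "2016-09-22T19:00:00Z", "outcome": "bad-sort"},
]
# ISO-8601 UTC strings at fixed precision compare correctly as plain strings.
bad_sorts = list(replay(archive, "2016-09-22T18:21:21Z",
                        lambda r: r["outcome"]))
```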

On-demand data

Though this is supportable, ad-hoc querying of terabytes of data on the spot (with no prior indexing or performance tuning) should be discouraged. It would require keeping processing capacity on hand, or archiving to Redshift with its subsequent costs. A thorough analysis of solutions should be undertaken. If this is desired, then the archive store should be a graph database, as this offers the best performance for relational searching across sparse data.

Systems affected

  • Elbrox - joining alarms with any entity alarmed upon
  • Planner - Because it's so important
  • worldview - update to package/vehicle status. When drivers are added to vehicles
  • AntFarm - Not much: It's meant to be more of a query store than a business decider
  • Earp - Definitely: What is getting delivered to where and to whom.
  • Planseeker/worldseeker/jobseeker data stores may need to be re-thought
  • Matrix - will need to offer extensive schema discovery and query-building tools via drag-and-drop. Extensive work here
  • SomeNewService - To accept/offer schema for CEP processing and configuration
  • Lockbox - no effect
  • cromag - Questionable.

This is a system-wide necessity and will need work on a per-system basis.

For references:

  • Microsoft StreamInsight, and a video of StreamInsight
  • Amazon Kinesis + Data Pipelines
  • Esper, for just the CEP
