@temsa
Created June 1, 2012 15:42
How to introspect the keys of Square's Cube event values for an event type, so that you can use them in your queries
// Helpers called by the server-side map function (stored in system.js below).
var isArray = function (v) {
  return v && typeof v === 'object' && typeof v.length === 'number' && !(v.propertyIsEnumerable('length'));
};
var isDate = function (v) {
  return v && typeof v === 'object' && v instanceof Date;
};
var isObjectId = function (v) {
  return v && typeof v === 'object' && v instanceof ObjectId;
};
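/* Emits one key path per field of a sub-document, e.g. "headers.host".
   Array elements are collapsed to "[]", e.g. "tags[]" or "items[].sku". */
var cubeInspectSubDoc = function (base, value, time) {
  for (var key in value) {
    // Array indices ("0", "1", ...) become "[]"; object fields are appended with a dot.
    var k = isArray(value) ? '[]' : '.' + key;
    emit(base + k, time);
    var sub = value[key];
    if (isArray(sub) || (sub !== null && typeof sub === 'object' && !isObjectId(sub) && !isDate(sub))) {
      cubeInspectSubDoc(base + k, sub, time); /* recurses into sub-objects */
    }
  }
};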
db.system.js.save( { _id : "isArray", value : isArray } );
db.system.js.save( { _id : "isDate", value : isDate } );
db.system.js.save( { _id : "isObjectId", value : isObjectId } );
db.system.js.save( { _id : "cubeInspectSubDoc", value : cubeInspectSubDoc } );
var cubeInspect = function (col /* event type, e.g.: "collectd_cpu" */) {
  // Emits every key path of the event's "d" field, with the event time "t".
  var map = function () {
    var value = this.d;
    for (var key in value) {
      var k = key.replace(/^\d+$/, '[]'); /* fixes how mongo handles arrays */
      emit(k, this.t);
      var sub = value[key];
      if (isArray(sub) || (sub !== null && typeof sub === 'object' && !isObjectId(sub) && !isDate(sub))) {
        cubeInspectSubDoc(k, sub, this.t);
      }
    }
  };
  // Always returns the latest time. MongoDB may call reduce several times on
  // partial results, so it returns a Date, like the values emitted by map.
  var reduce = function (key, times) {
    var latest = times[0];
    for (var i = 1; i < times.length; i++) {
      if (times[i] > latest) latest = times[i];
    }
    return latest;
  };
  /* Ensures the "_keys"-suffixed collection exists, indexed on the stored time ("value"). */
  db.createCollection(col + "_keys");
  db[col + "_keys"].ensureIndex({value: 1});
  /* Gets the latest event date. This can miss keys that only appear in events added
     later with an earlier date: this is the cost of a quick incremental map/reduce. */
  var latestEvent = db[col + "_keys"].find().sort({value: -1}).limit(1).toArray();
  var latest = new Date(latestEvent.length > 0 ? latestEvent[0].value : 0);
  print('[cubeInspect] start date for incremental map/reduce:', latest);
  var mr = db.runCommand({
    mapreduce: col + "_events",
    map: map,
    reduce: reduce,
    query: {t: {$gt: latest}},
    out: {merge: col + "_keys"}
  });
  return db[mr.result];
};
db.system.js.save( { _id : "cubeInspect", value : cubeInspect } );

Start

Load cubeInspect.js in the mongo shell (e.g. by copy/pasting it), in the same database as Cube (cube_development by default).

Use

In the mongo shell, in the same database as Cube, you can now write queries like this:

> use cube_development;
> cubeInspect('cube_request').distinct('_id')
[cubeInspect] start date for incremental map/reduce: Thu Jan 01 1970 01:00:00 GMT+0100 (CET)
[ "ip", "method", "path" ]

Here, 'ip', 'method' and 'path' are the fields you can use in a query for the 'cube_request' event type.

On the next call, computing those fields is much quicker:

> cubeInspect('cube_request').distinct('_id')
[cubeInspect] start date for incremental map/reduce: Fri Jun 01 2012 17:35:05 GMT+0200 (CEST)
[ "ip", "method", "path" ]

As you can see, the start date of the map/reduce has changed, because it is incremental: it starts from the latest event seen by the previous map/reduce. This means it can miss some fields, but in general it is a good approximation (an exact result would probably require a full, non-incremental run).
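To make the "can miss some fields" trade-off concrete, here is a minimal in-memory simulation in plain Node.js, not the script itself. It assumes Cube's event shape `{ t, d }`, uses a hypothetical `incrementalRun` function and a `Map` standing in for the _keys collection, and only looks at top-level fields of `d` to stay short. An event inserted after the first run but dated before the watermark is never scanned.

```javascript
// Hypothetical in-memory simulation of cubeInspect's incremental run.
// Events use Cube's shape { t: Date, d: {...} }; the `keys` Map stands in for
// the <type>_keys collection (key path -> latest time seen).
// Simplification: only top-level fields of "d" are collected.
function incrementalRun(events, keys) {
  // Watermark: latest time already stored, or the epoch on the first run.
  let latest = 0;
  for (const t of keys.values()) latest = Math.max(latest, t.getTime());

  // Like the {t: {$gt: latest}} query: only strictly newer events are scanned.
  const fresh = events.filter((e) => e.t.getTime() > latest);
  for (const e of fresh) {
    for (const k of Object.keys(e.d)) {
      const prev = keys.get(k);
      // Keep the latest time per key, as the reduce function does.
      if (prev === undefined || e.t > prev) keys.set(k, e.t);
    }
  }
  return fresh.length; // number of events scanned by this run
}

const events = [
  { t: new Date(1000), d: { ip: "10.0.0.1", method: "GET" } },
  { t: new Date(2000), d: { ip: "10.0.0.2", path: "/" } },
];
const keys = new Map();
const firstScanned = incrementalRun(events, keys); // full scan of both events

// A late event with an old timestamp, then a normal newer one.
events.push({ t: new Date(1500), d: { referrer: "example.com" } });
events.push({ t: new Date(3000), d: { method: "POST" } });
const secondScanned = incrementalRun(events, keys); // only the t=3000 event

console.log(firstScanned, secondScanned, keys.has("referrer")); // 2 1 false
```

The second run is cheap because it only scans one event, and that same shortcut is why the late `referrer` key is missed.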

Caution

Because it relies on an incremental map/reduce, the first call of cubeInspect for a type is slow: it has to scan every event of that type. Later calls for the same type only scan events newer than the previous run, so they stay quick as long as you call cubeInspect often enough that few new events have accumulated.

To make the computation incremental, the results are stored in a collection named after the type with a "_keys" suffix, in the same way Cube already uses type + "_events" and type + "_metrics". For example:

  • We want to know which keys of cube_compute are usable.
  • Cube has already created the collections cube_compute_events and cube_compute_metrics.
  • Calling cubeInspect('cube_compute') creates a new cube_compute_keys collection to store the map/reduce results.
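The naming convention boils down to a shared prefix. The helper below is a hypothetical illustration in plain JavaScript (`collectionsFor` is not part of the script); it only derives the three collection names for a given type.

```javascript
// Hypothetical helper illustrating the naming convention: each Cube event type
// maps to collections sharing the same prefix. "_events" and "_metrics" are
// created by Cube; "_keys" is created by cubeInspect.
function collectionsFor(type) {
  return {
    events: type + "_events",   // raw events, scanned by the map/reduce
    metrics: type + "_metrics", // metric results computed by Cube
    keys: type + "_keys",       // one document per key path, value = latest time
  };
}

const names = collectionsFor("cube_compute");
console.log(names.events, names.metrics, names.keys);
```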

The _keys collection holds one document per distinct key path found in the "d" field (including its sub-documents) of the _events collection; each document's value is the latest time that key was seen.
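To see which key paths end up as `_id`s in the _keys collection, here is a minimal Node.js sketch of the same traversal that map() and cubeInspectSubDoc() perform. It assumes array elements collapse to "[]" at every level, as the script's comment intends; `keyPaths` and `isContainer` are hypothetical names, and ObjectId is ignored because it only exists in the mongo shell.

```javascript
// Hypothetical Node.js port of the traversal done by map() + cubeInspectSubDoc().
// Only Dates are treated as leaf objects here (ObjectId is mongo-shell only).
function isContainer(v) {
  return v !== null && typeof v === "object" && !(v instanceof Date);
}

// Returns the distinct key paths of one event's "d" field, in first-seen order.
function keyPaths(d) {
  const paths = new Set();
  const walk = (base, value) => {
    for (const key of Object.keys(value)) {
      // Array indices collapse to "[]"; object fields are joined with ".".
      const path = Array.isArray(value)
        ? base + "[]"
        : base === "" ? key : base + "." + key;
      paths.add(path);
      if (isContainer(value[key])) walk(path, value[key]);
    }
  };
  walk("", d);
  return [...paths];
}

const d = {
  ip: "10.0.0.1",
  headers: { host: "example.com" },
  tags: ["a", "b"],
  items: [{ sku: 1 }, { sku: 2 }],
  at: new Date(0),
};
console.log(keyPaths(d).join(", "));
// ip, headers, headers.host, tags, tags[], items, items[], items[].sku, at
```

Duplicate paths (one per array element) are collapsed by the Set here; in MongoDB the map/reduce grouping by `_id` has the same effect.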

How does it work?

cubeInspect runs an incremental map/reduce that collects all the different keys and sub-keys found in the documents of a type's _events collection. Run it as often as possible if you want each call to return quickly.
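The map step emits a `(key path, event time)` pair per field, and the reduce step keeps the latest time per key. MongoDB may call reduce several times on partial results for the same key, so the result must not depend on how values are batched. The sketch below is a hypothetical stand-alone version of that reduce in plain JavaScript (`reduceLatest` is an illustrative name), checking that one pass and a two-stage reduction agree.

```javascript
// Hypothetical stand-alone version of the script's reduce: return the most
// recent of the times emitted for one key. It returns a Date, the same type
// as the emitted values, so its output can be fed back into it.
function reduceLatest(key, times) {
  let latest = times[0];
  for (const t of times) {
    if (t > latest) latest = t; // Dates compare by their numeric value
  }
  return latest;
}

const times = [new Date(3000), new Date(1000), new Date(5000), new Date(2000)];

// One pass over all values...
const oneShot = reduceLatest("ip", times);
// ...versus reducing two partial batches, then reducing their results.
const partial = reduceLatest("ip", [
  reduceLatest("ip", times.slice(0, 2)),
  reduceLatest("ip", times.slice(2)),
]);
console.log(oneShot.getTime(), partial.getTime()); // 5000 5000
```

Taking a maximum is associative, commutative and idempotent, which is why this reduce is safe to re-apply.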

Thanks

This script was written specifically for Cube, and is heavily inspired by the more generic approach detailed in these pages:
