[TOC]
A single observation record should have the following:
- a code or ID to identify the type of observation
- a human-readable label or description
- a measured or observed value
- units associated with the measurement
{
cd: 1,
desc: "Heart Rate",
result: 65,
units: "bpm"
}
The field `value1` contains Systolic Blood Pressure and `value2` contains Diastolic Blood Pressure.
{
cd: 2,
desc: "BP",
value1: 120,
value2: 90,
value1units: "mmHg",
value2units: "mmHg"
}
The string `NEG 12` should evaluate to the integer `-12`.
{
cd: 3,
desc: "Base Excess - Arterial",
result: "NEG 12",
units: "mEq/L"
}
The string `120/90` should produce the integer `120` for systolic and `90` for diastolic.
{
cd: 4,
desc: "Blood Pressure",
result: "120/90",
units: "mmHg"
}
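The two string-valued results above (`NEG 12` and `120/90`) could be parsed along these lines. This is only a minimal sketch: `parseSignedValue` and `parseBloodPressure` are hypothetical names, and it assumes `NEG` is the only sign prefix that occurs in the data.

```javascript
// "NEG 12" -> -12; plain numeric strings such as "12" pass straight through Number().
function parseSignedValue(text) {
  return Number(String(text).trim().replace(/^NEG\s+/, '-'));
}

// "120/90" -> { systolic: 120, diastolic: 90 }
function parseBloodPressure(text) {
  const [systolic, diastolic] = String(text).split('/').map(Number);
  return { systolic, diastolic };
}
```

Input that does not match the expected shape yields `NaN` or `undefined` rather than an exception, so a caller would still need to validate the parsed values.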
A temperature observation has an associated site or route of measurement.
{
cd: 5,
desc: "Temperature",
value1: "oral",
value2: 37,
value1units: null,
value2units: "Celsius"
}
The units are given in Fahrenheit but are expected as Celsius.
{
cd: 6,
desc: "Temperature",
result: 98.6,
units: "Fahrenheit"
}
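A conversion for this record might look like the sketch below. `toCelsius` is a hypothetical helper, and rounding to one decimal place is an assumption made here to hide floating-point noise, not a stated requirement.

```javascript
// Convert a Fahrenheit temperature record to Celsius; other units are left untouched.
function toCelsius(record) {
  if (record.units !== 'Fahrenheit') return record;
  const celsius = (record.result - 32) * 5 / 9;
  // Assumption: one decimal place is enough precision for a body temperature.
  return { ...record, result: Math.round(celsius * 10) / 10, units: 'Celsius' };
}

const converted = toCelsius({ cd: 6, desc: 'Temperature', result: 98.6, units: 'Fahrenheit' });
```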
The units are given as a fraction but are expected as a percentage (integer between 0 and 100).
{
cd: 7,
desc: "FiO2",
result: 0.22,
units: "_"
}
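Because the target is an integer percentage, a plain multiplication by 100 is not quite enough: floating-point multiplication can leave a tiny fractional remainder. A hedged sketch with a hypothetical `fractionToPercent` helper:

```javascript
// Convert a fraction in [0, 1] to an integer percentage in [0, 100].
function fractionToPercent(fraction) {
  if (fraction < 0 || fraction > 1) {
    throw new RangeError('expected a fraction between 0 and 1');
  }
  // Math.round removes any floating-point remainder left by the multiplication.
  return Math.round(fraction * 100);
}
```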
The units are listed as `MMOL/L` but should be `mEq/L`. Other examples: `mm HG` should be `mmHg`, and `ug/kg/hr` should be `mcg/kg/hr`.
{
cd: 8,
desc: "Base Excess",
result: 12,
units: "MMOL/L"
}
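One way to express these corrections is a lookup table of known aliases. The sketch below is illustrative only: `UNIT_ALIASES` and `normalizeUnits` are hypothetical names, the table holds just the three examples mentioned above, and matching is exact and case-sensitive.

```javascript
// Known misspelled or non-standard unit strings and their expected form.
const UNIT_ALIASES = new Map([
  ['MMOL/L', 'mEq/L'],
  ['mm HG', 'mmHg'],
  ['ug/kg/hr', 'mcg/kg/hr']
]);

// Return the expected unit string, or the input unchanged if no alias is known.
function normalizeUnits(units) {
  return UNIT_ALIASES.has(units) ? UNIT_ALIASES.get(units) : units;
}
```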
Term Generation is the process of calculating aggregate statistics, called "Terms", from the raw data of observational Records. Terms are the basic unit of information for the Data Ninja application.
Terms can be generated from a simple configuration file that indicates the properties of the Term and the values that should be aggregated.
Term Configuration:
{
collection: // name of the collection of records
srcid: // string identifier of the source
termidkey: // key of term identifier
namekey: // key of human-readable name
unitskey: // key of units of measure
valuekey: // key of observation values that should be aggregated
}
Let's follow Case 1 through the TermGen process:
// 1. Define a Term Configuration
{
collection: 'events',
srcid: 'sitex_events',
termidkey: 'cd',
namekey: 'desc',
unitskey: 'units',
valuekey: 'result'
}
// 2. Map individual records into a MappedRecord:
{
srcid: 'sitex_events',
termid: 1,
name: 'Heart Rate',
units: 'bpm',
value: 65
}
// 3. Reduce (aggregate) individual records into a CountRecord:
{
srcid: 'sitex_events',
termid: 1,
name: 'Heart Rate',
units: 'bpm',
value: 65,
count: 123
}
// 4. Reduce CountRecords into a Term:
{
srcid: 'sitex_events',
termid: 1,
name: 'Heart Rate',
units: 'bpm',
values: [60,61,62,63,64,65,...],
counts: [100,110,115,120,121,123,...]
}
// 5. Some final steps to calculate statistics and save metadata.
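Steps 2 through 4 can be sketched in plain JavaScript as below. This is not the actual TermGen implementation: the function names (`mapRecord`, `countDistinct`, `toTerms`) are hypothetical, the work is done in memory rather than in a datastore, and values are assumed to be numeric so they can be sorted.

```javascript
const termConfig = {
  collection: 'events', srcid: 'sitex_events',
  termidkey: 'cd', namekey: 'desc', unitskey: 'units', valuekey: 'result'
};

// Step 2: map one raw record into a MappedRecord.
function mapRecord(config, record) {
  return {
    srcid: config.srcid,
    termid: record[config.termidkey],
    name: record[config.namekey],
    units: record[config.unitskey],
    value: record[config.valuekey]
  };
}

// Step 3: count identical MappedRecords, one CountRecord per distinct value.
function countDistinct(mappedRecords) {
  const byKey = new Map();
  for (const m of mappedRecords) {
    const key = JSON.stringify([m.srcid, m.termid, m.name, m.units, m.value]);
    const existing = byKey.get(key);
    if (existing) existing.count += 1;
    else byKey.set(key, { ...m, count: 1 });
  }
  return [...byKey.values()];
}

// Step 4: fold CountRecords into one Term per (srcid, termid, name, units).
function toTerms(counted) {
  const byTerm = new Map();
  for (const c of counted) {
    const key = JSON.stringify([c.srcid, c.termid, c.name, c.units]);
    if (!byTerm.has(key)) {
      byTerm.set(key, { srcid: c.srcid, termid: c.termid, name: c.name, units: c.units, pairs: [] });
    }
    byTerm.get(key).pairs.push([c.value, c.count]);
  }
  return [...byTerm.values()].map(({ pairs, ...term }) => {
    pairs.sort((a, b) => a[0] - b[0]); // assumes numeric values
    return { ...term, values: pairs.map(p => p[0]), counts: pairs.map(p => p[1]) };
  });
}

const records = [
  { cd: 1, desc: 'Heart Rate', result: 65, units: 'bpm' },
  { cd: 1, desc: 'Heart Rate', result: 60, units: 'bpm' },
  { cd: 1, desc: 'Heart Rate', result: 65, units: 'bpm' }
];
const terms = toTerms(countDistinct(records.map(r => mapRecord(termConfig, r))));
```

With these three sample records, `terms` holds a single Heart Rate Term whose `values` and `counts` arrays are aligned by index.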
Case 2 (and Case 5) follow a similar process, except that they require 2 separate runs of the TermGen process. Even though the resulting Terms from the first and second runs are similar, they will be persisted separately, each with a different ObjectID.
Handling Cases 3, 4, 6, 7, and 8 requires the definition and evaluation of special parsing functions. We can extend the Term Configuration to include these function definitions as Strings that can be evaluated as JavaScript code.
Term Configuration:
{
// same as above
unitsfunction: // "function(unitsString) { return newUnitsString; }"
valuefunction: // "function(valueString) { return newValueNumberOrString; }"
}
These functions will be evaluated during Step 2 (map individual records) of the TermGen process. The argument of the `unitsfunction` is the Record's `unitskey` property. Likewise, the `valuefunction` argument is the `valuekey` property. The remaining steps of TermGen can proceed normally.
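A minimal sketch of how the map step might apply these String-defined functions follows. `compileFunction` and `mapRecordWithFunctions` are hypothetical names. The sketch uses the standard `Function` constructor, which executes arbitrary code, so it assumes the Term Configuration comes from a trusted source.

```javascript
// Turn a function stored as a String into a callable.
// A missing definition falls back to the identity function.
function compileFunction(source) {
  return source ? new Function('return (' + source + ');')() : (x) => x;
}

// Step 2 of TermGen, extended with the optional units and value functions.
function mapRecordWithFunctions(config, record) {
  const unitsFn = compileFunction(config.unitsfunction);
  const valueFn = compileFunction(config.valuefunction);
  return {
    srcid: config.srcid,
    termid: record[config.termidkey],
    name: record[config.namekey],
    units: unitsFn(record[config.unitskey]),
    value: valueFn(record[config.valuekey])
  };
}

const case3Config = {
  srcid: 'sitex_events', termidkey: 'cd', namekey: 'desc',
  unitskey: 'units', valuekey: 'result',
  valuefunction: "function(v){return Number(v.replace(/^NEG /,'-'));}"
};
const mapped = mapRecordWithFunctions(case3Config,
  { cd: 3, desc: 'Base Excess - Arterial', result: 'NEG 12', units: 'mEq/L' });
```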
A typical scenario may play out as follows:
- A Term Configuration is defined.
- Terms are generated.
- A Term is viewed and discovered to require special parsing of its values.
At this point, we would like a way to update the TermConfig and re-generate the Term with correctly parsed values. Thus the following properties of TermConfig may be redefined:
- unitskey
- unitsfunction
- valuekey
- valuefunction
When only `valuefunction` is redefined on TermX, we can simply recompute the histogram values and statistics. (This may even be done as a "preview" in the browser without ever being persisted on the server.)
When `valuekey` is redefined on TermX, we can look up all Records whose `termidkey` property equals TermX.termid. Then we run the TermGen process on just these Records to recompute TermX (replacing the former TermX).
When `unitskey` or `unitsfunction` is redefined on TermX, the newly generated TermX may collide with an existing Term. Since Terms are distinct by `srcid`, `termid`, `name`, and `units`, TermX with new units may no longer be distinct. Modifying `unitskey` or `unitsfunction` should result in merging the recomputed Term with existing Terms.
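The merge itself could work roughly as follows, assuming Terms carry the aligned `values` and `counts` arrays shown in Step 4 of the TermGen walkthrough. `termKey` and `mergeTerms` are hypothetical names, and numeric values are assumed.

```javascript
// Two Terms collide when srcid, termid, name, and units all match.
function termKey(term) {
  return JSON.stringify([term.srcid, term.termid, term.name, term.units]);
}

// Merge two colliding Terms by summing the counts of equal values.
function mergeTerms(a, b) {
  if (termKey(a) !== termKey(b)) {
    throw new Error('Terms are distinct; nothing to merge');
  }
  const totals = new Map();
  const add = (t) => t.values.forEach((v, i) => totals.set(v, (totals.get(v) || 0) + t.counts[i]));
  add(a);
  add(b);
  const values = [...totals.keys()].sort((x, y) => x - y); // assumes numeric values
  return { ...a, values, counts: values.map(v => totals.get(v)) };
}

const merged = mergeTerms(
  { srcid: 'sitex_events', termid: 8, name: 'Base Excess', units: 'mEq/L', values: [10, 12], counts: [1, 2] },
  { srcid: 'sitex_events', termid: 8, name: 'Base Excess', units: 'mEq/L', values: [12, 14], counts: [3, 4] }
);
```

Any statistics derived from the histogram would need to be recomputed after the merge.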
In dealing with Case 4, we cannot simply update the `valuefunction` to generate multiple values (and multiple Terms). There should be an operation to "Modify a Copy" of an existing Term. This operation would copy TermX's Config and allow the user to modify both TermX and the copy to generate new Terms.
To facilitate the process of Tagging Records (next section), it would be convenient to attach some extra metadata to each Element. Perhaps `TAG` and `GROUPS` (explained in the next section) can be specified as Element metadata.
Tagging Records is the process of marking individual observation Records with metadata, such that a data-driven application can identify and utilize the Records.
The Tagging process should be able to operate independently of Data Ninja, since Tagging is a requirement between a data application and the Record datastore. Here, we define a Basic Tagging process and a Data Ninja Tagging process.
We want to define a file format to hold tagging information so the Basic Tagging process can be run multiple times using only this file, which we call a "TagMap". The simplest such format is CSV (and maybe JSON in the future).
The fields of the CSV file are similar to the basic TermConfig. Here is a TagMap for Cases 1 and 2:
COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG
events,cd,1,units,result,HR
events,cd,2,value1units,value1,SBP
events,cd,2,value2units,value2,DBP
The simplicity of the Basic TagMap allows users to manually create and edit this file using any text editor or spreadsheet editor.
An optional field, `GROUPS`, can be provided to allow an application to assign one or more categories or groupings to each tag. Multiple groups for a single tag are written in that field as a pipe-separated list.
A Basic TagMap with groups:
COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG,GROUPS
events,cd,1,units,result,HR,Vitals
events,cd,2,value1units,value1,SBP,BP|Vitals
events,cd,2,value2units,value2,DBP,BP|Vitals
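Reading such a file could look like the sketch below. `parseTagMap` is a hypothetical name, and the parser assumes a simple CSV with no quoted fields or embedded commas. Note that every cell, including `TERMID`, comes back as a String.

```javascript
// Parse a Basic TagMap CSV into row objects keyed by lower-cased header names.
// GROUPS is optional; when present it is split on the pipe character.
function parseTagMap(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const header = lines[0].split(',').map(h => h.trim().toLowerCase());
  return lines.slice(1).map(line => {
    const cells = line.split(',');
    const row = {};
    header.forEach((name, i) => { row[name] = (cells[i] || '').trim(); });
    row.groups = row.groups ? row.groups.split('|') : [];
    return row;
  });
}

const tagMap = parseTagMap([
  'COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG,GROUPS',
  'events,cd,1,units,result,HR,Vitals',
  'events,cd,2,value1units,value1,SBP,BP|Vitals'
].join('\n'));
```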
A tag operation on a single Record will add an array property to the Record containing all tags. After Basic Tagging, Records from Cases 1 and 2 appear as follows:
// Case 1: Basic Tagged Record
{
cd: 1,
desc: "Heart Rate",
result: 65,
units: "bpm",
tags: [{
units: "bpm",
value: 65,
tagvalue: "HR",
groups: [ "Vitals" ]
}]
}
// Case 2: Basic Tagged Record
{
cd: 2,
desc: "BP",
value1: 120,
value2: 90,
value1units: "mmHg",
value2units: "mmHg",
tags: [{
units: "mmHg",
value: 120,
tagvalue: "SBP",
groups: [ "BP", "Vitals" ]
},{
units: "mmHg",
value: 90,
tagvalue: "DBP",
groups: [ "BP", "Vitals" ]
}]
}
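A tag operation of this kind might be sketched as follows. `tagRecord` is a hypothetical name, the TagMap rows are assumed to be already parsed into objects with lower-cased field names, and the Record is assumed to come from the collection named in each row. Term IDs are compared as Strings because values read from CSV are Strings.

```javascript
// Attach a tags array to a Record, one tag per matching TagMap row.
function tagRecord(record, rows) {
  const tags = rows
    .filter(row => String(record[row.termidkey]) === String(row.termid))
    .map(row => ({
      units: record[row.unitskey],
      value: record[row.valuekey],
      tagvalue: row.tag,
      groups: row.groups
    }));
  return { ...record, tags };
}

const bpRows = [
  { termidkey: 'cd', termid: '2', unitskey: 'value1units', valuekey: 'value1', tag: 'SBP', groups: ['BP', 'Vitals'] },
  { termidkey: 'cd', termid: '2', unitskey: 'value2units', valuekey: 'value2', tag: 'DBP', groups: ['BP', 'Vitals'] }
];
const taggedBp = tagRecord(
  { cd: 2, desc: 'BP', value1: 120, value2: 90, value1units: 'mmHg', value2units: 'mmHg' },
  bpRows
);
```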
Data Ninja allows several extensions to the Basic Tagging process. These include:
- Scoping by the research Dataset
- Identifying by the Element name and definition
- Applying value functions
- Applying units functions
The Data Ninja TagMap has all the data and fields of the Basic TagMap, but adds a few more fields:
- `UNITSFUNCTION` - the `unitsfunction` of a Term
- `UNITS` - the final evaluated `units` of a Term
- `VALUEFUNCTION` - the `valuefunction` of a Term
- `DATASETID` - the Dataset lineageid scope of this mapping
- `DATASETNAME` - (optional) the Dataset name
- `ELEMENTID` - the Element lineageid of this mapping
- `ELEMENTNAME` - (optional) the Element name
The `ELEMENTID` field in the Data Ninja TagMap serves the same purpose as the `TAG` field in the Basic TagMap. The user may edit the Data Ninja TagMap to add the application-specific tags and groups.
A sample Data Ninja TagMap:
COLLECTION | TERMIDKEY | TERMID | UNITSKEY | VALUEKEY | TAG | GROUPS | UNITSFUNCTION | UNITS | VALUEFUNCTION | DATASETID | DATASETNAME | ELEMENTID | ELEMENTNAME |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
events | cd | 1 | units | result | HR | Vitals | | bpm | | (ObjectID) | VPS | (ObjectID) | Heart Rate |
events | cd | 2 | value1units | value1 | SBP | BP\|Vitals | | mmHg | | (ObjectID) | VPS | (ObjectID) | Systolic Blood Pressure |
events | cd | 2 | value2units | value2 | DBP | BP\|Vitals | | mmHg | | (ObjectID) | VPS | (ObjectID) | Diastolic Blood Pressure |
events | cd | 3 | units | result | BE | Blood Gases\|Labs | | mEq/L | `function(v){return Number(v.replace(/^NEG /,'-'));}` | (ObjectID) | VPS | (ObjectID) | Base Excess |
events | cd | 4 | units | result | SBP | BP\|Vitals | | mmHg | `function(v){return Number(v.split('/')[0]);}` | (ObjectID) | VPS | (ObjectID) | Systolic Blood Pressure |
events | cd | 4 | units | result | DBP | BP\|Vitals | | mmHg | `function(v){return Number(v.split('/')[1]);}` | (ObjectID) | VPS | (ObjectID) | Diastolic Blood Pressure |
events | cd | 5 | value1units | value1 | Temp Route | Vitals | | | | (ObjectID) | VPS | (ObjectID) | Temperature Route |
events | cd | 5 | value2units | value2 | Temp | Vitals | | Celsius | | (ObjectID) | VPS | (ObjectID) | Temperature |
events | cd | 6 | units | result | Temp | Vitals | `function(v){return 'Celsius';}` | Celsius | `function(v){return (v-32)*5/9;}` | (ObjectID) | VPS | (ObjectID) | Temperature |
events | cd | 7 | units | result | FiO2 | Vitals | `function(v){return '%';}` | % | `function(v){return v*100;}` | (ObjectID) | VPS | (ObjectID) | FiO2 |
events | cd | 8 | units | result | BE | Blood Gases\|Labs | `function(v){return 'mEq/L';}` | mEq/L | | (ObjectID) | VPS | (ObjectID) | Base Excess |
Records tagged with Data Ninja Tagging appear as follows:
// Case 3
{
cd: 3,
desc: "Base Excess - Arterial",
result: "NEG 12",
units: "mEq/L",
tags: [{
units: "mEq/L",
value: -12,
tagvalue: "BE",
groups: [ "Blood Gases", "Labs" ],
datasetid: <ObjectID>,
datasetname: "VPS",
elementid: <ObjectID>,
elementname: "Base Excess Arterial"
}]
}
// Case 4
{
cd: 4,
desc: "Blood Pressure",
result: "120/90",
units: "mmHg",
tags: [{
units: "mmHg",
value: 120,
tagvalue: "SBP",
groups: [ "BP", "Vitals" ],
datasetid: <ObjectID>,
datasetname: "VPS",
elementid: <ObjectID>,
elementname: "Systolic Blood Pressure"
},{
units: "mmHg",
value: 90,
tagvalue: "DBP",
groups: [ "BP", "Vitals" ],
datasetid: <ObjectID>,
datasetname: "VPS",
elementid: <ObjectID>,
elementname: "Diastolic Blood Pressure"
}]
}
// Case 5 is like Case 2
// Case 6
{
cd: 6,
desc: "Temperature",
result: 98.6,
units: "Fahrenheit",
tags: [{
units: "Celsius",
value: 37,
tagvalue: "Temp",
groups: [ "Vitals" ],
datasetid: <ObjectID>,
datasetname: "VPS",
elementid: <ObjectID>,
elementname: "Temperature"
}]
}
// Case 8
{
cd: 8,
desc: "Base Excess",
result: 12,
units: "MMOL/L",
tags: [{
units: "mEq/L",
value: 12,
tagvalue: "BE",
groups: [ "Blood Gases", "Labs" ],
datasetid: <ObjectID>,
datasetname: "VPS",
elementid: <ObjectID>,
elementname: "Base Excess"
}]
}
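The tagged Records above could be produced by a routine like the following sketch, which extends Basic Tagging with the optional units and value functions. The names `evalOrIdentity` and `ninjaTagRecord` are hypothetical, TagMap rows are assumed to be parsed into objects with lower-cased field names, and the `datasetid` and `elementid` values are passed through as opaque placeholders. As with any String evaluation, the TagMap is assumed to be trusted.

```javascript
// Evaluate a function stored as a String, or fall back to the identity function.
const evalOrIdentity = (src) => (src ? new Function('return (' + src + ');')() : (x) => x);

// Attach one tag per matching TagMap row, applying the row's functions.
function ninjaTagRecord(record, rows) {
  const tags = rows
    .filter(row => String(record[row.termidkey]) === String(row.termid))
    .map(row => ({
      units: evalOrIdentity(row.unitsfunction)(record[row.unitskey]),
      value: evalOrIdentity(row.valuefunction)(record[row.valuekey]),
      tagvalue: row.tag,
      groups: row.groups,
      datasetid: row.datasetid,
      datasetname: row.datasetname,
      elementid: row.elementid,
      elementname: row.elementname
    }));
  return { ...record, tags };
}

const sharedFields = { termidkey: 'cd', termid: '4', unitskey: 'units', valuekey: 'result',
  groups: ['BP', 'Vitals'], datasetid: '(ObjectID)', datasetname: 'VPS', elementid: '(ObjectID)' };
const case4Rows = [
  { ...sharedFields, tag: 'SBP', elementname: 'Systolic Blood Pressure',
    valuefunction: "function(v){return Number(v.split('/')[0]);}" },
  { ...sharedFields, tag: 'DBP', elementname: 'Diastolic Blood Pressure',
    valuefunction: "function(v){return Number(v.split('/')[1]);}" }
];
const taggedCase4 = ninjaTagRecord(
  { cd: 4, desc: 'Blood Pressure', result: '120/90', units: 'mmHg' },
  case4Rows
);
```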
Data Ninja should provide a utility that generates a TagMap from a Dataset. Each row in the TagMap CSV file should correspond to a single Term. A Term may be mapped to multiple Elements. An application needing tags has the following options:
- Use the Dataset lineageid and Element lineageid as the tag (recommended). Groups would have to be implemented within the application.
- Manually edit the TagMap CSV, adding `TAG` and `GROUPS` values to every row.
- Use a script to perform option #2.