lepfhty/dntagging.md

## dntagging.md

      
Observation Records

Use Cases

Case 1: Simple Observation

A single observation record should have the following:

a code or ID to identify the type of observation
a human-readable label or description
a measured or observed value
units associated with the measurement

{
  cd: 1,
  desc: "Heart Rate",
  result: 65,
  units: "bpm"
}
Case 2: Multiple Values in a Single Record (e.g., ISM blood pressure)

The field value1 contains Systolic Blood Pressure and value2 contains Diastolic Blood Pressure.
{
  cd: 2,
  desc: "BP",
  value1: 120,
  value2: 90,
  value1units: "mmHg",
  value2units: "mmHg"
}
Case 3: Special Parsing Required (e.g., Cedars base excess)

The string "NEG 12" should evaluate to the integer -12.
{
  cd: 3,
  desc: "Base Excess - Arterial",
  result: "NEG 12",
  units: "mEq/L"
}
Case 4: Special Parsing for Multiple Values (e.g., Cedars blood pressure)

The string 120/90 should produce the integer 120 for systolic and 90 for diastolic.
{
  cd: 4,
  desc: "Blood Pressure",
  result: "120/90",
  units: "mmHg"
}
Case 5: Observation Metadata within the Record (e.g., ISM temperature site)

A temperature observation has an associated site or route of measurement.
{
  cd: 5,
  desc: "Temperature",
  value1: "oral",
  value2: 37,
  value1units: null,
  value2units: "Celsius"
}
Case 6: Units Value Conversion (e.g., temperature)

The units are given in Fahrenheit but are expected as Celsius.
{
  cd: 6,
  desc: "Temperature",
  result: 98.6,
  units: "Fahrenheit"
}
The units are given as a fraction but are expected as a percentage (integer between 0 and 100).
{
  cd: 7,
  desc: "FiO2",
  result: 0.22,
  units: "_"
}
Case 7: Units String Normalization

The units are listed as MMOL/L but should be mEq/L. Similarly, mm HG should be normalized to mmHg, and ug/kg/hr to mcg/kg/hr.
{
  cd: 8,
  desc: "Base Excess",
  result: 12,
  units: "MMOL/L"
}
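As a minimal sketch of this normalization, assuming a simple alias table is enough: the names `UNITS_ALIASES` and `normalizeUnits` are hypothetical and not part of Data Ninja. The table holds only the three examples given above. A real mapping may need to be scoped per Term, since an alias such as MMOL/L to mEq/L is not valid for every measurement.

```javascript
// Hypothetical alias table. Keys are upper-cased with all whitespace removed.
const UNITS_ALIASES = {
  'MMOL/L': 'mEq/L',
  'MMHG': 'mmHg',
  'UG/KG/HR': 'mcg/kg/hr'
};

// Return the normalized units string, or the raw string when no alias is known.
function normalizeUnits(rawUnits) {
  if (typeof rawUnits !== 'string') return rawUnits;
  const key = rawUnits.replace(/\s+/g, '').toUpperCase();
  return Object.prototype.hasOwnProperty.call(UNITS_ALIASES, key)
    ? UNITS_ALIASES[key]
    : rawUnits;
}

console.log(normalizeUnits('MMOL/L'));   // mEq/L
console.log(normalizeUnits('mm HG'));    // mmHg
console.log(normalizeUnits('ug/kg/hr')); // mcg/kg/hr
console.log(normalizeUnits('bpm'));      // bpm (unchanged)
```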
Term Generation

Term Generation is the process of calculating aggregate statistics, called "Terms", from the raw data of observational Records.  Terms are the basic unit of information for the Data Ninja application.
Terms can be generated from a simple configuration file that indicates the properties of the Term and the values that should be aggregated.
Term Configuration:
{
  collection: // name of the collection of records
  srcid:      // string identifier of the source
  termidkey:  // key of term identifier
  namekey:    // key of human-readable name
  unitskey:   // key of units of measure
  valuekey:   // key of observation values that should be aggregated
}
Let's follow Case 1 through the TermGen process:
// 1. Define a Term Configuration
{
  collection: 'events',
  srcid:      'sitex_events',
  termidkey:  'cd',
  namekey:    'desc',
  unitskey:   'units',
  valuekey:   'result'
}

// 2. Map individual records into a MappedRecord:
{
  srcid: 'sitex_events',
  termid: 1,
  name: 'Heart Rate',
  units: 'bpm',
  value: 65
}

// 3. Reduce (aggregate) individual records into a CountRecord:
{
  srcid: 'sitex_events',
  termid: 1,
  name: 'Heart Rate',
  units: 'bpm',
  value: 65,
  count: 123
}

// 4. Reduce CountRecords into a Term:
{
  srcid: 'sitex_events',
  termid: 1,
  name: 'Heart Rate',
  units: 'bpm',
  values: [60,61,62,63,64,65,...],
  counts: [100,110,115,120,121,123,...]
}

// 5. Some final steps to calculate statistics and save metadata.
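The map and reduce steps above can be sketched in a few lines. This is an illustration only, not the actual TermGen implementation: the function names `mapRecord` and `generateTerms` are hypothetical, steps 3 and 4 are folded into one pass, and values are assumed to be numeric so they can be sorted.

```javascript
// Term Configuration from step 1.
const config = {
  collection: 'events',
  srcid: 'sitex_events',
  termidkey: 'cd',
  namekey: 'desc',
  unitskey: 'units',
  valuekey: 'result'
};

// Step 2: map a raw Record into a MappedRecord.
function mapRecord(config, record) {
  return {
    srcid: config.srcid,
    termid: record[config.termidkey],
    name: record[config.namekey],
    units: record[config.unitskey],
    value: record[config.valuekey]
  };
}

// Steps 3 and 4: count occurrences of each value, then collect them into Terms.
// A Term is identified by (srcid, termid, name, units).
function generateTerms(config, records) {
  const byKey = new Map();
  for (const record of records) {
    const m = mapRecord(config, record);
    const key = JSON.stringify([m.srcid, m.termid, m.name, m.units]);
    if (!byKey.has(key)) {
      byKey.set(key, { srcid: m.srcid, termid: m.termid, name: m.name, units: m.units, tally: new Map() });
    }
    const tally = byKey.get(key).tally;
    tally.set(m.value, (tally.get(m.value) || 0) + 1);
  }
  return Array.from(byKey.values()).map(function (t) {
    // Assumes numeric values; string values would need a different ordering.
    const values = Array.from(t.tally.keys()).sort((a, b) => a - b);
    return {
      srcid: t.srcid, termid: t.termid, name: t.name, units: t.units,
      values: values,
      counts: values.map(v => t.tally.get(v))
    };
  });
}

const records = [
  { cd: 1, desc: 'Heart Rate', result: 65, units: 'bpm' },
  { cd: 1, desc: 'Heart Rate', result: 60, units: 'bpm' },
  { cd: 1, desc: 'Heart Rate', result: 65, units: 'bpm' }
];
const terms = generateTerms(config, records);
console.log(JSON.stringify(terms[0].values), JSON.stringify(terms[0].counts)); // [60,65] [1,2]
```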
Cases 2 and 5 follow a similar process, except that each requires two separate runs of the TermGen process (one for value1 and one for value2). Even though the Terms resulting from the first and second runs are similar, they will be persisted separately, each with a different ObjectID.
Extensions

Handling Cases 3, 4, 6, and 7 requires the definition and evaluation of special parsing functions. We can extend the Term Configuration to include these function definitions as Strings that can be evaluated as JavaScript code.
Evaluating Units and Value Functions

Term Configuration:
{
  // same as above
  unitsfunction: // "function(unitsString) { return newUnitsString; }"
  valuefunction: // "function(valueString) { return newValueNumberOrString; }"
}
These functions will be evaluated during Step 2 (map individual records) of the TermGen process. The argument of unitsfunction is the value of the Record property named by unitskey. Likewise, the argument of valuefunction is the value of the property named by valuekey. The remaining steps of TermGen can proceed normally.
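One way the map step could evaluate these strings is sketched below, using Case 3 as input. The helper name `compile` is hypothetical, and the sketch assumes each string is a single function expression written by a trusted author, because evaluating a string as code runs whatever it contains.

```javascript
// Turn a function string from the Term Configuration into a callable.
// A missing function means the raw value is passed through unchanged.
function compile(fnString) {
  if (!fnString) return function (x) { return x; };
  return new Function('return (' + fnString + ');')();
}

// Step 2 of TermGen, extended with the optional units and value functions.
function mapRecord(config, record) {
  const unitsFn = compile(config.unitsfunction);
  const valueFn = compile(config.valuefunction);
  return {
    srcid: config.srcid,
    termid: record[config.termidkey],
    name: record[config.namekey],
    units: unitsFn(record[config.unitskey]),
    value: valueFn(record[config.valuekey])
  };
}

const config = {
  collection: 'events',
  srcid: 'sitex_events',
  termidkey: 'cd',
  namekey: 'desc',
  unitskey: 'units',
  valuekey: 'result',
  valuefunction: "function(v){return Number(v.replace(/^NEG /,'-'));}"
};

const mapped = mapRecord(config, {
  cd: 3, desc: 'Base Excess - Arterial', result: 'NEG 12', units: 'mEq/L'
});
console.log(mapped.value, mapped.units); // -12 mEq/L
```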
Updating a Term Configuration

A typical scenario may play out as follows:

A Term Configuration is defined.
Terms are generated.
A Term is viewed and discovered to require special parsing of its values.

At this point, we would like a way to update the TermConfig and re-generate the Term with correctly parsed values.  Thus the following properties of TermConfig may be redefined:

unitskey
unitsfunction
valuekey
valuefunction

When only valuefunction is redefined on TermX, we can simply recompute the histogram values and statistics.  (This may even be done as a "preview" in the browser without ever being persisted on the server.)
When valuekey is redefined on TermX, we can look up all Records whose termidkey property equals TermX.termid. Then we run the TermGen process on just these Records to recompute TermX (replacing the former TermX).
When unitskey or unitsfunction is redefined on TermX, the newly generated TermX may collide with an existing Term.  Since Terms are distinct by srcid, termid, name, and units, TermX with new units may no longer be distinct.  Modifying unitskey or unitsfunction should result in merging the recomputed Term with existing Terms.
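The merge could look roughly like the following sketch, which assumes a Term carries parallel `values` and `counts` arrays as in step 4 of TermGen and that values are numeric. The function names `termKey` and `mergeTerms` are hypothetical.

```javascript
// Terms are distinct by srcid, termid, name, and units.
function termKey(term) {
  return JSON.stringify([term.srcid, term.termid, term.name, term.units]);
}

// Merge two Terms that share a key by summing the counts of each value.
function mergeTerms(a, b) {
  if (termKey(a) !== termKey(b)) {
    throw new Error('Terms are distinct; nothing to merge');
  }
  const tally = new Map();
  [a, b].forEach(function (term) {
    term.values.forEach(function (value, i) {
      tally.set(value, (tally.get(value) || 0) + term.counts[i]);
    });
  });
  const values = Array.from(tally.keys()).sort((x, y) => x - y);
  return {
    srcid: a.srcid, termid: a.termid, name: a.name, units: a.units,
    values: values,
    counts: values.map(v => tally.get(v))
  };
}

// An existing Term and a recomputed Term whose new units now collide with it.
const existing = { srcid: 'sitex_events', termid: 8, name: 'Base Excess', units: 'mEq/L',
                   values: [10, 12], counts: [3, 5] };
const recomputed = { srcid: 'sitex_events', termid: 8, name: 'Base Excess', units: 'mEq/L',
                     values: [12, 14], counts: [2, 1] };
const merged = mergeTerms(existing, recomputed);
console.log(JSON.stringify(merged.values), JSON.stringify(merged.counts)); // [10,12,14] [3,7,1]
```

Statistics derived from the histogram would need to be recomputed after the merge.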
Duplicate and Modify a Term

In dealing with Case 4, we cannot simply update the valuefunction to generate multiple values (and multiple Terms).  There should be an operation to "Modify a Copy" of an existing Term.  This operation would copy TermX's Config and allow the user to modify both TermX and the copy to generate new Terms.
Extra Metadata on Elements (nice to have)

To facilitate the process of Tagging Records (next section), it would be convenient to attach some extra metadata to each Element.  Perhaps TAG and GROUPS (explained in next section) can be specified as Element metadata.
Tagging Records

Tagging Records is the process of marking individual observation Records with metadata, such that a data-driven application can identify and utilize the Records.
The Tagging process should be able to operate independently of Data Ninja, since Tagging is a requirement between a data application and the Record datastore. Here, we define a Basic Tagging process and a Data Ninja Tagging process.
Basic Tagging

We want to define a file format to hold tagging information so the Basic Tagging process can be run multiple times using only this file, which we call a "TagMap".  The simplest such format is CSV (and maybe JSON in the future).
The fields of the CSV file are similar to the basic TermConfig.  Here is a TagMap for Cases 1 and 2:
COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG
events,cd,1,units,result,HR
events,cd,2,value1units,value1,SBP
events,cd,2,value2units,value2,DBP

The simplicity of the Basic TagMap allows users to manually create and edit this file using any text editor or spreadsheet editor.
An optional field, GROUPS, can be provided to allow an application to assign one or more categories or groupings to each tag. To assign multiple groups to a single tag, separate the groups with pipes within the field.
A Basic TagMap with groups:
COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG,GROUPS
events,cd,1,units,result,HR,Vitals
events,cd,2,value1units,value1,SBP,BP|Vitals
events,cd,2,value2units,value2,DBP,BP|Vitals
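
A Basic TagMap in this format could be read with a small parser like the sketch below. The function name `parseTagMap` is hypothetical, and the sample rows use the record keys from Cases 1 and 2. The sketch assumes no field contains a comma, so a plain split is enough. That assumption would not hold for a Data Ninja TagMap, whose function cells can contain commas and would need a real CSV parser.

```javascript
// Parse Basic TagMap CSV text into an array of row objects with lower-cased keys.
function parseTagMap(csvText) {
  const lines = csvText.split(/\r?\n/).filter(line => line.trim() !== '');
  const header = lines[0].split(',').map(h => h.trim().toLowerCase());
  return lines.slice(1).map(function (line) {
    const cells = line.split(',');
    const row = {};
    header.forEach(function (name, i) {
      row[name] = (cells[i] || '').trim();
    });
    // GROUPS is optional and pipe-separated.
    row.groups = row.groups ? row.groups.split('|').map(g => g.trim()) : [];
    return row;
  });
}

const tagMap = parseTagMap([
  'COLLECTION,TERMIDKEY,TERMID,UNITSKEY,VALUEKEY,TAG,GROUPS',
  'events,cd,1,units,result,HR,Vitals',
  'events,cd,2,value1units,value1,SBP,BP|Vitals',
  'events,cd,2,value2units,value2,DBP,BP|Vitals'
].join('\n'));

console.log(tagMap[1].tag, JSON.stringify(tagMap[1].groups)); // SBP ["BP","Vitals"]
```

Note that TERMID is read as a string here, so a consumer has to compare it with record values accordingly.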

A tag operation on a single Record will add an array property to the Record containing all tags.  After Basic Tagging, Records from Cases 1 and 2 appear as follows:
// Case 1: Basic Tagged Record
{
  cd: 1,
  desc: "Heart Rate",
  result: 65,
  units: "bpm",
  tags: [{
    units: "bpm",
    value: 65,
    tagvalue: "HR",
    groups: [ "Vitals" ]
  }]
}

// Case 2: Basic Tagged Record
{
  cd: 2,
  desc: "BP",
  value1: 120,
  value2: 90,
  value1units: "mmHg",
  value2units: "mmHg",
  tags: [{
    units: "mmHg",
    value: 120,
    tagvalue: "SBP",
    groups: [ "BP", "Vitals" ]
  },{
    units: "mmHg",
    value: 90,
    tagvalue: "DBP",
    groups: [ "BP", "Vitals" ]
  }]
}
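The tag operation that produces these Records might be sketched as follows. The function name `tagRecord` is hypothetical, the TagMap rows are assumed to be already parsed into objects with lower-cased keys, and this version returns a copy of the Record instead of modifying it in place.

```javascript
// Apply Basic TagMap rows to one Record and return a copy with a `tags` array.
function tagRecord(record, tagMapRows) {
  const tags = [];
  for (const row of tagMapRows) {
    // TERMID typically arrives from CSV as a string, so compare as strings.
    if (String(record[row.termidkey]) !== String(row.termid)) continue;
    // Skip rows whose value key is absent from this Record.
    if (!(row.valuekey in record)) continue;
    tags.push({
      units: record[row.unitskey],
      value: record[row.valuekey],
      tagvalue: row.tag,
      groups: row.groups || []
    });
  }
  return Object.assign({}, record, { tags: tags });
}

const rows = [
  { collection: 'events', termidkey: 'cd', termid: '2',
    unitskey: 'value1units', valuekey: 'value1', tag: 'SBP', groups: ['BP', 'Vitals'] },
  { collection: 'events', termidkey: 'cd', termid: '2',
    unitskey: 'value2units', valuekey: 'value2', tag: 'DBP', groups: ['BP', 'Vitals'] }
];

const tagged = tagRecord(
  { cd: 2, desc: 'BP', value1: 120, value2: 90, value1units: 'mmHg', value2units: 'mmHg' },
  rows
);
console.log(JSON.stringify(tagged.tags, null, 2));
```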
Data Ninja Tagging

Data Ninja allows several extensions to the Basic Tagging process.  These include:

Scoping by the research Dataset
Identifying by the Element name and definition
Applying value functions
Applying units functions

The Data Ninja TagMap has all the data and fields of the Basic TagMap, but adds a few more fields:

UNITSFUNCTION - the unitsfunction of a Term
UNITS - the final evaluated units of a Term
VALUEFUNCTION - the valuefunction of a Term
DATASETID - the Dataset lineageid scope of this mapping
DATASETNAME - (optional) the Dataset name
ELEMENTID - the Element lineageid of this mapping
ELEMENTNAME - (optional) the Element name

The ELEMENTID field in Data Ninja TagMap serves the same purpose as the TAG field in Basic TagMap.  The user may edit the Data Ninja TagMap to add the application-specific tags and groups.
A sample Data Ninja TagMap:


| COLLECTION | TERMIDKEY | TERMID | UNITSKEY | VALUEKEY | TAG | GROUPS | UNITSFUNCTION | UNITS | VALUEFUNCTION | DATASETID | DATASETNAME | ELEMENTID | ELEMENTNAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| events | cd | 1 | units | result | HR | Vitals | | bpm | | (ObjectID) | VPS | (ObjectID) | Heart Rate |
| events | cd | 2 | value1units | value1 | SBP | BP \| Vitals | | mmHg | | (ObjectID) | VPS | (ObjectID) | Systolic Blood Pressure |
| events | cd | 2 | value2units | value2 | DBP | BP \| Vitals | | mmHg | | (ObjectID) | VPS | (ObjectID) | Diastolic Blood Pressure |
| events | cd | 3 | units | result | BE | Blood Gases \| Labs | | mEq/L | `function(v){return Number(v.replace(/^NEG /,'-'));}` | (ObjectID) | VPS | (ObjectID) | Base Excess |
| events | cd | 4 | units | result | SBP | BP \| Vitals | | mmHg | `function(v){return Number(v.split('/')[0]);}` | (ObjectID) | VPS | (ObjectID) | Systolic Blood Pressure |
| events | cd | 4 | units | result | DBP | BP \| Vitals | | mmHg | `function(v){return Number(v.split('/')[1]);}` | (ObjectID) | VPS | (ObjectID) | Diastolic Blood Pressure |
| events | cd | 5 | value1units | value1 | Temp Route | Vitals | | | | (ObjectID) | VPS | (ObjectID) | Temperature Route |
| events | cd | 5 | value2units | value2 | Temp | Vitals | | Celsius | | (ObjectID) | VPS | (ObjectID) | Temperature |
| events | cd | 6 | units | result | Temp | Vitals | `function(v){return 'Celsius';}` | Celsius | `function(v){return (v-32)*5/9;}` | (ObjectID) | VPS | (ObjectID) | Temperature |
| events | cd | 7 | units | result | FiO2 | Vitals | `function(v){return '%';}` | % | `function(v){return Math.round(v*100);}` | (ObjectID) | VPS | (ObjectID) | FiO2 |
| events | cd | 8 | units | result | BE | Blood Gases \| Labs | `function(v){return 'mEq/L';}` | mEq/L | | (ObjectID) | VPS | (ObjectID) | Base Excess |


Records tagged with Data Ninja Tagging appear as follows:
// Case 3
{
  cd: 3,
  desc: "Base Excess - Arterial",
  result: "NEG 12",
  units: "mEq/L",
  tags: [{
    units: "mEq/L",
    value: -12,
    tagvalue: "BE",
    groups: [ "Blood Gases", "Labs" ]
    datasetid: <ObjectID>,
    datasetname: "VPS",
    elementid: <ObjectID>,
    elementname: "Base Excess Arterial"
  }]
}

// Case 4
{
  cd: 4,
  desc: "Blood Pressure",
  result: "120/90",
  units: "mmHg",
  tags: [{
    units: "mmHg",
    value: 120,
    tagvalue: "SBP",
    groups: [ "BP", "Vitals" ],
    datasetid: <ObjectID>,
    datasetname: "VPS",
    elementid: <ObjectID>,
    elementname: "Systolic Blood Pressure"
  },{
    units: "mmHg",
    value: 90,
    tagvalue: "DBP",
    groups: [ "BP", "Vitals" ],
    datasetid: <ObjectID>,
    datasetname: "VPS",
    elementid: <ObjectID>,
    elementname: "Diastolic Blood Pressure"
  }]
}

// Case 5 is like Case 2

// Case 6
{
  cd: 6,
  desc: "Temperature",
  result: 98.6,
  units: "Fahrenheit",
  tags: [{
    units: "Celsius",
    value: 37,
    tagvalue: "Temp",
    groups: [ "Vitals" ],
    datasetid: <ObjectID>,
    datasetname: "VPS",
    elementid: <ObjectID>,
    elementname: "Temperature"
  }]
}

// Case 7
{
  cd: 8,
  desc: "Base Excess",
  result: 12,
  units: "MMOL/L",
  tags: [{
    units: "mEq/L",
    value: 12,
    tagvalue: "BE",
    groups: [ "Blood Gases", "Labs" ],
    datasetid: <ObjectID>,
    datasetname: "VPS",
    elementid: <ObjectID>,
    elementname: "Base Excess"
  }]
}
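As a rough sketch of how a Data Ninja TagMap row could be applied, the following tags the Case 4 Record. It assumes rows are already parsed into objects with lower-cased keys and that the function strings come from a trusted TagMap. The names `compile` and `tagRecord` are hypothetical, and the id strings are placeholders for real ObjectIDs.

```javascript
// Compile a function string; an empty cell means the raw value is used as-is.
function compile(fnString) {
  if (!fnString) return function (x) { return x; };
  return new Function('return (' + fnString + ');')();
}

// Build the tags for one Record from matching Data Ninja TagMap rows.
function tagRecord(record, rows) {
  const tags = rows
    .filter(row => String(record[row.termidkey]) === String(row.termid))
    .map(function (row) {
      return {
        units: compile(row.unitsfunction)(record[row.unitskey]),
        value: compile(row.valuefunction)(record[row.valuekey]),
        tagvalue: row.tag,
        groups: row.groups,
        datasetid: row.datasetid,
        datasetname: row.datasetname,
        elementid: row.elementid,
        elementname: row.elementname
      };
    });
  return Object.assign({}, record, { tags: tags });
}

// Fields shared by both Case 4 rows. The ids are placeholders, not real ObjectIDs.
const shared = {
  termidkey: 'cd', termid: '4', unitskey: 'units', valuekey: 'result',
  groups: ['BP', 'Vitals'], datasetid: 'dataset-id', datasetname: 'VPS'
};
const rows = [
  Object.assign({}, shared, {
    tag: 'SBP', elementid: 'sbp-element-id', elementname: 'Systolic Blood Pressure',
    valuefunction: "function(v){return Number(v.split('/')[0]);}"
  }),
  Object.assign({}, shared, {
    tag: 'DBP', elementid: 'dbp-element-id', elementname: 'Diastolic Blood Pressure',
    valuefunction: "function(v){return Number(v.split('/')[1]);}"
  })
];

const tagged = tagRecord(
  { cd: 4, desc: 'Blood Pressure', result: '120/90', units: 'mmHg' },
  rows
);
console.log(tagged.tags.map(t => t.tagvalue + '=' + t.value).join(', ')); // SBP=120, DBP=90
```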
Data Ninja should provide a utility that generates a TagMap from a Dataset.  Each row in the TagMap CSV file should correspond to a single Term.  A Term may be mapped to multiple Elements.  An application needing tags has the following options:

1. Use the Dataset lineageid and Element lineageid as the tag (recommended). Groups would have to be implemented within the application.
2. Manually edit the TagMap CSV, adding TAG and GROUPS values to every row.
3. Use a script to perform option #2.

