Skip to content

Instantly share code, notes, and snippets.

@h3xagn
Created May 14, 2022 20:40
Show Gist options
  • Save h3xagn/52ce536710314d0005c2271061646b3a to your computer and use it in GitHub Desktop.
Save h3xagn/52ce536710314d0005c2271061646b3a to your computer and use it in GitHub Desktop.
Build ETL from device to cloud: https://h3xagn.com
// query data in raw table
// test_data/raw_oil_data.csv
raw_oil_data
// total records 102
// specifically added some bad data:
// - first three records are duplicated
// - third record contains an empty TagName
// - fourth record is below zero (-1.25)
// - fifth record is above 10 000 (10 254)
// lets look at our dimension tables
// test_data/dimSiteEqmt.csv
dimSiteEqmt
// 56 records
// test_data/dimSystemTag.csv
dimSystemTag
// 5 records
// lets create our factOilData table as a variable
// note that it is only available within this query
let factOilData =
raw_oil_data
| summarize arg_max(TimeStamp, *) by TimeStamp, TagName
| project-away TimeStamp1
| where Value between (0..10000)
| where isnotempty(TagName)
| parse TagName with Site: string '.' System: string '.' EqmtIp: string '.' Tag: string;
// join our two dimension tables using lookup and their keys
// lookup is similar to join, however it is optimised for larger fact tables joining smaller dimension tables
factOilData
| lookup kind=leftouter dimSiteEqmt on Site, EqmtIp
| lookup kind=leftouter dimSystemTag on System, Tag
// our factOilData now has 97 records (5 bad data records removed)
// we also parsed the TagName to generate keys Site, System, EqmtIp and Tag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment