@mbstacy
Last active May 20, 2020 16:07

Data Lake API

The CU Boulder Library uses the Cybercommons framework to provide access to the data lake API. The backend uses MongoDB as its document store. Queries and aggregation pipelines are written in the MongoDB query language and passed as JSON in URL parameters.

Basic Parameters

- The default format is a web presentation with clickable links. To request a different format, add one of the following parameters:
  ?format=json
  ?format=xml
  ?format=yaml

Query Language

The API's URL parameters are based on the MongoDB Python client module (PyMongo). The query parameter takes a JSON object with keys such as filter and sort.

  • Select all data where location.name == Norlin Library

      https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?query={"filter":{"location.name":"Norlin Library"}}
    
  • Select all data where location.name == Norlin Library, sorted by timestamp in descending order

      https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?query={"filter":{"location.name":"Norlin Library"},"sort":[["localDateTime.local_timestamp",-1]]} 
    
  • Select all data for the current fiscal year (July 2019 through June 2020), sorted by timestamp in ascending order

      https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?query={"filter":{"$and":[{"localDateTime.local_timestamp":{"$gt":"2019-07"}},{"localDateTime.local_timestamp":{"$lt":"2020-07"}}]},"sort":[["localDateTime.local_timestamp",1]]}
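The query URLs above can also be assembled programmatically, which avoids hand-escaping the JSON. The following is a minimal sketch, not part of the API itself: the helper names (build_query_url, fiscal_year_bounds, fiscal_year_filter) are hypothetical, the fiscal year is assumed to start in July (as the 2019-07 to 2020-07 bounds above suggest), and timestamps are assumed to be stored as ISO-style strings so that string comparison orders them correctly.

```python
import json
from datetime import date
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/"
TIMESTAMP = "localDateTime.local_timestamp"


def build_query_url(filter_doc, sort=None):
    """Return a URL whose `query` parameter carries the filter (and optional sort) as JSON."""
    query = {"filter": filter_doc}
    if sort is not None:
        query["sort"] = sort
    # urlencode percent-encodes the braces, quotes, and "$" characters in the JSON.
    return BASE_URL + "?" + urlencode({"query": json.dumps(query, separators=(",", ":"))})


def fiscal_year_bounds(today, start_month=7):
    """Return (start, end) as "YYYY-MM" strings for the fiscal year containing `today`.

    Assumption: the fiscal year starts on the first day of `start_month` (July).
    """
    start_year = today.year if today.month >= start_month else today.year - 1
    return f"{start_year}-{start_month:02d}", f"{start_year + 1}-{start_month:02d}"


def fiscal_year_filter(today):
    """Build the same $and range filter used in the fiscal-year example."""
    start, end = fiscal_year_bounds(today)
    return {"$and": [{TIMESTAMP: {"$gt": start}}, {TIMESTAMP: {"$lt": end}}]}


norlin_url = build_query_url({"location.name": "Norlin Library"}, sort=[[TIMESTAMP, -1]])
fiscal_url = build_query_url(fiscal_year_filter(date(2020, 5, 20)), sort=[[TIMESTAMP, 1]])

print(norlin_url)
print(fiscal_url)
# Round trip: decoding the URL recovers the original query document.
print(json.loads(parse_qs(urlsplit(norlin_url).query)["query"][0]))
```

The resulting URLs are percent-encoded, so they look different from the hand-written examples but decode to the same query documents.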
    

Distinct Query

  • Select distinct values - location.name

      https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?distinct=location.name
    
  • Select multiple distinct values - location.name, site.name, sensor.name

      https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?distinct=location.name,site.name,sensor.name
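A distinct URL can be built from a list of field names in the same way. This is a small sketch under the assumption that the fields are joined with commas exactly as in the examples above; build_distinct_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/"


def build_distinct_url(*fields):
    """Return a URL requesting distinct values for one or more dotted field names."""
    if not fields:
        raise ValueError("at least one field name is required")
    # safe="," keeps the commas literal so the URL matches the hand-written form.
    return BASE_URL + "?" + urlencode({"distinct": ",".join(fields)}, safe=",")


single_url = build_distinct_url("location.name")
multi_url = build_distinct_url("location.name", "site.name", "sensor.name")
print(single_url)
print(multi_url)
```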
    

Aggregation Pipeline

The aggregation pipeline is a list of stages ([stage1, stage2, ...]) that are applied in order.

  • Query location.name == Norlin Library; aggregate by year, month, and day; sort descending

      Aggregation:
          [
          {"$match":{"location.name":"Norlin Library"}},
          {"$group":{"_id":{"location":"$location.name","year":"$localDateTime.year","month":"$localDateTime.month","day":"$localDateTime.day"},
                      "record_count":{"$sum":1},"ins":{"$sum":"$sumins"},"outs":{"$sum":"$sumouts"}}},
          {"$sort":{"_id.year":-1,"_id.month":-1,"_id.day":-1}}
          ]
    
    
      Link: https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/?aggregate=[{"$match":{"location.name":"Norlin Library"}},{"$group":{"_id":{"location":"$location.name","year":"$localDateTime.year","month":"$localDateTime.month","day":"$localDateTime.day"},"record_count":{"$sum":1},"ins":{"$sum":"$sumins"},"outs":{"$sum":"$sumouts"}}},{"$sort":{"_id.year":-1,"_id.month":-1,"_id.day":-1}}]
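Because the pipeline is plain JSON, it can be written as a Python list and serialized into the aggregate parameter. The sketch below reproduces the pipeline above; the helper names (gatecount_pipeline, build_aggregate_url) and the parts argument are hypothetical, and the monthly variant is an illustration that assumes the same localDateTime fields rather than an endpoint behavior confirmed here.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://libapps.colorado.edu/api/data_store/data/datalake/gatecount/"


def gatecount_pipeline(location, parts=("year", "month", "day")):
    """Build a match/group/sort pipeline grouped by the given localDateTime parts."""
    group_id = {"location": "$location.name"}
    group_id.update({part: f"$localDateTime.{part}" for part in parts})
    return [
        {"$match": {"location.name": location}},
        {"$group": {
            "_id": group_id,
            "record_count": {"$sum": 1},      # number of records in each group
            "ins": {"$sum": "$sumins"},       # total entries
            "outs": {"$sum": "$sumouts"},     # total exits
        }},
        # Sort newest first; dict order sets the sort priority.
        {"$sort": {f"_id.{part}": -1 for part in parts}},
    ]


def build_aggregate_url(pipeline):
    """Return a URL whose `aggregate` parameter carries the pipeline as JSON."""
    return BASE_URL + "?" + urlencode({"aggregate": json.dumps(pipeline, separators=(",", ":"))})


daily = gatecount_pipeline("Norlin Library")
monthly = gatecount_pipeline("Norlin Library", parts=("year", "month"))
daily_url = build_aggregate_url(daily)
print(daily_url)
```

Dropping "day" from parts gives monthly totals with the same stages, which is why the grouping keys are passed in rather than hard-coded.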
    