@7yl4r
Last active March 14, 2023 17:56
MBON data pipeline schema

The goal is to create documentation of the datasets, tools, indicators, etc. relevant to MBON researchers that is as usable by humans as possible.

Each entity (e.g., a dataset) would have a directory with a README.md and, optionally, other files. Files to submit to Wikidata could also be generated.

Ideally:

  • each entity (e.g., a dataset) has a homepage
  • structured metadata is searchable
  • an API provides access to the data
  • data can be ingested programmatically

initial plans

  1. create a GitHub issue template
  2. a submitted issue triggers a GitHub Action via a hook to create the README file
  3. multi-part form? keep it really simple at first:
    • title
    • description
    • point of contact
    • collapsible subsections in the GitHub form?
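
The issue-to-README step above can be sketched in Python. This is a minimal sketch, assuming the issue form renders each field as a `### <label>` heading followed by the response, with `_No response_` for empty optional fields. The field labels (`Title`, `Description`, `Point of contact`) mirror the list above, and the function names `parse_issue_form` and `render_readme` are hypothetical.

```python
import re

# Placeholder text assumed to appear for empty optional fields.
NO_RESPONSE = "_No response_"


def parse_issue_form(body: str) -> dict[str, str]:
    """Split an issue-form body into {lowercased heading: response} pairs."""
    fields = {}
    # re.split with a capturing group returns:
    # [text before first heading, heading 1, value 1, heading 2, value 2, ...]
    parts = re.split(r"^###[ \t]+(.+?)[ \t]*$", body, flags=re.MULTILINE)
    for heading, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        fields[heading.strip().lower()] = "" if value == NO_RESPONSE else value
    return fields


def render_readme(fields: dict[str, str]) -> str:
    """Render the parsed fields as README.md text, skipping empty fields."""
    lines = [f"# {fields.get('title') or 'Untitled entity'}", ""]
    if fields.get("description"):
        lines += [fields["description"], ""]
    if fields.get("point of contact"):
        lines += ["## Point of contact", "", fields["point of contact"], ""]
    return "\n".join(lines)


if __name__ == "__main__":
    example_body = (
        "### Title\n\nSea surface temperature\n\n"
        "### Description\n\nDaily SST.\n\n"
        "### Point of contact\n\n_No response_\n"
    )
    print(render_readme(parse_issue_form(example_body)))
```

In a GitHub Action, the issue body would come from the event payload rather than a hard-coded string; that wiring is left out here.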

open questions

Are GitHub Actions (GHA) and GitHub Issues (GHI) the best toolset here, or should we use some other form-to-trigger tooling?

Can Google Forms trigger code (e.g., via Google Apps Script)?

At what point should we just build a new application?

todo

  1. create a new repo under the marinebon org
  2. add info to the README
    • make clear that this is a minimal entry point for datasets
  3. create the issue template
  4. create a GitHub Action, triggered by issue submission, that can
    • create the directory & README.md
    • create files to submit to:
      • GOOS BIO-ECO portal
      • Wikidata
      • MBON Portal
      • OBIS?
        • OBIS MBON datasets should set Mat as "distributor"
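
Step 4 could look roughly like the following. This is a sketch under stated assumptions: the real submission formats for the GOOS BIO-ECO portal, Wikidata, the MBON Portal, and OBIS are not specified in these notes, so each "file to submit" is a placeholder JSON stub. The names `TARGETS`, `slugify`, and `write_entity` are hypothetical.

```python
import json
import re
from pathlib import Path

# Hypothetical short names for the submission targets listed above.
TARGETS = ["goos_bioeco", "wikidata", "mbon_portal", "obis"]


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated directory name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def write_entity(root: Path, fields: dict) -> Path:
    """Create <root>/<slug>/ with a README.md and one JSON stub per target."""
    entity_dir = root / slugify(fields["title"])
    entity_dir.mkdir(parents=True, exist_ok=True)

    readme = f"# {fields['title']}\n\n{fields.get('description', '')}\n"
    (entity_dir / "README.md").write_text(readme, encoding="utf-8")

    for target in TARGETS:
        stub = dict(fields, target=target)
        if target == "obis":
            # Per the note above: OBIS MBON datasets set Mat as "distributor".
            stub["distributor"] = "Mat"
        (entity_dir / f"{target}.json").write_text(
            json.dumps(stub, indent=2), encoding="utf-8"
        )
    return entity_dir


if __name__ == "__main__":
    out = write_entity(Path("entities"), {"title": "Example dataset", "description": "Demo."})
    print(sorted(p.name for p in out.iterdir()))
```

Keeping one stub file per target means each portal's format can be filled in later without touching the others.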
# Modify this code to update the DB schema diagram.
# To reset the sample schema, replace everything with
# two dots ('..' - without quotes).
DatasetType(SpatioTemporal) as dt-1
-
TypeID PK string
Description string
DatasetType(TaxaOccurrence) as dt-2
-
TypeID PK string
Description string
Dataset_SST
-
DatasetID PK int
DatasetTypeID FK >- dt-1.TypeID
Name string INDEX
ERDDAP_link url
docs_link doi
Dataset_Chlor_a
-
DatasetID PK int
DatasetTypeID FK >- dt-1.TypeID
Name string INDEX
ERDDAP_link url
docs_link doi
Dataset_OBIS_query as d_obis
-
DatasetID PK int
DatasetTypeID FK >- dt-2.TypeID
Name string INDEX
docs_link doi
DerivedIndicator_Algae_SDM as ind_1
-
IndicatorID PK int
DatasetTypeID FK >- dt-1.TypeID
DatasetIDs_1 int FK >- Dataset_SST.DatasetID
DatasetIDs_2 int FK >- Dataset_Chlor_a.DatasetID
DatasetIDs_3 int FK >- d_obis.DatasetID
ViewFramework_MBON_Portal as ol
----
ViewFrameworkID PK int
ProductID int FK >- p.ProductID
# Table documentation comment 1 (try the PDF/RTF export)
DataView_algae_dataview as p # Table documentation comment 2
------------
ProductID PK int
DataID_1 int FK >- ind_1.IndicatorID
DataID_2 int FK >- d_obis.DatasetID
# Field documentation comment 2
Name varchar(200) UNIQUE # Field documentation comment 3
# Modify this code to update the DB schema diagram.
# To reset the sample schema, replace everything with
# two dots ('..' - without quotes).
DatasetType
-
TypeID PK string
Description string
Dataset
-
DatasetID PK int
DatasetTypeID FK >- DatasetType.TypeID
Name string INDEX
ERDDAP_link url
docs_link doi
DerivedIndicator
-
IndicatorID PK int
DatasetIDs int[] FK >- Dataset.DatasetID
TotalAmount money
ViewFramework as ol
----
ViewFrameworkID PK int
ProductID int FK >- p.ProductID
# Table documentation comment 1 (try the PDF/RTF export)
DataView as p # Table documentation comment 2
------------
ProductID PK int
IndicatorID int FK >- DerivedIndicator.IndicatorID
# Field documentation comment 2
Name varchar(200) UNIQUE # Field documentation comment 3
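
To make the generalized schema concrete, here is a minimal sketch that loads its first three tables into an in-memory SQLite database with Python's `sqlite3` module. Several things here are assumptions rather than part of the schema: the `DatasetIDs int[]` column is modeled as a junction table named `IndicatorDataset` (SQLite has no array type), `url` and `doi` are stored as `TEXT`, the `TotalAmount` column is omitted for brevity, and the description strings are illustrative. The sample rows reuse names from the concrete schema (SST, Chlor_a, OBIS query).

```python
import sqlite3

SCHEMA = """
CREATE TABLE DatasetType (
    TypeID      TEXT PRIMARY KEY,
    Description TEXT
);
CREATE TABLE Dataset (
    DatasetID     INTEGER PRIMARY KEY,
    DatasetTypeID TEXT NOT NULL REFERENCES DatasetType(TypeID),
    Name          TEXT,
    ERDDAP_link   TEXT,
    docs_link     TEXT
);
CREATE INDEX idx_dataset_name ON Dataset(Name);
CREATE TABLE DerivedIndicator (
    IndicatorID INTEGER PRIMARY KEY
);
-- Assumed junction table standing in for the DatasetIDs int[] column.
CREATE TABLE IndicatorDataset (
    IndicatorID INTEGER NOT NULL REFERENCES DerivedIndicator(IndicatorID),
    DatasetID   INTEGER NOT NULL REFERENCES Dataset(DatasetID),
    PRIMARY KEY (IndicatorID, DatasetID)
);
"""


def build_db() -> sqlite3.Connection:
    """Create the tables and insert rows mirroring the concrete schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO DatasetType VALUES (?, ?)",
        [("SpatioTemporal", "illustrative description"),
         ("TaxaOccurrence", "illustrative description")],
    )
    conn.executemany(
        "INSERT INTO Dataset (DatasetID, DatasetTypeID, Name) VALUES (?, ?, ?)",
        [(1, "SpatioTemporal", "SST"),
         (2, "SpatioTemporal", "Chlor_a"),
         (3, "TaxaOccurrence", "OBIS_query")],
    )
    conn.execute("INSERT INTO DerivedIndicator VALUES (1)")  # e.g. the algae SDM
    conn.executemany(
        "INSERT INTO IndicatorDataset VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)]
    )
    conn.commit()
    return conn


def datasets_for_indicator(conn: sqlite3.Connection, indicator_id: int) -> list[str]:
    """Return the names of the datasets that feed one derived indicator."""
    rows = conn.execute(
        "SELECT d.Name FROM IndicatorDataset AS i "
        "JOIN Dataset AS d ON d.DatasetID = i.DatasetID "
        "WHERE i.IndicatorID = ? ORDER BY d.DatasetID",
        (indicator_id,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(datasets_for_indicator(build_db(), 1))
```

The junction table lets an indicator reference any number of datasets, which replaces the numbered `DatasetIDs_1`, `DatasetIDs_2`, `DatasetIDs_3` columns of the concrete schema.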