@dwinston
dwinston / nmdc-schema-v3.2.0-v6.0.3.src.diff
Created Aug 12, 2022
<github.com:microbiomedata/nmdc-schema>: git diff v3.2.0 v6.0.3 -- src/schema/*
View nmdc-schema-v3.2.0-v6.0.3.src.diff
diff --git a/src/schema/basic_slots.yaml b/src/schema/basic_slots.yaml
index 17c8bf3..1fa4bda 100644
--- a/src/schema/basic_slots.yaml
+++ b/src/schema/basic_slots.yaml
@@ -29,7 +29,7 @@ slots:
description: >-
A unique identifier for a thing.
Must be either a CURIE shorthand for a URI or a complete URI
- #required: false # for now we setting this to false until we develop an id template
+ #required: false # for now we are setting this to false until we develop an id template
dwinston / nmdc-schema-v3.2.0-v6.0.3.json.diff
Last active Aug 12, 2022
<github.com:microbiomedata/nmdc-schema>: git diff v3.2.0 v6.0.3 -- jsonschema/nmdc.schema.json
View nmdc-schema-v3.2.0-v6.0.3.json.diff
diff --git a/jsonschema/nmdc.schema.json b/jsonschema/nmdc.schema.json
index 3a18b59..1efa788 100644
--- a/jsonschema/nmdc.schema.json
+++ b/jsonschema/nmdc.schema.json
@@ -53,6 +53,28 @@
"title": "Agent",
"type": "object"
},
+ "AnalysisTypeEnum": {
+ "description": "",
View Taxonomy_Flattening_Search_with_ml_jeffjames.ipynb
Sorry, this file is invalid so it cannot be displayed.
View emsl_fix.csv
id,action,attribute,value
emsl:456424,update,processing_institution,Environmental Molecular Sciences Laboratory
dwinston / nmdc_envo_term_subterms.py
Created Apr 15, 2022
build a subsumption map scoped to ENVO terms in use by NMDC biosamples
View nmdc_envo_term_subterms.py
"""
Build a subsumption map scoped to ENVO terms in use by NMDC biosamples
"""
from collections import defaultdict
import json
from rdflib import Graph
from rdflib.namespace import Namespace
from tqdm import tqdm
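
The preview above stops after the imports, so the gist's actual logic is not visible here. As a rough illustration of what a subsumption map can look like, the following sketch maps each term in use to the set of terms it subsumes (itself included), by walking subclass edges downward. It is an assumption-laden stand-in, not the gist's code: it uses only the standard library instead of rdflib, the function name `build_subsumption_map` is hypothetical, and the `EX:` identifiers are invented toy data rather than real ENVO relationships.

```python
from collections import defaultdict


def build_subsumption_map(subclass_edges, terms_in_use):
    """Map each term in use to the set of its transitive subterms (itself included).

    subclass_edges: iterable of (child, parent) pairs.
    terms_in_use: iterable of term ids to scope the map to.
    """
    # Index the hierarchy from parent to direct children.
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)

    subsumption = {}
    for term in terms_in_use:
        seen = {term}
        stack = [term]
        # Depth-first walk over descendants; `seen` also guards against cycles.
        while stack:
            current = stack.pop()
            for child in children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        subsumption[term] = seen
    return subsumption


# Hypothetical toy hierarchy (NOT real ENVO relationships).
EDGES = [
    ("EX:soil", "EX:material"),
    ("EX:clay_soil", "EX:soil"),
    ("EX:water", "EX:material"),
]
result = build_subsumption_map(EDGES, ["EX:material", "EX:soil"])
print({term: sorted(subterms) for term, subterms in result.items()})
```

With a map like this, a search for a broad term can be expanded to every narrower term it covers before querying biosamples.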
View fake_biosample.json
{"biosample_set": [{
  "id": "fake3",
  "env_broad_scale" : {
    "term" : {"id": "ENVO:01000253"}
  },
  "env_local_scale" : {
    "term" : {"id": "ENVO:01000621"}
  },
  "env_medium" : {
    "term" : {"id": "ENVO:01000017"}
View fake_biosample.json
{"biosample_set": [{
  "id": "fake2",
  "env_broad_scale" : {
    "term" : {"id": "ENVO:01000253"}
  },
  "env_local_scale" : {
    "term" : {"id": "ENVO:01000621"}
  },
  "env_medium" : {
    "term" : {"id": "ENVO:01000017"}
View file_type_enum.jsonl
{
"name" : "FT ICR-MS analysis results",
"description" : "FT ICR-MS-based metabolite assignment results table",
"filter" : "{\"url\": {\"$regex\": \"nom\\\\/results\"}, \"description\": {\"$regex\": \"FT ICR-MS\"}}",
"id" : "nmdc:sys045mx19"
}
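
Each record's `filter` field is itself a JSON string holding a MongoDB-style query with `$regex` conditions. The sketch below shows one way such a record could be decoded and applied to a document in plain Python. It is only an illustration under stated assumptions: the helper `matches` is hypothetical, it handles only `$regex` conditions, Python's `re` stands in for MongoDB's regex engine (the two differ in some syntax), and the sample documents are invented.

```python
import json
import re

# The first record from the preview, written as a single JSON line.
RECORD_LINE = r'''{"name": "FT ICR-MS analysis results", "description": "FT ICR-MS-based metabolite assignment results table", "filter": "{\"url\": {\"$regex\": \"nom\\\\/results\"}, \"description\": {\"$regex\": \"FT ICR-MS\"}}", "id": "nmdc:sys045mx19"}'''

record = json.loads(RECORD_LINE)
# The filter is double-encoded: decode the inner JSON string separately.
filter_spec = json.loads(record["filter"])


def matches(spec, doc):
    """Return True if every {"$regex": ...} condition in spec matches doc (hypothetical helper)."""
    for field, condition in spec.items():
        if set(condition) != {"$regex"}:
            raise ValueError(f"unsupported condition for {field!r}: {condition!r}")
        value = doc.get(field)
        if not isinstance(value, str) or re.search(condition["$regex"], value) is None:
            return False
    return True


# Invented sample documents for illustration only.
doc_hit = {
    "url": "https://example.org/nom/results/output.csv",
    "description": "FT ICR-MS results for a sample",
}
doc_miss = {
    "url": "https://example.org/metabolomics/results/output.csv",
    "description": "GC-MS results for a sample",
}
print(matches(filter_spec, doc_hit), matches(filter_spec, doc_miss))
```

Decoding in two steps matters here: after the outer `json.loads`, the `url` pattern is still escaped JSON text, and only the inner decode yields the regex `nom\/results`.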
{
"name" : "GC-MS Metabolomics Results",
"description" : "GC-MS-based metabolite assignment results table",
"filter" : "{\"url\": {\"$regex\": \"metabolomics\\\\/results\"}}",
dwinston / sensor.py
Created Jun 23, 2021
dagster resource in a sensor via preset definition run config
View sensor.py
from dagster import (
    ModeDefinition, PresetDefinition, resource, StringSource,
    build_init_resource_context, RunRequest, sensor,
)


class ApiClient:
    def __init__(self, base_url: str, site_id: str, client_id: str, client_secret: str):
        self.base_url = base_url
        self.site_id = site_id
        self.client_id = client_id
dwinston / all_your_zulip_are_belong_to_us.py
Last active Feb 26, 2021
get all Zulip messages sent by non-bot users to public streams
View all_your_zulip_are_belong_to_us.py
"""
A script developed to get all Zulip messages sent by non-bot users to public streams.
Requires `pip install pymongo tqdm zulip` and a running local MongoDB server.
Alternatively, adapt the script to append to an in-memory Python list, which removes the need for MongoDB and pymongo.
I found that the total volume of data in my case (see in-script comments) was 700MB uncompressed.
Developed at the Recurse Center (https://www.recurse.com/) in order to apply PageRank to Zulip entities.
Licensed as <https://opensource.org/licenses/MIT>(year=2021, copyright_holder="Donny Winston").
"""