@turbomam
Last active April 25, 2024 18:47

from project.Makefile in nmdc-schema repo: make squeaky-clean all test

one output: project/owl/nmdc.owl.ttl

@prefix nmdc: <https://w3id.org/nmdc/> .

nmdc:nmdc a owl:Ontology ;
    rdfs:label "NMDC" ;
    dcterms:license "https://creativecommons.org/publicdomain/zero/1.0/" ;
    dcterms:title "NMDC Schema" ;
    pav:version "0.0.0" ;
    skos:definition """Schema for National Microbiome Data Collaborative (NMDC).
This schema is organized into multiple modules, such as:

 * a set of core types for representing data values
 * a subset of the mixs schema
 * an annotation schema
 * the NMDC schema itself, into which the other modules are imported""" ;
    skos:editorialNote "not importing any MIxS terms where the relationship between the name (SCN) and the id isn't 1:1" .

<https://w3id.org/mixs/env_broad_scale> a owl:ObjectProperty,
        linkml:SlotDefinition ;
    rdfs:label "env_broad_scale" ;
    dcterms:title "broad-scale environmental context" ;
    rdfs:range nmdc:ControlledIdentifiedTermValue ;
    rdfs:subPropertyOf <https://w3id.org/mixs/environment_field> ;
    skos:altLabel "broad-scale environmental context" ;
    skos:definition "Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class:  http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS" ;
    skos:inScheme <https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/src/schema/mixs.yaml> ;
    nmdc:expected_value "The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes." .

make make-rdf

partial entity from local/mongo_as_nmdc_database_cuire_repaired_stamped.ttl

see also src/scripts/date_created_blank_node.py

gold:Gb0110681 a nmdc:Biosample ;
    dcterms:description "Grasslands soil microbial communities from the Angelo Coastal Reserve, plot 2. There is a duplicate submission for this entry in NCBI. The NCBI identifiers for a duplicate are PRJNA449266 and SAMN08902829" ;
    dcterms:isPartOf gold:Gs0110119 ;
    MIXS:0000012 [ a nmdc:ControlledIdentifiedTermValue ;
            nmdc:has_raw_value "grassland biome [ENVO:01000177]" ;
            nmdc:term ENVO:01000177 ] ;
    MIXS:0000013 [ a nmdc:ControlledIdentifiedTermValue ;
            nmdc:has_raw_value "biosphere reserve [ENVO:00000376]" ;
            nmdc:term ENVO:00000376 ] ;
    MIXS:0000014 [ a nmdc:ControlledIdentifiedTermValue ;
            nmdc:has_raw_value "grassland soil [ENVO:00005750]" ;
            nmdc:term ENVO:00005750 ] ;
	    
[ ] <https://schema.org/dateCreated> "2024-04-11T14:51:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
    <http://www.w3.org/2000/01/rdf-schema#comment> "https://api.microbiomedata.org" . 
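The contents of src/scripts/date_created_blank_node.py are not shown here, so the following is only a guess at how a stamp like the one above could be produced with the standard library. The function name `build_stamp` and its parameters are hypothetical, and the comment string is assumed to contain no characters that need escaping in Turtle.

```python
from datetime import datetime


def build_stamp(created: datetime, comment: str) -> str:
    """Return a Turtle blank-node statement carrying a creation time and a comment."""
    # Drop microseconds so the literal looks like the xsd:dateTime in the example above.
    timestamp = created.replace(microsecond=0).isoformat()
    return (
        f'[ ] <https://schema.org/dateCreated> "{timestamp}"'
        "^^<http://www.w3.org/2001/XMLSchema#dateTime> ;\n"
        f'    <http://www.w3.org/2000/01/rdf-schema#comment> "{comment}" .\n'
    )


# Values copied from the stamp shown above.
stamp = build_stamp(datetime(2024, 4, 11, 14, 51, 0), "https://api.microbiomedata.org")
print(stamp)
```

Appending this text to the end of a Turtle file adds two triples about a fresh blank node without touching the existing statements.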

project.Makefile

local/nmdc-no-use-native-uris.owl.ttl: src/schema/nmdc.yaml
	$(RUN) gen-owl --no-use-native-uris $< > $@


local/nmdc_materialized.ttl: src/schema/nmdc.yaml
	$(RUN) python src/scripts/schema_view_relation_graph.py \
		--schema $< \
		--output $@

local/nmdc-no-use-native-uris.owl.ttl

nmdc:nmdc.owl.ttl a owl:Ontology ;
    rdfs:label "NMDC" ;
    dcterms:license "https://creativecommons.org/publicdomain/zero/1.0/" ;
    dcterms:title "NMDC Schema" ;
    pav:version "0.0.0" ;
    skos:definition """Schema for National Microbiome Data Collaborative (NMDC).
This schema is organized into multiple modules, such as:

 * a set of core types for representing data values
 * a subset of the mixs schema
 * an annotation schema
 * the NMDC schema itself, into which the other modules are imported""" ;
    skos:editorialNote "not importing any MIxS terms where the relationship between the name (SCN) and the id isn't 1:1" .

<https://w3id.org/mixs/0000012> a owl:ObjectProperty ;
    rdfs:label "env_broad_scale" ;
    dcterms:title "broad-scale environmental context" ;
    rdfs:range nmdc:ControlledIdentifiedTermValue ;
    rdfs:subPropertyOf <https://w3id.org/mixs/environment_field> ;
    skos:altLabel "broad-scale environmental context" ;
    skos:definition "Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class:  http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS" ;
    skos:inScheme <https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/src/schema/mixs.yaml> ;
    nmdc:expected_value "The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes." .

partial entity from local/nmdc_materialized.ttl

nmdc:Biosample a owl:Class ;
    dcterms:description xsd:string ;
    dcterms:isPartOf nmdc:Study ;
    MIXS:0000001 nmdc:QuantityValue ;
    MIXS:0000002 xsd:string ;
    MIXS:0000008 nmdc:ControlledTermValue ;
    MIXS:0000009 nmdc:GeolocationValue ;
    MIXS:0000010 nmdc:TextValue ;
    MIXS:0000011 nmdc:TimestampValue ;
    MIXS:0000012 nmdc:ControlledIdentifiedTermValue ;
    MIXS:0000013 nmdc:ControlledIdentifiedTermValue ;
    MIXS:0000014 nmdc:ControlledIdentifiedTermValue ;
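
The shape of the materialized entity above (one class, then one slot/range pair per line) can be sketched without any RDF library. This is not the logic of src/scripts/schema_view_relation_graph.py, which is not shown here; `render_class` and the hand-written `biosample_ranges` dict are hypothetical, with a few slot/range pairs copied from the lines above.

```python
def render_class(class_curie: str, slot_ranges: dict) -> str:
    """Render a class and its slot -> range pairs as one Turtle statement."""
    lines = [f"{class_curie} a owl:Class"]
    for slot_curie, range_curie in slot_ranges.items():
        lines.append(f"    {slot_curie} {range_curie}")
    # Predicate/object pairs are joined with ';' and the statement is closed with '.'.
    return " ;\n".join(lines) + " .\n"


# A few slot/range pairs copied from the partial entity above.
biosample_ranges = {
    "dcterms:description": "xsd:string",
    "dcterms:isPartOf": "nmdc:Study",
    "MIXS:0000012": "nmdc:ControlledIdentifiedTermValue",
}

rendered = render_class("nmdc:Biosample", biosample_ranges)
print(rendered)
```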

http://3.236.215.220/repository

nmdc-knowledgegraph

http://3.236.215.220/graphs

This is not very efficient, because some class/slot definitions are repeated across the graphs.


mixs.owl.ttl

@prefix MIXS: <https://w3id.org/mixs/> .

<https://w3id.org/mixs> a owl:Ontology ;
    rdfs:label "mixs" ;
    dcterms:source <https://github.com/GenomicsStandardsConsortium/mixs/raw/issue-610-temp-mixs-xlsx-home/mixs/excel/mixs_v6.xlsx> ;
    pav:version "v6.2.0" ;
    skos:note "slot titles that are associated with more than one slot name/SCN: host sex" .

MIXS:env_broad_scale a owl:ObjectProperty,
        linkml:SlotDefinition ;
    rdfs:label "env_broad_scale" ;
    dcterms:title "broad-scale environmental context" ;
    schema1:keywords "context",
        "environmental" ;
    rdfs:range [ a rdfs:Datatype ;
            owl:intersectionOf ( MIXS:string [ a rdfs:Datatype ;
                        owl:onDatatype xsd:string ;
                        owl:withRestrictions ( [ xsd:pattern "^([^\\s-]{1,2}|[^\\s-]+.+[^\\s-]+) \\[[a-zA-Z]{2,}:[a-zA-Z0-9]\\d+\\]$" ] ) ] ) ] ;
    skos:definition "Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class:  http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS" ;
    skos:inScheme <https://w3id.org/mixs> .
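
A quick check of the xsd:pattern above against the raw values from the Biosample example, as a sketch. It uses Python's `re` module, which is a different regex dialect from XSD patterns, so this is only an approximation of what an OWL or XSD processor would do. The helper name `looks_like_labeled_term` is hypothetical.

```python
import re

# Pattern copied from the xsd:pattern above, with the Turtle string escapes (\\) undone.
TERM_PATTERN = re.compile(
    r"^([^\s-]{1,2}|[^\s-]+.+[^\s-]+) \[[a-zA-Z]{2,}:[a-zA-Z0-9]\d+\]$"
)


def looks_like_labeled_term(value: str) -> bool:
    """True if the value has the form 'label [PREFIX:local_id]'."""
    return TERM_PATTERN.match(value) is not None


examples = [
    "grassland biome [ENVO:01000177]",  # has_raw_value from the Biosample above
    "grassland soil [ENVO:00005750]",   # has_raw_value from the Biosample above
    "ENVO:01000177",                    # bare CURIE, no label or brackets
    "grassland biome",                  # label only
]
for value in examples:
    print(f"{value!r}: {looks_like_labeled_term(value)}")
```

The first two values match and the last two do not, which is consistent with the pattern describing a "label [CURIE]" string rather than a term IRI.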

@turbomam

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX linkml: <https://w3id.org/linkml/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX nmdc: <https://w3id.org/nmdc/>
select
distinct *
where {
    VALUES ?propType {
        owl:ObjectProperty
        owl:DatatypeProperty
    }
    graph nmdc:nmdc {
        ?slot1 a ?propType ;
               rdfs:label ?label1
    }
    graph nmdc:nmdc-no-use-native-uris {
        ?slot2 a ?propType ;
               rdfs:label ?label2
    }
    filter(?label1 = ?label2)
    filter(?slot1 != ?slot2)
}
order by ?label1
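
The same comparison as the query above, sketched on plain dicts instead of named graphs: find properties that share a type and label but have different IRIs. The two env_broad_scale IRIs are taken from the Turtle snippets above; `example_slot` and the function name `same_label_different_iri` are made up for illustration.

```python
from collections import defaultdict

# IRI -> (property type, rdfs:label). example_slot is a made-up control entry.
native_uris = {
    "https://w3id.org/mixs/env_broad_scale": ("owl:ObjectProperty", "env_broad_scale"),
    "https://w3id.org/nmdc/example_slot": ("owl:DatatypeProperty", "example_slot"),
}
no_native_uris = {
    "https://w3id.org/mixs/0000012": ("owl:ObjectProperty", "env_broad_scale"),
    "https://w3id.org/nmdc/example_slot": ("owl:DatatypeProperty", "example_slot"),
}


def same_label_different_iri(graph_a: dict, graph_b: dict) -> list:
    """Return sorted (label, iri_a, iri_b) rows where type and label agree but IRIs differ."""
    # Index graph_b by (type, label), mirroring the join on ?propType and the label filter.
    index_b = defaultdict(list)
    for iri_b, key in graph_b.items():
        index_b[key].append(iri_b)

    rows = []
    for iri_a, key in graph_a.items():
        for iri_b in index_b.get(key, []):
            if iri_a != iri_b:  # mirrors filter(?slot1 != ?slot2)
                rows.append((key[1], iri_a, iri_b))
    return sorted(rows)  # mirrors "order by ?label1"


rows = same_label_different_iri(native_uris, no_native_uris)
for row in rows:
    print(row)
```

Only env_broad_scale is reported, because example_slot has the same IRI on both sides.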

@turbomam

could that automatically be reasoned over?