@justin2004
Last active December 28, 2022 20:58
working on converting EWG to triples using SPARQL-Anything
location population_served years water_source name actual_mag actual_units guideline_mag guideline_units legal_limit_mag legal_limit_units
Ashtabula County, Ohio 39,838 2012—2017 Surface water Barium 19.0 ppb 700 ppb 2000 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Dibromoacetic acid 0.952 ppb 0.04 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Dibromochloromethane 3.85 ppb 0.1 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Monochloroacetic acid 2.19 ppb 53 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Total trihalomethanes (TTHMs)† 51.9 ppb 0.15 ppb 80 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Chlorate 184.0 ppb 210 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Vanadium 0.192 ppb 21 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Chromium (total) 2.37 ppb 100 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Chromium (hexavalent) 0.126 ppb 0.02 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Haloacetic acids (HAA5)† 36.9 ppb 0.1 ppb 60 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Dichloroacetic acid 17.6 ppb 0.2 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Bromodichloromethane 12.3 ppb 0.06 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Molybdenum 1.10 ppb 40 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Trichloroacetic acid 15.9 ppb 0.1 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Nitrate 0.0358 ppm 0.14 ppm 10 ppm
Ashtabula County, Ohio 39,838 2012—2017 Surface water Fluoride 0.940 ppm 4 ppm
Ashtabula County, Ohio 39,838 2012—2017 Surface water Monobromoacetic acid 0.298 ppb 25 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Strontium 162.2 ppb 1500 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Chloroform 35.6 ppb 0.4 ppb
Ashtabula County, Ohio 39,838 2012—2017 Surface water Bromoform 0.0784 ppb 0.5 ppb
product or material produced P1056
has facility P912
provide P176
water supply Q1061108
QX rdfs:label "ashtabula public water supply" ;
   wdt:P366 wd:Q7892 . # P366 = used for, Q7892 = drinking water
# how to relate to the city of ashtabula?
curl --silent 'http://localhost:3000/sparql.anything' \
  --header "Accept: text/csv" \
  --data-urlencode 'query=
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
prefix xhtml: <http://www.w3.org/1999/xhtml#>
prefix what: <https://html.spec.whatwg.org/#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT *
WHERE {
  SERVICE <x-sparql-anything:> {
    fx:properties fx:location "https://www.ewg.org/tapwater/system.php?pws=OH0400711" .
    fx:properties fx:media-type "text/html" .
    [] rdf:_1 [ what:innerText "Utility Details" ] ;
       rdf:_2 [ rdf:_1 [ what:innerText ?location ] ;
                rdf:_2 [ rdf:_1 [ what:innerText "Serves:" ] ;
                         rdf:_2 ?population_served
                       ] ;
                rdf:_3 [ rdf:_1 [ what:innerText "Data available:" ] ;
                         rdf:_2 ?years
                       ] ;
                rdf:_4 [ rdf:_1 [ what:innerText "Source:" ] ;
                         rdf:_2 ?water_source
                       ] ;
              ] .
    ?div xhtml:class "contaminant-name" .
    ?div a xhtml:div .
    ?div ?p [ a xhtml:h3 ;
              what:innerText ?name ] .
    ?div ?p1 ?div_div .
    ?div_div a xhtml:div ;
             ?p2 [ a xhtml:div ;
                   rdf:_1 [ what:innerText "THIS UTILITY" ] ;
                   rdf:_2 [ what:innerText ?actual_mag_with_units ]
                 ]
    optional {
      ?div_div ?p3 [ a xhtml:div ;
                     rdf:_1 [ what:innerText "EWG HEALTH GUIDELINE" ] ;
                     rdf:_2 [ what:innerText ?guideline_mag_with_units ]
                   ]
    }
    optional {
      ?div_div ?p4 [ a xhtml:div ;
                     rdf:_1 [ what:innerText "LEGAL LIMIT" ] ;
                     rdf:_2 [ what:innerText ?legal_limit_mag_with_units ]
                   ]
    }
    bind(xsd:float(replace(replace(?actual_mag_with_units," .*",""),",","")) as ?actual_mag) .
    bind(replace(?actual_mag_with_units,".* ","") as ?actual_units) .
    bind(xsd:float(replace(replace(?guideline_mag_with_units," .*",""),",","")) as ?guideline_mag) .
    bind(replace(?guideline_mag_with_units,".* ","") as ?guideline_units) .
    bind(xsd:float(replace(replace(?legal_limit_mag_with_units," .*",""),",","")) as ?legal_limit_mag) .
    bind(replace(?legal_limit_mag_with_units,".* ","") as ?legal_limit_units) .
  }
}'
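The six bind lines in the query above all use the same two regex replacements to split a string like "19.0 ppb" into a magnitude and a unit. A minimal Python sketch of that splitting step follows. It is only an illustration: `split_measurement` is a hypothetical name, not part of SPARQL Anything, and returning `None` is just an assumption about how to mirror an unbound variable when an optional block does not match.

```python
import re
from typing import Optional, Tuple


def split_measurement(text: Optional[str]) -> Tuple[Optional[float], Optional[str]]:
    """Split a string such as "2,000 ppb" into (2000.0, "ppb")."""
    if text is None:
        # the optional block did not match, so both derived values stay unbound
        return None, None
    # replace(replace(?x, " .*", ""), ",", ""):
    # drop everything from the first space onward, then drop thousands separators
    magnitude_text = re.sub(r" .*", "", text).replace(",", "")
    # replace(?x, ".* ", ""):
    # drop everything up to and including the last space
    units = re.sub(r".* ", "", text)
    try:
        magnitude = float(magnitude_text)
    except ValueError:
        # xsd:float(...) raises an error in SPARQL, which leaves the variable unbound
        magnitude = None
    return magnitude, units


if __name__ == "__main__":
    for sample in ["19.0 ppb", "0.0358 ppm", "2,000 ppb", None]:
        print(sample, "->", split_measurement(sample))
```

Note that the unit is taken as whatever follows the last space, so a string with no space at all would come back unchanged as its own "unit", same as in the SPARQL version.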
@prefix schema: <http://schema.org/> .
@prefix pq: <http://www.wikidata.org/prop/qualifier/> .
@prefix bd: <http://www.bigdata.com/rdf#> .
@prefix pr: <http://www.wikidata.org/prop/reference/> .
@prefix ps: <http://www.wikidata.org/prop/statement/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix geof: <http://www.opengis.net/def/geosparql/function/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix mwapi: <https://www.mediawiki.org/ontology#API/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> . # Statement node, describes claim about entity.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix fn: <http://www.w3.org/2005/xpath-functions#> .
@prefix wdv: <http://www.wikidata.org/value/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix psv: <http://www.wikidata.org/prop/statement/value/> .
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix gas: <http://www.bigdata.com/rdf/gas#> .
@prefix sch: <https://schema.org/> .
@prefix wdata: <http://www.wikidata.org/wiki/Special:EntityData/> .
@prefix wdref: <http://www.wikidata.org/reference/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/> .
@prefix p: <http://www.wikidata.org/prop/> .
@prefix bds: <http://www.bigdata.com/rdf/search#> .
@prefix wdtn: <http://www.wikidata.org/prop/direct-normalized/> .
@prefix mediawiki: <https://www.mediawiki.org/ontology#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prv: <http://www.wikidata.org/prop/reference/value/> .
@prefix hint: <http://www.bigdata.com/queryHints#> .
@prefix wdno: <http://www.wikidata.org/prop/novalue/> .
@prefix sesame: <http://www.openrdf.org/schema/sesame#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
wd:Q108226511 rdfs:label "Ashtabula Public Water"@en ;
p:P527 wds:Q108226511-cc1c6370-451a-d1e2-b459-21486323234a .
wds:Q108226511-cc1c6370-451a-d1e2-b459-21486323234a
prov:wasDerivedFrom wdref:38f14c7257be0d0609acbca0b54070508f5394c4 ;
pq:P585 "2017-01-01T00:00:00Z"^^xsd:dateTime ;
pq:P6274 0.0358 ;
pqv:P585 wdv:c46565f21d8de8ea8b597dfaa738109a ;
pqv:P6274 wdv:44add8c3c23c527d0ca14d3a21e0474b ;
ps:P527 wd:Q172275 .
wdv:44add8c3c23c527d0ca14d3a21e0474b wikibase:quantityUnit wd:Q21006887 .
wdref:38f14c7257be0d0609acbca0b54070508f5394c4 pr:P854 <https://www.ewg.org/tapwater/system.php?pws=OH0400711> .
ps:P527 rdfs:label "has part"@en .
p:P527 rdfs:label "has part"@en .
wd:Q21006887 rdfs:label "parts per million"@en .
wd:Q172275 rdfs:label "chloroform"@en .
pqv:P6274 rdfs:label "concentration"@en .
pq:P6274 rdfs:label "concentration"@en .
pqv:P585 rdfs:label "point in time"@en .
pq:P585 rdfs:label "point in time"@en .
# prefix reference https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Prefixes_used
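The Turtle above hangs a "has part" statement node off the water system and qualifies it with a point in time and a concentration. A rough Python sketch of how one scraped row might be rendered in that same shape is below. `statement_turtle` and the `example-id` statement id are hypothetical (real statement ids are assigned by Wikidata), and the sketch leaves out the reference, the `pqv:` value nodes, and the unit, so treat it as an outline of the pattern and not a complete export.

```python
def statement_turtle(subject_qid: str, statement_id: str, part_qid: str,
                     concentration: str, point_in_time: str) -> str:
    """Render a simplified has-part statement with two qualifiers as Turtle.

    Assumes the wd:, wds:, p:, ps:, pq: and xsd: prefixes are declared elsewhere,
    as they are in the Turtle file above.
    """
    statement = f"wds:{subject_qid}-{statement_id}"
    lines = [
        # entity -> statement node
        f"wd:{subject_qid} p:P527 {statement} .",
        # statement node -> the part itself (e.g. a contaminant)
        f"{statement} ps:P527 wd:{part_qid} ;",
        # qualifier: point in time
        f'    pq:P585 "{point_in_time}"^^xsd:dateTime ;',
        # qualifier: concentration, passed as a decimal literal string
        f"    pq:P6274 {concentration} .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(statement_turtle("Q108226511", "example-id", "Q172275",
                           "0.0358", "2017-01-01T00:00:00Z"))
```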
justin2004 commented Aug 24, 2021

the usefulness of a triple (in a triplestore or in a person's head) is a function of:

  • how quickly the triple can be used during query answering
    • this means the quickness of loading new data is a factor
  • the number of other available useful triples that share a compatible vocabulary
  • the number of other available useful triples that are connected (directly or indirectly)
  • the ease/pleasure with which a competent query writer can craft queries using the triple's vocabulary
  • the extent to which the triple's vocabulary permits relevant semantic generalizations

assumptions:

  • sharing a common vocabulary means the domain is of interest
