Skip to content

Instantly share code, notes, and snippets.

@justin2004
Last active Aug 13, 2022
Embed
What would you like to do?
RDFS Reasoner Challenge (Tbox with 3M triples)

RDFS Reasoner Challenge (~3M Tbox triples, 1 Abox triple)

Goal

The goal is simple: infer class membership (using rdfs:subClassOf and rdf:type predicates). Don't do it with a property path or something. You must let the reasoner do it.

Attempts

I've tried to do this with a few reasoners. All unsuccessful.

  • Apache Jena wasn't able to do it with 12GB of RAM.
  • Stardog wasn't able to do it with 12GB of RAM.
  • REQUIEM wasn't able to do it with 12GB of RAM.

Data

In this zip file you'll find tbox.ttl and abox.ttl.

Query

This is the query that should return 79 results:

PREFIX  wd:   <http://www.wikidata.org/entity/>
PREFIX  ex:   <http://example.com/>
SELECT  *
WHERE
  { ex:condition0 a ?type
  }

Without reasoning it yields 1 result:

type
http://www\.wikidata\.org/entity/Q32552

But with RDFS reasoning enabled there should be 79 results.

e.g.

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://example.com/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  wd:   <http://www.wikidata.org/entity/>
SELECT  *
WHERE
  { ex:condition0 rdf:type/(rdfs:subClassOf)* ?type }

Yields:

type
http://www\.wikidata\.org/entity/Q32552
http://www\.wikidata\.org/entity/Q12397808
http://www\.wikidata\.org/entity/Q32540
http://www\.wikidata\.org/entity/Q12192
http://www\.wikidata\.org/entity/Q3392853
http://www\.wikidata\.org/entity/Q18553224
http://www\.wikidata\.org/entity/Q3286546
http://www\.wikidata\.org/entity/Q12136
http://www\.wikidata\.org/entity/Q2057971
http://www\.wikidata\.org/entity/Q7189713
http://www\.wikidata\.org/entity/Q3505845
http://www\.wikidata\.org/entity/Q937228
http://www\.wikidata\.org/entity/Q35120
http://www\.wikidata\.org/entity/Q483247
http://www\.wikidata\.org/entity/Q1190554
http://www\.wikidata\.org/entity/Q26907166
http://www\.wikidata\.org/entity/Q58415929
http://www\.wikidata\.org/entity/Q16722960
http://www\.wikidata\.org/entity/Q1149305
http://www\.wikidata\.org/entity/Q55919789
http://www\.wikidata\.org/entity/Q1207505
http://www\.wikidata\.org/entity/Q18603648
http://www\.wikidata\.org/entity/Q813912
http://www\.wikidata\.org/entity/Q21170479
http://www\.wikidata\.org/entity/Q18557436
http://www\.wikidata\.org/entity/Q3631290
http://www\.wikidata\.org/entity/Q754447
http://www\.wikidata\.org/entity/Q18123741
http://www\.wikidata\.org/entity/Q42417296
http://www\.wikidata\.org/entity/Q25383952
http://www\.wikidata\.org/entity/Q1284347
http://www\.wikidata\.org/entity/Q42183538
http://www\.wikidata\.org/entity/Q1441305
http://www\.wikidata\.org/entity/Q28807560
http://www\.wikidata\.org/entity/Q639907
http://www\.wikidata\.org/entity/Q193181
http://www\.wikidata\.org/entity/Q160402
http://www\.wikidata\.org/entity/Q781413
http://www\.wikidata\.org/entity/Q2996394
http://www\.wikidata\.org/entity/Q3249551
http://www\.wikidata\.org/entity/Q1150070
http://www\.wikidata\.org/entity/Q20937557
http://www\.wikidata\.org/entity/Q16887380
http://www\.wikidata\.org/entity/Q28813620
http://www\.wikidata\.org/entity/Q99527517
http://www\.wikidata\.org/entity/Q64732777
http://www\.wikidata\.org/entity/Q13878858
http://www\.wikidata\.org/entity/Q14912053
http://www\.wikidata\.org/entity/Q22299483
http://www\.wikidata\.org/entity/Q22299433
http://www\.wikidata\.org/entity/Q855395
http://www\.wikidata\.org/entity/Q29182544
http://www\.wikidata\.org/entity/Q4026292
http://www\.wikidata\.org/entity/Q169872
http://www\.wikidata\.org/entity/Q101991
http://www\.wikidata\.org/entity/Q31836626
http://www\.wikidata\.org/entity/Q505142
http://www\.wikidata\.org/entity/Q18558143
http://www\.wikidata\.org/entity/Q1963588
http://www\.wikidata\.org/entity/Q42982
http://www\.wikidata\.org/entity/Q14905917
http://www\.wikidata\.org/entity/Q14907126
http://www\.wikidata\.org/entity/Q22271820
http://www\.wikidata\.org/entity/Q14907559
http://www\.wikidata\.org/entity/Q22270763
http://www\.wikidata\.org/entity/Q14916317
http://www\.wikidata\.org/entity/Q14904580
http://www\.wikidata\.org/entity/Q14887714
http://www\.wikidata\.org/entity/Q1612119
http://www\.wikidata\.org/entity/Q14859560
http://www\.wikidata\.org/entity/Q5958765
http://www\.wikidata\.org/entity/Q3843811
http://www\.wikidata\.org/entity/Q929833
http://www\.wikidata\.org/entity/Q14905642
http://www\.wikidata\.org/entity/Q14874298
http://www\.wikidata\.org/entity/Q14819849
http://www\.wikidata\.org/entity/Q14820936
http://www\.wikidata\.org/entity/Q14819911
http://www\.wikidata\.org/entity/Q14907247
@VladimirAlexiev
Copy link

VladimirAlexiev commented Apr 6, 2022

@justin2004
Ok, seems you've dumped the wdt:P279 property.
But it's widely crowd-sourced (any WD editor can change it) and is not a proper class hierarchy.

Have you analyzed the structure of that subClassOf graph?

  • What is the max height? Eg a chain of length 100 will generate 5k triples due to transitive closure
  • Are there loops? Loops will cause cliques (equivalence clusters where each node is connected to each node). Eg a loop of length 100 will generate 10k triples.

So these 3M explicit triples could result in hundreds of millions of inferred triples.

	Id: rdfs11
	  a <rdfs:subClassOf> b [Constraint a != b]
	  b <rdfs:subClassOf> c [Constraint a != c, b != c]
	------------------------------------
	  a <rdfs:subClassOf> c

@josd
Copy link

josd commented Apr 6, 2022

Here is the result with the EYE reasoner:

$ swipl --stack-limit=8G -f /opt/eye/src/eye.pl -g main -- --nope abox.ttl rules3.n3 --query query.n3
eye --nope abox.ttl rules3.n3 --query query.n3
EYE v22.0406.1028 josd
SWI-Prolog version 8.5.9-31-g440df9e85
starting 169 [msec cputime] 169 [msec walltime]
#Processed by EYE v22.0406.1028 josd
#eye --nope abox.ttl rules3.n3 --query query.n3

GET file:///home/jdroo/Downloads/challenge/abox.ttl SC=1
GET file:///home/jdroo/Downloads/challenge/rules3.n3 SC=3102420
GET file:///home/jdroo/Downloads/challenge/query.n3 SC=1
networking 56254 [msec cputime] 323767 [msec walltime]
@prefix wd: <http://www.wikidata.org/entity/>.
@prefix ex: <http://example.com/>.
@prefix schema: <http://schema.org/>.
@prefix pq: <http://www.wikidata.org/prop/qualifier/>.
@prefix bd: <http://www.bigdata.com/rdf#>.
@prefix pr: <http://www.wikidata.org/prop/reference/>.
@prefix ps: <http://www.wikidata.org/prop/statement/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix geof: <http://www.opengis.net/def/geosparql/function/>.
@prefix wdt: <http://www.wikidata.org/prop/direct/>.
@prefix mwapi: <https://www.mediawiki.org/ontology#API/>.
@prefix wds: <http://www.wikidata.org/entity/statement/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix fn: <http://www.w3.org/2005/xpath-functions#>.
@prefix wdv: <http://www.wikidata.org/value/>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/>.
@prefix pqn: <http://www.wikidata.org/prop/qualifier/value-normalized/>.
@prefix geo: <http://www.opengis.net/ont/geosparql#>.
@prefix psv: <http://www.wikidata.org/prop/statement/value/>.
@prefix pqv: <http://www.wikidata.org/prop/qualifier/value/>.
@prefix dct: <http://purl.org/dc/terms/>.
@prefix gas: <http://www.bigdata.com/rdf/gas#>.
@prefix sch: <https://schema.org/>.
@prefix wdata: <http://www.wikidata.org/wiki/Special:EntityData/>.
@prefix wdref: <http://www.wikidata.org/reference/>.
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#>.
@prefix prov: <http://www.w3.org/ns/prov#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix wikibase: <http://wikiba.se/ontology#>.
@prefix prn: <http://www.wikidata.org/prop/reference/value-normalized/>.
@prefix p: <http://www.wikidata.org/prop/>.
@prefix bds: <http://www.bigdata.com/rdf/search#>.
@prefix wdtn: <http://www.wikidata.org/prop/direct-normalized/>.
@prefix mediawiki: <https://www.mediawiki.org/ontology#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix prv: <http://www.wikidata.org/prop/reference/value/>.
@prefix hint: <http://www.bigdata.com/queryHints#>.
@prefix wdno: <http://www.wikidata.org/prop/novalue/>.
@prefix sesame: <http://www.openrdf.org/schema/sesame#>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix e: <http://eulersharp.sourceforge.net/2003/03swap/log-rules#>.

ex:condition0 a wd:Q32552.
ex:condition0 a wd:Q12397808.
ex:condition0 a wd:Q42982.
ex:condition0 a wd:Q32540.
ex:condition0 a wd:Q14905917.
ex:condition0 a wd:Q5958765.
ex:condition0 a wd:Q14907126.
ex:condition0 a wd:Q14905642.
ex:condition0 a wd:Q3843811.
ex:condition0 a wd:Q1441305.
ex:condition0 a wd:Q505142.
ex:condition0 a wd:Q12192.
ex:condition0 a wd:Q18557436.
ex:condition0 a wd:Q929833.
ex:condition0 a wd:Q12136.
ex:condition0 a wd:Q22271820.
ex:condition0 a wd:Q14904580.
ex:condition0 a wd:Q14907559.
ex:condition0 a wd:Q18558143.
ex:condition0 a wd:Q28807560.
ex:condition0 a wd:Q7189713.
ex:condition0 a wd:Q14907247.
ex:condition0 a wd:Q14874298.
ex:condition0 a wd:Q639907.
ex:condition0 a wd:Q2057971.
ex:condition0 a wd:Q14819849.
ex:condition0 a wd:Q1612119.
ex:condition0 a wd:Q1963588.
ex:condition0 a wd:Q14820936.
ex:condition0 a wd:Q3631290.
ex:condition0 a wd:Q3392853.
ex:condition0 a wd:Q1284347.
ex:condition0 a wd:Q14887714.
ex:condition0 a wd:Q169872.
ex:condition0 a wd:Q14916317.
ex:condition0 a wd:Q14859560.
ex:condition0 a wd:Q18553224.
ex:condition0 a wd:Q754447.
ex:condition0 a wd:Q193181.
ex:condition0 a wd:Q22270763.
ex:condition0 a wd:Q3286546.
ex:condition0 a wd:Q14819911.
ex:condition0 a wd:Q3505845.
ex:condition0 a wd:Q1149305.
ex:condition0 a wd:Q160402.
ex:condition0 a wd:Q42183538.
ex:condition0 a wd:Q101991.
ex:condition0 a wd:Q2996394.
ex:condition0 a wd:Q18123741.
ex:condition0 a wd:Q55919789.
ex:condition0 a wd:Q937228.
ex:condition0 a wd:Q483247.
ex:condition0 a wd:Q855395.
ex:condition0 a wd:Q781413.
ex:condition0 a wd:Q31836626.
ex:condition0 a wd:Q29182544.
ex:condition0 a wd:Q16722960.
ex:condition0 a wd:Q1190554.
ex:condition0 a wd:Q42417296.
ex:condition0 a wd:Q21170479.
ex:condition0 a wd:Q18603648.
ex:condition0 a wd:Q1207505.
ex:condition0 a wd:Q35120.
ex:condition0 a wd:Q25383952.
ex:condition0 a wd:Q4026292.
ex:condition0 a wd:Q26907166.
ex:condition0 a wd:Q64732777.
ex:condition0 a wd:Q13878858.
ex:condition0 a wd:Q3249551.
ex:condition0 a wd:Q20937557.
ex:condition0 a wd:Q1150070.
ex:condition0 a wd:Q58415929.
ex:condition0 a wd:Q16887380.
ex:condition0 a wd:Q14912053.
ex:condition0 a wd:Q813912.
ex:condition0 a wd:Q28813620.
ex:condition0 a wd:Q22299483.
ex:condition0 a wd:Q22299433.
ex:condition0 a wd:Q99527517.

reasoning 36626 [msec cputime] 36699 [msec walltime]
#2022-04-06T16:18:55.667Z in=3102422 out=79 ent=157 step=802 brake=6 inf=3236199852 sec=93.049 inf/sec=34779523
#ENDS

2022-04-06T16:18:55.667Z in=3102422 out=79 ent=157 step=802 brake=6 inf=3236199852 sec=93.049 inf/sec=34779523

rules3.n3 is generated via

swipl --stack-limit=8G -f /opt/eye/src/eye.pl -g main -- --nope tbox.ttl --query https://josd.github.io/eye/reasoning/dt/parteval-subclass.n3 > rules3.n3

and

query.n3 is

@prefix ex: <http://example.com/>.

{ex:condition0 a ?type} => {ex:condition0 a ?type}.

@VladimirAlexiev
Copy link

VladimirAlexiev commented Apr 6, 2022

BTW, tbox has about 2.5M (2525953) nodes and some are not WD items but:

@pchampin
Copy link

pchampin commented Apr 7, 2022

Note that a complete RDFS reasonner should give 80 results: the 79 listed above... and rdfs:Resource!

Inferrust, a student project that I never finished tidying up ☹️ , loads the graph in ~8s, performs forward chaining reasoning in ~11s, and outputs the correct 80 results. This is on my Dell XPS laptop (core i7, 16GB of RAM).

Gives me some motivation for finalizing the tidying up... Thanks for this!

@josd
Copy link

josd commented Apr 7, 2022

Waw @pchampin that is really impressive and it was a pleasure to reproduce your results on my HP Pavilion laptop (core i7, 12GB of RAM)

~/github.com/pchampin/inferrust_rs/target/release/examples/challenge tbox.ttl abox.ttl
#load time, graphsize, proc time,  inferred,
# 6.347270,   3102979, 11.500086, +71360452,
http://www.w3.org/2000/01/rdf-schema#Resource
http://www.wikidata.org/entity/Q32552
http://www.wikidata.org/entity/Q12397808
http://www.wikidata.org/entity/Q42982
http://www.wikidata.org/entity/Q32540
http://www.wikidata.org/entity/Q1190554
http://www.wikidata.org/entity/Q12136
http://www.wikidata.org/entity/Q937228
http://www.wikidata.org/entity/Q3249551
http://www.wikidata.org/entity/Q929833
http://www.wikidata.org/entity/Q1612119
http://www.wikidata.org/entity/Q14916317
http://www.wikidata.org/entity/Q14859560
http://www.wikidata.org/entity/Q18123741
http://www.wikidata.org/entity/Q31836626
http://www.wikidata.org/entity/Q1150070
http://www.wikidata.org/entity/Q2996394
http://www.wikidata.org/entity/Q42183538
http://www.wikidata.org/entity/Q3392853
http://www.wikidata.org/entity/Q35120
http://www.wikidata.org/entity/Q160402
http://www.wikidata.org/entity/Q813912
http://www.wikidata.org/entity/Q639907
http://www.wikidata.org/entity/Q1207505
http://www.wikidata.org/entity/Q12192
http://www.wikidata.org/entity/Q1441305
http://www.wikidata.org/entity/Q18603648
http://www.wikidata.org/entity/Q169872
http://www.wikidata.org/entity/Q14819911
http://www.wikidata.org/entity/Q20937557
http://www.wikidata.org/entity/Q781413
http://www.wikidata.org/entity/Q7189713
http://www.wikidata.org/entity/Q483247
http://www.wikidata.org/entity/Q1963588
http://www.wikidata.org/entity/Q16887380
http://www.wikidata.org/entity/Q22299433
http://www.wikidata.org/entity/Q3505845
http://www.wikidata.org/entity/Q3843811
http://www.wikidata.org/entity/Q1149305
http://www.wikidata.org/entity/Q101991
http://www.wikidata.org/entity/Q18558143
http://www.wikidata.org/entity/Q4026292
http://www.wikidata.org/entity/Q28813620
http://www.wikidata.org/entity/Q14905917
http://www.wikidata.org/entity/Q3286546
http://www.wikidata.org/entity/Q14874298
http://www.wikidata.org/entity/Q18553224
http://www.wikidata.org/entity/Q18557436
http://www.wikidata.org/entity/Q3631290
http://www.wikidata.org/entity/Q754447
http://www.wikidata.org/entity/Q64732777
http://www.wikidata.org/entity/Q42417296
http://www.wikidata.org/entity/Q22270763
http://www.wikidata.org/entity/Q14907247
http://www.wikidata.org/entity/Q505142
http://www.wikidata.org/entity/Q2057971
http://www.wikidata.org/entity/Q5958765
http://www.wikidata.org/entity/Q14819849
http://www.wikidata.org/entity/Q22299483
http://www.wikidata.org/entity/Q14907126
http://www.wikidata.org/entity/Q22271820
http://www.wikidata.org/entity/Q14904580
http://www.wikidata.org/entity/Q16722960
http://www.wikidata.org/entity/Q99527517
http://www.wikidata.org/entity/Q55919789
http://www.wikidata.org/entity/Q21170479
http://www.wikidata.org/entity/Q14912053
http://www.wikidata.org/entity/Q13878858
http://www.wikidata.org/entity/Q14907559
http://www.wikidata.org/entity/Q193181
http://www.wikidata.org/entity/Q28807560
http://www.wikidata.org/entity/Q25383952
http://www.wikidata.org/entity/Q14820936
http://www.wikidata.org/entity/Q1284347
http://www.wikidata.org/entity/Q26907166
http://www.wikidata.org/entity/Q29182544
http://www.wikidata.org/entity/Q14887714
http://www.wikidata.org/entity/Q58415929
http://www.wikidata.org/entity/Q855395
http://www.wikidata.org/entity/Q14905642

@josd
Copy link

josd commented Apr 7, 2022

@pchampin with the latest Rust it seems to run about 3 times slower?

~/github.com/pchampin/inferrust_rs/target/release/examples/challenge tbox.ttl abox.ttl
#load time, graphsize, proc time,  inferred,
# 6.714673,   3102979, 37.363344, +71360452,

@rdstn
Copy link

rdstn commented Apr 8, 2022

I have a slighly more powerful machine than @VladimirAlexiev, so I tried it out with GraphDB as well. Nothing special in regards to the setup, just a bit better hardware. I was also doing other memory-intensive operations at the time, so hardly a good benchmark, but perhaps somewhat illustrative of what could be going on in a production server.

Note that GraphDB is forward chaining, so it does materialization at import time.

The data loaded in 36 minutes. The total statements for that DB are 71,938,374, with a substantial expansion ratio of 23.18. Once that is done, the query executed in a few miliseconds.

The benefit is the query speed, so if you don't have just one node, but a few million of them, this would give a better performance than a backward-chaining DB.

@pchampin, if inferrust can do all of that forward-chaining inference in under a minute, it's really impressive.

@justin2004
Copy link
Author

justin2004 commented Apr 8, 2022

Thanks for sharing the results @rdstn !

Arachne can also do it in under 2 minutes.
There is some more discussion about this challenge here.

@pchampin
Copy link

pchampin commented Apr 8, 2022

@pchampin, if inferrust can do all of that forward-chaining inference in under a minute, it's really impressive.

All the credit goes to @jsubercaze, who implemented the original algorithm in Java.

@hmottestad
Copy link

hmottestad commented Apr 23, 2022

I've used this challenge to improve the reasoner in RDF4J. The improvements will be available in RDF4J 4.1.0.

On my laptop I'm able to run the challenge in just over 2 minutes with -Xmx12G.

RDF4J was previously only able to run the challenge if given significantly more memory (6 min with 24GB).

Here is the PR with the improvements: eclipse/rdf4j#3790

Here is the code i used to run the challenge: https://github.com/eclipse/rdf4j/blob/a023fb5d34d711ab25c113eee8e9902b7ae5e07f/core/sail/inferencer/src/test/java/org/eclipse/rdf4j/sail/inferencer/fc/RDFSChallenge.java

@jeswr
Copy link

jeswr commented May 7, 2022

I'm able to run reasoning on this challenge in less than a millisecond on this branch of N3.js - pinging @josd @pbonte @rubensworks

Running Reasoner Challenge

Load Reasoner Challenge: 21.605s
Reasoning: Reasoner Challenge: 0.893ms
79

The code I used to execute this challenge is

async function reasonerChallenge() {
  const store = new N3.Store();
  const TITLE = `Reasoner Challenge`;

  console.time(`Load ${TITLE}`);
  await load(`data/challenge/abox.ttl`, store);
  await load(`data/challenge/tbox.ttl`, store);
  console.timeEnd(`Load ${TITLE}`);

  console.time(`Reasoning: ${TITLE}`);
  store.reason([{
    premise: [new Quad(
      new Variable('?s'),
      new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
      new Variable('?o'),
    ), new Quad(
      new Variable('?o'),
      new NamedNode('http://www.w3.org/2000/01/rdf-schema#subClassOf'),
      new Variable('?o2'),
    )],
    conclusion: [
      new Quad(
        new Variable('?s'),
        new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
        new Variable('?o2'),
      ),
    ]
  }])
  console.timeEnd(`Reasoning: ${TITLE}`)

  console.log(store.getQuads(
    new NamedNode('http://example.com/condition0'),
    new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
    null,
  ).length)


  console.log()
}

Experiments were run on a DELL XPS15 laptop with 32GB of ram.

@pbonte
Copy link

pbonte commented May 7, 2022

Wow! Really impressive results @jeswr! I think even RDFox took more than 2ms to compute the materialization!

@jeswr
Copy link

jeswr commented May 7, 2022

Really impressive results @jeswr!

Thanks!

I think even RDFox took more than 2ms to compute the materialization!

It is worth observing that they appear to have applied all RDFS rules, when this challenge only requires the one rule to be applied as I have.

@justin2004
Copy link
Author

justin2004 commented May 7, 2022

When I was unsuccessful with Apache Jena I used its full RDFS reasoner.
I bet Jena (cc @afs ) could do it if I used a single SWRL-ish rule like @jeswr did with N3.js.

Openlink's Virtuoso's full RDFS reasoner was not able to do this challenge but with a single custom rule (like @jeswr 's rule above) it was.

I'm quite happy this challenge is causing some semantic web development but I did call it an "RDFS Reasoner Challenge" so I assumed people would use full RDFS semantics and not just use custom rules crafted specifically for this challenge.

@hmottestad
Copy link

hmottestad commented May 7, 2022

This challenge has been a fun way of improving performance in RDF4J. Initially we couldn’t load and forward chain the data, but now we have added support for a lower transaction isolation level, reduced forward chaining overhead and multi threaded forward chaining.

So we now have a general triple (quad) store with support for transaction isolation, query processing and now also a faster reasoner. (We also have a super fast SHACL engine :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment