@justin2004
Last active January 2, 2023 15:55
RDFS Reasoner Challenge (Tbox with 3M triples)

RDFS Reasoner Challenge (~3M Tbox triples, 1 Abox triple)

Goal

The goal is simple: infer class membership (using the rdfs:subClassOf and rdf:type predicates). Don't do it with a SPARQL property path or a similar query-side workaround. You must let the reasoner do it.
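The inference asked for here corresponds to the RDFS entailment pattern usually labelled rdfs9: if ?s rdf:type ?c and ?c rdfs:subClassOf ?d, then ?s rdf:type ?d. As a minimal sketch of what a reasoner has to materialize, the snippet below applies that one rule to a fixed point. The triples and the `materializeTypes` name are hypothetical stand-ins chosen for illustration; this is not taken from the challenge data or from any reasoner.

```javascript
// Toy data: shortened, hypothetical IRIs standing in for abox.ttl and tbox.ttl.
const TYPE = 'rdf:type';
const SUBCLASS = 'rdfs:subClassOf';

const triples = [
  ['ex:condition0', TYPE, 'ex:A'], // the single Abox triple
  ['ex:A', SUBCLASS, 'ex:B'],      // Tbox
  ['ex:B', SUBCLASS, 'ex:C'],
  ['ex:X', SUBCLASS, 'ex:Y'],      // unrelated branch, never reached
];

// Apply rdfs9 (?s type ?c . ?c subClassOf ?d => ?s type ?d) until nothing new appears.
function materializeTypes(input) {
  // Index direct superclasses once so each step is a lookup, not a scan.
  const supers = new Map();
  for (const [s, p, o] of input) {
    if (p !== SUBCLASS) continue;
    if (!supers.has(s)) supers.set(s, []);
    supers.get(s).push(o);
  }

  const seen = new Set(input.map((t) => t.join(' ')));
  const result = [...input];
  // Worklist of type triples whose consequences have not been derived yet.
  const queue = input.filter(([, p]) => p === TYPE);

  while (queue.length > 0) {
    const [s, , c] = queue.pop();
    for (const d of supers.get(c) ?? []) {
      const inferred = [s, TYPE, d];
      const key = inferred.join(' ');
      if (seen.has(key)) continue; // also guards against subclass cycles
      seen.add(key);
      result.push(inferred);
      queue.push(inferred);
    }
  }
  return result;
}

const closed = materializeTypes(triples);
const typesOfCondition0 = closed
  .filter(([s, p]) => s === 'ex:condition0' && p === TYPE)
  .map(([, , o]) => o);
console.log(typesOfCondition0); // [ 'ex:A', 'ex:B', 'ex:C' ]
```

With the inferred triples stored, the plain `ex:condition0 a ?type` query needs no property path to see every superclass.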

Attempts

I've tried to do this with a few reasoners, all unsuccessfully.

  • Apache Jena wasn't able to do it with 12GB of RAM.
  • Stardog wasn't able to do it with 12GB of RAM.
  • REQUIEM wasn't able to do it with 12GB of RAM.

Data

In this zip file you'll find tbox.ttl and abox.ttl.

Query

This is the query that should return 79 results:

PREFIX  wd:   <http://www.wikidata.org/entity/>
PREFIX  ex:   <http://example.com/>
SELECT  *
WHERE
  { ex:condition0 a ?type
  }

Without reasoning it yields 1 result:

type
http://www.wikidata.org/entity/Q32552

But with RDFS reasoning enabled there should be 79 results.

For example, this property-path query (shown only to list the expected types, not as a solution):

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://example.com/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  wd:   <http://www.wikidata.org/entity/>
SELECT  *
WHERE
  { ex:condition0 rdf:type/(rdfs:subClassOf)* ?type }

Yields:

type
http://www.wikidata.org/entity/Q32552
http://www.wikidata.org/entity/Q12397808
http://www.wikidata.org/entity/Q32540
http://www.wikidata.org/entity/Q12192
http://www.wikidata.org/entity/Q3392853
http://www.wikidata.org/entity/Q18553224
http://www.wikidata.org/entity/Q3286546
http://www.wikidata.org/entity/Q12136
http://www.wikidata.org/entity/Q2057971
http://www.wikidata.org/entity/Q7189713
http://www.wikidata.org/entity/Q3505845
http://www.wikidata.org/entity/Q937228
http://www.wikidata.org/entity/Q35120
http://www.wikidata.org/entity/Q483247
http://www.wikidata.org/entity/Q1190554
http://www.wikidata.org/entity/Q26907166
http://www.wikidata.org/entity/Q58415929
http://www.wikidata.org/entity/Q16722960
http://www.wikidata.org/entity/Q1149305
http://www.wikidata.org/entity/Q55919789
http://www.wikidata.org/entity/Q1207505
http://www.wikidata.org/entity/Q18603648
http://www.wikidata.org/entity/Q813912
http://www.wikidata.org/entity/Q21170479
http://www.wikidata.org/entity/Q18557436
http://www.wikidata.org/entity/Q3631290
http://www.wikidata.org/entity/Q754447
http://www.wikidata.org/entity/Q18123741
http://www.wikidata.org/entity/Q42417296
http://www.wikidata.org/entity/Q25383952
http://www.wikidata.org/entity/Q1284347
http://www.wikidata.org/entity/Q42183538
http://www.wikidata.org/entity/Q1441305
http://www.wikidata.org/entity/Q28807560
http://www.wikidata.org/entity/Q639907
http://www.wikidata.org/entity/Q193181
http://www.wikidata.org/entity/Q160402
http://www.wikidata.org/entity/Q781413
http://www.wikidata.org/entity/Q2996394
http://www.wikidata.org/entity/Q3249551
http://www.wikidata.org/entity/Q1150070
http://www.wikidata.org/entity/Q20937557
http://www.wikidata.org/entity/Q16887380
http://www.wikidata.org/entity/Q28813620
http://www.wikidata.org/entity/Q99527517
http://www.wikidata.org/entity/Q64732777
http://www.wikidata.org/entity/Q13878858
http://www.wikidata.org/entity/Q14912053
http://www.wikidata.org/entity/Q22299483
http://www.wikidata.org/entity/Q22299433
http://www.wikidata.org/entity/Q855395
http://www.wikidata.org/entity/Q29182544
http://www.wikidata.org/entity/Q4026292
http://www.wikidata.org/entity/Q169872
http://www.wikidata.org/entity/Q101991
http://www.wikidata.org/entity/Q31836626
http://www.wikidata.org/entity/Q505142
http://www.wikidata.org/entity/Q18558143
http://www.wikidata.org/entity/Q1963588
http://www.wikidata.org/entity/Q42982
http://www.wikidata.org/entity/Q14905917
http://www.wikidata.org/entity/Q14907126
http://www.wikidata.org/entity/Q22271820
http://www.wikidata.org/entity/Q14907559
http://www.wikidata.org/entity/Q22270763
http://www.wikidata.org/entity/Q14916317
http://www.wikidata.org/entity/Q14904580
http://www.wikidata.org/entity/Q14887714
http://www.wikidata.org/entity/Q1612119
http://www.wikidata.org/entity/Q14859560
http://www.wikidata.org/entity/Q5958765
http://www.wikidata.org/entity/Q3843811
http://www.wikidata.org/entity/Q929833
http://www.wikidata.org/entity/Q14905642
http://www.wikidata.org/entity/Q14874298
http://www.wikidata.org/entity/Q14819849
http://www.wikidata.org/entity/Q14820936
http://www.wikidata.org/entity/Q14819911
http://www.wikidata.org/entity/Q14907247
@pchampin

pchampin commented Apr 8, 2022

> @pchampin, if inferrust can do all of that forward-chaining inference in under a minute, it's really impressive.

All the credit goes to @jsubercaze, who implemented the original algorithm in Java.

@hmottestad

hmottestad commented Apr 23, 2022

I've used this challenge to improve the reasoner in RDF4J. The improvements will be available in RDF4J 4.1.0.

On my laptop I'm able to run the challenge in just over 2 minutes with -Xmx12G.

RDF4J was previously only able to run the challenge if given significantly more memory (6 min with 24GB).

Here is the PR with the improvements: eclipse-rdf4j/rdf4j#3790

Here is the code I used to run the challenge: https://github.com/eclipse/rdf4j/blob/a023fb5d34d711ab25c113eee8e9902b7ae5e07f/core/sail/inferencer/src/test/java/org/eclipse/rdf4j/sail/inferencer/fc/RDFSChallenge.java

@jeswr

jeswr commented May 7, 2022

I'm able to run reasoning on this challenge in less than a millisecond on this branch of N3.js - pinging @josd @pbonte @rubensworks

Running Reasoner Challenge

Load Reasoner Challenge: 21.605s
Reasoning: Reasoner Challenge: 0.893ms
79

The code I used to execute this challenge is

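// Assumes the N3.js branch mentioned above, which adds Store#reason.
// The imports below are an assumption; the original snippet omitted them.
const N3 = require('n3');
const { Quad, Variable, NamedNode } = N3;

// `load` is a helper that was not included in the comment; it is assumed to
// parse a Turtle file and add its quads to the given store.
async function reasonerChallenge() {
  const store = new N3.Store();
  const TITLE = `Reasoner Challenge`;

  console.time(`Load ${TITLE}`);
  await load(`data/challenge/abox.ttl`, store);
  await load(`data/challenge/tbox.ttl`, store);
  console.timeEnd(`Load ${TITLE}`);

  console.time(`Reasoning: ${TITLE}`);
  // Single rule: ?s rdf:type ?o . ?o rdfs:subClassOf ?o2 => ?s rdf:type ?o2
  store.reason([{
    premise: [new Quad(
      new Variable('?s'),
      new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
      new Variable('?o'),
    ), new Quad(
      new Variable('?o'),
      new NamedNode('http://www.w3.org/2000/01/rdf-schema#subClassOf'),
      new Variable('?o2'),
    )],
    conclusion: [
      new Quad(
        new Variable('?s'),
        new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
        new Variable('?o2'),
      ),
    ],
  }]);
  console.timeEnd(`Reasoning: ${TITLE}`);

  // Count the rdf:type triples of ex:condition0 after reasoning.
  console.log(store.getQuads(
    new NamedNode('http://example.com/condition0'),
    new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
    null,
  ).length);

  console.log();
}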

Experiments were run on a Dell XPS 15 laptop with 32GB of RAM.

@pbonte

pbonte commented May 7, 2022

Wow! Really impressive results @jeswr! I think even RDFox took more than 2ms to compute the materialization!

@jeswr

jeswr commented May 7, 2022

> Really impressive results @jeswr!

Thanks!

> I think even RDFox took more than 2ms to compute the materialization!

It is worth observing that they appear to have applied all RDFS rules, whereas this challenge only requires the single rule I applied.

@justin2004
Author

When I was unsuccessful with Apache Jena I used its full RDFS reasoner.
I bet Jena (cc @afs) could do it if I used a single SWRL-ish rule like @jeswr did with N3.js.

OpenLink Virtuoso's full RDFS reasoner was not able to complete this challenge either, but with a single custom rule (like @jeswr's rule above) it was.

I'm quite happy this challenge is causing some semantic web development but I did call it an "RDFS Reasoner Challenge" so I assumed people would use full RDFS semantics and not just use custom rules crafted specifically for this challenge.
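To make the difference between a single custom rule and a fuller rule set concrete, here is a toy comparison, offered as an illustration rather than a diagnosis of any engine mentioned in this thread. It counts inferred triples on a hypothetical chain of 10 classes with one typed instance, first with rdfs9 alone and then with rdfs9 plus rdfs11 (transitivity of rdfs:subClassOf). Full RDFS entailment has further rules that are not modelled here, and the `chain` and `countInferred` helpers are made up for this sketch.

```javascript
const TYPE = 'rdf:type';
const SUBCLASS = 'rdfs:subClassOf';

// Hypothetical chain C0 subClassOf C1 subClassOf ... C(n-1), plus one instance typed as C0.
function chain(n) {
  const triples = [['ex:i', TYPE, 'ex:C0']];
  for (let k = 0; k + 1 < n; k++) {
    triples.push([`ex:C${k}`, SUBCLASS, `ex:C${k + 1}`]);
  }
  return triples;
}

// Naive fixpoint: re-run the enabled rules over all triples until a pass adds nothing.
function countInferred(input, { withRdfs11 }) {
  const known = new Map(input.map((t) => [t.join(' '), t]));
  const asserted = known.size;
  let changed = true;
  while (changed) {
    changed = false;
    const snapshot = [...known.values()];
    const subclassEdges = snapshot.filter(([, p]) => p === SUBCLASS);
    for (const [s, p, o] of snapshot) {
      if (p !== TYPE && p !== SUBCLASS) continue;
      // rdfs9 consumes type triples; rdfs11 consumes subClassOf triples.
      if (p === SUBCLASS && !withRdfs11) continue;
      for (const [c, , d] of subclassEdges) {
        if (c !== o) continue;
        const inferred = [s, p, d];
        const key = inferred.join(' ');
        if (!known.has(key)) {
          known.set(key, inferred);
          changed = true;
        }
      }
    }
  }
  return known.size - asserted;
}

const rdfs9Only = countInferred(chain(10), { withRdfs11: false });
const rdfs9And11 = countInferred(chain(10), { withRdfs11: true });
console.log(rdfs9Only);  // 9
console.log(rdfs9And11); // 45
```

On this chain rdfs9 alone adds n - 1 type triples, while adding rdfs11 brings the total to n(n - 1)/2 because every ancestor pair is materialized. The challenge Tbox is not a simple chain, so these numbers only show the shape of the growth.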

@hmottestad

This challenge has been a fun way of improving performance in RDF4J. Initially we couldn’t load and forward chain the data, but now we have added support for a lower transaction isolation level, reduced forward-chaining overhead and added multi-threaded forward chaining.

So we now have a general triple (quad) store with support for transaction isolation, query processing and now also a faster reasoner. (We also have a super fast SHACL engine :)

@mdesalvo

mdesalvo commented Dec 22, 2022

Hi all,
I used this interesting challenge to spot bottlenecks in the RDFSharp library (for those of you living in the .NET realm).
At first it was not able to finish the task in linear time. I then found opportunities for improvement, which I'll make available in the 3.1 milestone.
In the end RDFSharp reached the goal, answering with 80 classes (we also include owl:NamedIndividual by default) in a reasoning time of 53 seconds, with memory peaking at 5.7GB. The platform for the experiment was a 32GB ThinkPad P53 running Windows 10 20H2.

dotnet run -c Release --project RDFSChallenge

Loading graph...DONE! [3102979 triples in 00:00:32.2454525; 2625MB process memory]
Loading ontology...DONE! [2525103 classes, 855 properties, 1 individuals in 00:01:00.0182808; 5778MB process memory]
Loading data lens...DONE! [80 classes in 00:00:53.9267432; 4030MB process memory]

Thanks, and long live the Semantic Web,
Marco

@justin2004
Author

Very cool, @mdesalvo!
Did that reasoning include full RDFS entailment or did you configure RDFSharp to only do rdfs:subClassOf entailment?

@mdesalvo

mdesalvo commented Dec 22, 2022

Hi @justin2004,
I followed the goal as proposed: we have an API for finding the classes of a given individual in an ontology. Internally it walks the rdfs:subClassOf taxonomy, starting from the known rdf:type classes of the individual. The only configuration I changed was to disable awareness of owl:equivalentClass (which we support by default), because the goal does not mention it.
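The general idea of starting from the asserted rdf:type classes and following rdfs:subClassOf upward can be sketched as a breadth-first walk over an adjacency index. This is not RDFSharp's code; it is a JavaScript illustration on hypothetical toy data, and `superclassesOf`, `assertedTypes` and `classesOf` are names invented for the sketch.

```javascript
// Hypothetical toy data standing in for tbox.ttl and abox.ttl.
const superclassesOf = new Map([
  ['ex:A', ['ex:B', 'ex:C']],
  ['ex:B', ['ex:D']],
  ['ex:C', ['ex:D']], // diamond: ex:D is reachable two ways
  ['ex:D', ['ex:A']], // cycle back to ex:A
  ['ex:X', ['ex:Y']], // unrelated branch, never visited
]);
const assertedTypes = new Map([['ex:condition0', ['ex:A']]]);

// Collect every class reachable from the individual's asserted types.
function classesOf(individual) {
  const visited = new Set(assertedTypes.get(individual) ?? []);
  const frontier = [...visited];
  // The frontier grows while it is being scanned; each class is expanded once.
  for (let i = 0; i < frontier.length; i++) {
    for (const parent of superclassesOf.get(frontier[i]) ?? []) {
      if (visited.has(parent)) continue; // handles diamonds and cycles
      visited.add(parent);
      frontier.push(parent);
    }
  }
  return frontier;
}

const classes = classesOf('ex:condition0');
console.log(classes); // [ 'ex:A', 'ex:B', 'ex:C', 'ex:D' ]
```

A walk like this only touches classes reachable from the individual, so its cost depends on the size of that ancestor set rather than on the size of the whole Tbox.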

Strictly speaking, in terms of a reasoner, we don't have a monolithic engine that reads N3 rules and produces answers: we are a library, so we instead expose high-level APIs that applications can use to deal with ontologies. I built a simple console application like this:

using RDFSharp.Model;
using RDFSharp.Semantics;
using System.Diagnostics;

// Step 1: parse both Turtle files and merge the Abox triples into the Tbox graph.
Console.Write("Loading graph...");
Stopwatch sw = Stopwatch.StartNew();
RDFGraph abox = await RDFGraph.FromFileAsync(RDFModelEnums.RDFFormats.Turtle, Environment.CurrentDirectory + "\\abox.ttl");
RDFGraph tbox = await RDFGraph.FromFileAsync(RDFModelEnums.RDFFormats.Turtle, Environment.CurrentDirectory + "\\tbox.ttl");
foreach (RDFTriple aboxTriple in abox)
    tbox.AddTriple(aboxTriple);
sw.Stop();
Console.WriteLine($"DONE! {tbox.TriplesCount} triples in {sw.Elapsed}");

// Step 2: build the ontology model (classes, properties, individuals) from the merged graph.
Console.Write("Loading ontology...");
sw.Restart();
OWLOntology ontology = await OWLOntology.FromRDFGraphAsync(tbox,
    new OWLOntologyLoaderOptions() { EnableTaxonomyProtection=false, EnableAutomaticEntityDeclaration=true });
sw.Stop();
Console.WriteLine($"DONE! {ontology.Model.ClassModel.ClassesCount} classes, " +
    $"{ontology.Model.PropertyModel.PropertiesCount} properties, " +
    $"{ontology.Data.IndividualsCount} individuals in {sw.Elapsed}");

// Step 3: ask the data lens for the classes of ex:condition0.
Console.Write("Loading data lens...");
sw.Restart();
OWLOntologyDataLens datalens = new OWLOntologyDataLens(new RDFResource("http://example.com/condition0"), ontology);
List<RDFResource> classTypes = await datalens.ClassTypesAsync(false);
sw.Stop();
Console.WriteLine($"DONE! {classTypes.Count} classes in {sw.Elapsed}");

Regards,
Marco
