@tomsaleeba
Created May 15, 2018
Deleting batches of records with SPARQL

I learned this while trying to clear our records in AWS Neptune: dropping an entire graph in one go hit the query timeout. If you can't (or don't want to) raise the timeout, you can delete a smaller part of the graph in each transaction.

curl -sX POST http://<cluster-prefix>.rds.amazonaws.com:8182/sparql --data-urlencode 'update=
DELETE {
  GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> { ?s ?p ?o }
}
WHERE {
  GRAPH <http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph> {
    {
      SELECT ?s ?p ?o
      WHERE {
        ?s ?p ?o .
      }
      LIMIT 10
    }
  }
}
'

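This deletes up to 10 triples from the named graph: whichever 10 the inner SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10 subquery returns. There is no ORDER BY, so which 10 you get is up to the engine. Adjust the LIMIT value to find a batch size that keeps each request under the timeout, then repeat the request until the graph is empty.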

Yeah, this is a dirty hack, but it was a bit painful to figure out, so I want to store the knowledge.

Also, be sure to use --data-urlencode, not --data-binary; otherwise you might find that the server ignores your input without giving any indication of an error.
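
Since each request removes only one batch, you need to repeat it until nothing is left. Here is a rough sketch of how that loop could look, not something I have run against a real cluster. The function names (build_delete_update, graph_has_triples, delete_in_batches) and the ENDPOINT/GRAPH variables are made up for illustration. It also assumes the endpoint answers an ASK query with SPARQL JSON results containing a "boolean" field.

```shell
#!/usr/bin/env bash
# Sketch: delete a named graph in batches until an ASK query reports it empty.
# ENDPOINT and GRAPH are placeholders (assumptions); substitute your own values.
ENDPOINT="${ENDPOINT:-http://<cluster-prefix>.rds.amazonaws.com:8182/sparql}"
GRAPH="${GRAPH:-http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph}"

# Print the batched DELETE update for the given batch size.
build_delete_update() {
  local limit="$1"
  cat <<EOF
DELETE {
  GRAPH <${GRAPH}> { ?s ?p ?o }
}
WHERE {
  GRAPH <${GRAPH}> {
    {
      SELECT ?s ?p ?o
      WHERE { ?s ?p ?o . }
      LIMIT ${limit}
    }
  }
}
EOF
}

# Exit 0 while the graph still holds at least one triple.
# Assumes a SPARQL JSON response such as {"head":{},"boolean":true}.
# Note: a failed request also looks like "empty" here, so check the
# graph by hand once the loop stops.
graph_has_triples() {
  curl -s -X POST "$ENDPOINT" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=ASK { GRAPH <${GRAPH}> { ?s ?p ?o } }" \
    | grep -Eq '"boolean"[[:space:]]*:[[:space:]]*true'
}

# Repeat the batched delete until the ASK query says the graph is empty.
# Stops early (exit 1) if a delete request returns an HTTP error.
delete_in_batches() {
  local limit="${1:-10}"
  while graph_has_triples; do
    curl -sf -X POST "$ENDPOINT" \
      --data-urlencode "update=$(build_delete_update "$limit")" \
      >/dev/null || return 1
  done
}
```

To use it, source the file and call something like delete_in_batches 1000, raising or lowering the number until each request stays under the timeout.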

@agcunha

agcunha commented Oct 7, 2019

Thank you. You helped me a lot! My RDF dataset has 20,000,000 triples, and I adjusted the limit to 1,000,000.

@GeniJaho

GeniJaho commented Oct 17, 2020

Thanks man, made my day better.

@nzewail

nzewail commented Jan 5, 2022

Super helpful!
