@mikedias
Created March 11, 2016 13:53
ScriptInputFormat for Wikipedia RDF format
/*
Parses the long_abstracts format, one triple per line:
<http://dbpedia.org/resource/Anarchism> <http://dbpedia.org/ontology/abstract> "Anarchism is a collection of movements and ideologies that ..."@en .
*/
import org.openrdf.rio.*
import org.openrdf.rio.helpers.*

// Entry point expected by ScriptInputFormat: builds one vertex from one input line.
def parse(line, factory) {
    def reader = new StringReader(line)
    def model = new org.openrdf.model.impl.LinkedHashModel()

    // Parse the line as Turtle and collect the resulting statements into the model.
    def rdfParser = Rio.createParser(RDFFormat.TURTLE)
    rdfParser.setRDFHandler(new StatementCollector(model))
    rdfParser.parse(reader, "http://dbpedia.org/")

    def v1
    model.each {
        // The article name is the last path segment of the subject URI, e.g. "Anarchism".
        def name = it.getSubject().stringValue().split("/").last()
        def text = it.getObject().stringValue()
        v1 = factory.vertex(name, "article")
        v1.property("article.name", name)
        v1.property("article.text", text)
    }
    // v1 stays null if the line produced no statements.
    return v1
}