@kinow
Last active April 11, 2020 10:37
StreamRDFWriter, and stream processing

The Apache Jena project is like a box full of interesting things, at least if you love programming. One of its many features is stream processing.

A Jena store may contain very large datasets, with gigabytes of graph data. Some query results may be quite large, so sending the whole result at once would be simply impractical.

Instead, the data will go through ARQ, a query engine for Jena that supports SPARQL. There is one piece of code there that I found interesting while reviewing a small pull request: org.apache.jena.riot.system.StreamRDFWriter.

It is responsible for writing graph data in a streaming fashion. (See stream processing for programming models and more.)

Stream factories

StreamRDFWriter holds several implementations of StreamRDFWriterFactory as static members. A factory has only one responsibility: to create a stream (StreamRDF) for a certain format and context.
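
To make the factory idea concrete, here is a minimal sketch in plain Java. It does not use Jena's classes: TripleStream, TripleStreamFactory and NTRIPLES_LIKE are hypothetical stand-ins I made up for StreamRDF, StreamRDFWriterFactory and one of the static factory members. Triples are reduced to three strings, and the output format is only N-Triples-like.

```java
public class StreamFactorySketch {

    /** Simplified stand-in for StreamRDF: it receives triples one at a time. */
    interface TripleStream {
        void start();
        void triple(String subject, String predicate, String object);
        void finish();
    }

    /** Simplified stand-in for StreamRDFWriterFactory: its only job is to create a stream. */
    interface TripleStreamFactory {
        TripleStream create(StringBuilder out);
    }

    /** One factory per format, held as a static member (hypothetical N-Triples-like format). */
    static final TripleStreamFactory NTRIPLES_LIKE = out -> new TripleStream() {
        @Override public void start() { }
        @Override public void triple(String s, String p, String o) {
            // Each triple is written as soon as it arrives; nothing is collected first.
            out.append('<').append(s).append("> <").append(p).append("> <").append(o).append("> .\n");
        }
        @Override public void finish() { }
    };

    /** Creates a stream from the factory and pushes a single triple through it. */
    static String demo() {
        StringBuilder out = new StringBuilder();
        TripleStream stream = NTRIPLES_LIKE.create(out);
        stream.start();
        stream.triple("http://example/s", "http://example/p", "http://example/o");
        stream.finish();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The point of the split is that the factory knows nothing about where the triples come from, and the caller knows nothing about how a format is written.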

Streams writer registry

With all these factories and streams, the writer also needs a registry. The registry is used to look up the writer factory for a stream in a given language.

So if you have a graph dataset and need to retrieve its triples as RDF-Thrift, you ask the registry for a factory for that language (Turtle, N-Triples, RDF-Thrift, etc.) or format (Flat Turtle, N-Quads, N-Triples-ASCII, RDF-Thrift, etc.).
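
The lookup can be sketched as two maps: one from a language to its default format, and one from a format to its factory. This is my own simplified model in plain Java, not Jena's code. Languages and formats are plain strings here instead of Lang and RDFFormat, and TripleSink, forLang, forFormat and the two registered formats are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class WriterRegistrySketch {

    /** Minimal sink: formats one triple and appends it to the output. */
    interface TripleSink {
        void triple(String s, String p, String o);
    }

    // Language name -> default format name for that language.
    private static final Map<String, String> LANG_TO_FORMAT = new HashMap<>();
    // Format name -> factory that creates a sink writing to the given output.
    private static final Map<String, Function<StringBuilder, TripleSink>> FORMAT_TO_FACTORY = new HashMap<>();

    static {
        FORMAT_TO_FACTORY.put("N-Triples", out -> (s, p, o) -> out.append(line(s, p, o)));
        FORMAT_TO_FACTORY.put("N-Triples-ASCII", out -> (s, p, o) -> out.append(escapeNonAscii(line(s, p, o))));
        LANG_TO_FORMAT.put("N-Triples", "N-Triples");
    }

    /** Looks up the factory registered for an exact format, if any. */
    static Optional<Function<StringBuilder, TripleSink>> forFormat(String format) {
        return Optional.ofNullable(FORMAT_TO_FACTORY.get(format));
    }

    /** Looks up the factory for a language by going through its default format. */
    static Optional<Function<StringBuilder, TripleSink>> forLang(String lang) {
        return Optional.ofNullable(LANG_TO_FORMAT.get(lang)).flatMap(WriterRegistrySketch::forFormat);
    }

    static String line(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .\n";
    }

    /** Replaces every character outside ASCII with a four-digit hexadecimal escape. */
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) sb.append(c);
            else sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    /** Asks the registry for a format and writes one triple with the sink it returns. */
    static String demo(String format) {
        StringBuilder out = new StringBuilder();
        forFormat(format)
            .orElseThrow(() -> new IllegalArgumentException("No writer registered for " + format))
            .apply(out)
            .triple("http://example/s", "http://example/p", "http://example/caf\u00e9");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo("N-Triples"));
        System.out.print(demo("N-Triples-ASCII"));
    }
}
```

Asking by language gives you the default format for that language, while asking by format lets you pick a specific variant. In this sketch a language that was never registered, such as Turtle, simply yields an empty Optional.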

Writing data to streams

Each writer has one responsibility too. I really like the design of certain modules in Jena.

The action, however, happens somewhere else. The stream processing really takes place in StreamRDFOps and in the Iterator implementations. But that goes beyond StreamRDFWriter, so that's all for today.
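
As a rough idea of what that iterator-driven part looks like, here is a small sketch in plain Java. It is not Jena's StreamRDFOps: sendToStream and generatedTriples are hypothetical names, and triples are just preformatted strings. What it shows is the general technique of pulling items from an Iterator and pushing each one to a sink immediately, so the whole result never has to sit in memory.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

public class SendToStreamSketch {

    /** Sends every item from the iterator to the sink, one at a time, and returns how many were sent. */
    static <T> long sendToStream(Iterator<T> source, Consumer<T> sink) {
        long count = 0;
        while (source.hasNext()) {
            sink.accept(source.next());
            count++;
        }
        return count;
    }

    /** Lazy source: produces n made-up triples on demand instead of holding them in a list. */
    static Iterator<String> generatedTriples(int n) {
        return new Iterator<String>() {
            private int i = 0;

            @Override public boolean hasNext() {
                return i < n;
            }

            @Override public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String triple = "<http://example/s" + i + "> <http://example/p> <http://example/o> .";
                i++;
                return triple;
            }
        };
    }

    public static void main(String[] args) {
        long sent = sendToStream(generatedTriples(3), System.out::println);
        System.out.println(sent + " triples sent");
    }
}
```

Because the source is an iterator, the same loop works whether there are three triples or three billion; only the current one is held at any time.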
