@dpitera
Last active May 1, 2020 23:45
ConfiguredGraphFactory Documentation

ConfiguredGraphFactory

The ConfiguredGraphFactory is an access point to your graphs, similar to the JanusGraphFactory.

ConfiguredGraphFactory versus JanusGraphFactory

There is an important distinction between these two graph factories:

  1. The ConfiguredGraphFactory can only be used if you have configured your server to use the ConfigurationGraphManagement APIs at server start.

The benefits of using the ConfiguredGraphFactory are that:

  1. You only need to supply a String to access your graphs, whereas the JanusGraphFactory requires you to specify information about the backend you wish to use every time you open a graph.
  2. If your ConfigurationGraphManagement is not configured to use an in-memory storage backend, your graph configurations are persisted and available to all JanusGraph nodes in your cluster.

How Does the ConfiguredGraphFactory Work?

The ConfiguredGraphFactory provides an access point to graphs under two scenarios:

  1. You have already created a configuration for your specific graph using the ConfigurationGraphManagement#createConfiguration method. In this scenario, your graph is opened using the previously created configuration for this graph.
  2. You have already created a template configuration using the ConfigurationGraphManagement#createTemplateConfiguration method. In this scenario, we create a configuration for the graph you are creating by copying all the properties stored in your template configuration and adding the relevant "graph.graphname" property, and we then open the graph according to that specific configuration.
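The two scenarios above can be modeled with a small in-memory sketch. Everything in it is hypothetical: ConfiguredGraphFactorySketch is not a JanusGraph class, plain HashMaps stand in for the JanusConfigurationGraph, and "opening" a graph simply returns the configuration it would be opened with. The error messages are the ones shown in the Examples section below.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the two lookup scenarios. The HashMaps stand in
// for the JanusConfigurationGraph; open()/create() return configuration maps,
// not real graph objects.
class ConfiguredGraphFactorySketch {
    static final String GRAPH_NAME = "graph.graphname";

    // graph name -> stored per-graph configuration
    private final Map<String, Map<String, Object>> configurations = new HashMap<>();
    // at most one template configuration
    private Map<String, Object> template;

    void createConfiguration(Map<String, Object> config) {
        Object name = config.get(GRAPH_NAME);
        if (name == null) {
            throw new IllegalArgumentException(
                "Please include in your configuration the property \"graph.graphname\".");
        }
        configurations.put(name.toString(), new HashMap<>(config));
    }

    void createTemplateConfiguration(Map<String, Object> config) {
        if (config.containsKey(GRAPH_NAME)) {
            throw new IllegalArgumentException(
                "Your template configuration may not contain the property \"graph.graphname\".");
        }
        template = new HashMap<>(config);
    }

    // Scenario 1: a configuration for this graph already exists.
    Map<String, Object> open(String graphName) {
        Map<String, Object> config = configurations.get(graphName);
        if (config == null) {
            throw new IllegalStateException(
                "Please create configuration for this graph using the ConfigurationGraphManagement API.");
        }
        return config;
    }

    // Scenario 2: copy the template, add graph.graphname, store the copy, then open it.
    Map<String, Object> create(String graphName) {
        if (template == null) {
            throw new IllegalStateException(
                "Please create a template Configuration using the ConfigurationGraphManagement API.");
        }
        if (configurations.containsKey(graphName)) {
            throw new IllegalStateException(
                "Configuration for graph \"" + graphName + "\" already exists.");
        }
        Map<String, Object> config = new HashMap<>(template);
        config.put(GRAPH_NAME, graphName);
        configurations.put(graphName, config);
        return open(graphName);
    }
}
```

Because create() refuses a name that already has a stored configuration, a graph created from the template is afterwards reached with open(), mirroring the behavior in the Examples section.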

Configuring Your Server To Use the ConfiguredGraphFactory

To use the ConfiguredGraphFactory, you must configure your server to use the ConfigurationGraphManagement APIs. To do this, add a graph named "JanusConfigurationGraph" to the graphs map in your server's YAML file. For example:

host: localhost
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphManager: org.janusgraph.graphdb.management.JanusGraphManager
graphs: {
  graph: conf/gremlin-server/janusgraph-cassandra-es-server.properties,
  JanusConfigurationGraph: conf/janusgraph-cassandra-configurationgraph.properties
}
plugins:
  - janusgraph.imports
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

In this example, the ConfigurationGraphManagement graph is configured using the properties stored in conf/janusgraph-cassandra-configurationgraph.properties, which might look like:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cassandrathrift
graph.graphname=JanusConfigurationGraph
storage.hostname=127.0.0.1
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

If the Gremlin Server starts successfully and the JanusConfigurationGraph is instantiated, all the APIs available on the ConfigurationGraphManagement singleton act upon this graph. This is also the graph from which the ConfiguredGraphFactory reads the configurations it uses to create and open graphs.

ConfigurationGraphManagement

The ConfigurationGraphManagement is a singleton that allows you to create, update, and remove the configurations the ConfiguredGraphFactory uses to access your graphs. See the section above on configuring your server to enable these APIs.

Graph Configurations

The ConfigurationGraphManagement singleton allows you to create configurations used to open specific graphs, referenced by the "graph.graphname" property. For example:

Map<String, Object> map = new HashMap<String, Object>();
map.put("storage.backend", "cassandrathrift");
map.put("storage.hostname", "127.0.0.1");
map.put("graph.graphname", "graph1");
ConfigurationGraphManagement.getInstance().createConfiguration(new MapConfiguration(map));

You can then access this graph on any JanusGraph node using:

ConfiguredGraphFactory.open("graph1");

Template Configuration

The ConfigurationGraphManagement also allows you to create a single template configuration, which you can use to create many graphs that share the same configuration. For example:

Map<String, Object> map = new HashMap<String, Object>();
map.put("storage.backend", "cassandrathrift");
map.put("storage.hostname", "127.0.0.1");
ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));

After doing this, you can create graphs using the template configuration:

ConfiguredGraphFactory.create("graph2");

This method first creates a new configuration for "graph2" by copying all the properties of the template configuration and adding the "graph.graphname" property, and stores it as the configuration for this specific graph. This means that, in the future, this graph can be accessed on any JanusGraph node by calling:

ConfiguredGraphFactory.open("graph2");

Updating Configurations

IMPORTANT

All interactions with the JanusGraphFactory or the ConfiguredGraphFactory that use configurations defining the property "graph.graphname" go through the JanusGraphManager, which keeps track of the graph references created on a given JVM. Think of it as a graph cache. For this reason:

ANY UPDATES TO A CONFIGURATION ARE NOT GUARANTEED TO TAKE EFFECT UNTIL YOU REMOVE THE GRAPH IN QUESTION ON EVERY JANUSGRAPH NODE IN YOUR CLUSTER.

You can do so by calling:

ConfiguredGraphFactory.close("graph2");
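To see why an update can go unnoticed, here is a minimal sketch of a per-JVM graph cache, assuming that updateConfiguration merges the supplied properties into the stored configuration (as the update examples below suggest). StaleCacheSketch and its Graph class are hypothetical stand-ins; no JanusGraph classes are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the per-JVM graph cache. A "graph" here is just a frozen
// copy of the configuration it was opened with.
class StaleCacheSketch {
    static final class Graph {
        final Map<String, Object> config;
        Graph(Map<String, Object> config) { this.config = config; }
    }

    private final Map<String, Map<String, Object>> configurations = new HashMap<>();
    private final Map<String, Graph> graphCache = new HashMap<>();

    void createConfiguration(String graphName, Map<String, Object> config) {
        configurations.put(graphName, new HashMap<>(config));
    }

    // Assumption: an update merges the supplied properties into the stored configuration.
    void updateConfiguration(String graphName, Map<String, Object> changes) {
        configurations.get(graphName).putAll(changes);
    }

    // The stored configuration is only read on a cache miss, so a cached
    // reference keeps serving the configuration it was first opened with.
    Graph open(String graphName) {
        return graphCache.computeIfAbsent(graphName,
            name -> new Graph(Collections.unmodifiableMap(new HashMap<>(configurations.get(name)))));
    }

    // Evicting the reference forces the next open() to rebuild from the stored configuration.
    void close(String graphName) {
        graphCache.remove(graphName);
    }
}
```

In this model, opening "graph1" after an update but before close() still returns the old storage.hostname; only after close() does open() see the new value.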

Since a graph created from the template configuration gets its own copy of the template, stored as a separate configuration for that graph, this means that:

ANY UPDATES TO THE TEMPLATE CONFIGURATION ARE NOT GUARANTEED TO TAKE EFFECT ON A SPECIFIC GRAPH CREATED FROM IT UNTIL:

1. The relevant configuration is removed: `ConfigurationGraphManagement.getInstance().removeConfiguration("graph2");`
2. The graph in question has been closed on every JanusGraph node: `ConfiguredGraphFactory.close("graph2");`
3. The graph is recreated using the template configuration: `ConfiguredGraphFactory.create("graph2");`
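A minimal sketch of why these three steps are needed, assuming that create() takes a snapshot copy of the template and that template updates merge into the template. TemplateRecreateSketch is a hypothetical model that tracks only configurations; the graph cache cleared in step 2 is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the remove / recreate steps; the graph cache (step 2) is omitted.
class TemplateRecreateSketch {
    private final Map<String, Object> template = new HashMap<>();
    private final Map<String, Map<String, Object>> configurations = new HashMap<>();

    void createTemplateConfiguration(Map<String, Object> config) {
        template.clear();
        template.putAll(config);
    }

    // Assumption: a template update merges the supplied properties into the template.
    void updateTemplateConfiguration(Map<String, Object> changes) {
        template.putAll(changes);
    }

    // create() snapshots the template; later template updates do not reach this copy.
    void create(String graphName) {
        if (configurations.containsKey(graphName)) {
            throw new IllegalStateException(
                "Configuration for graph \"" + graphName + "\" already exists.");
        }
        Map<String, Object> copy = new HashMap<>(template);
        copy.put("graph.graphname", graphName);
        configurations.put(graphName, copy);
    }

    // Step 1: drop the stale per-graph copy so that create() (step 3) is allowed again.
    void removeConfiguration(String graphName) {
        configurations.remove(graphName);
    }

    Map<String, Object> configurationOf(String graphName) {
        return configurations.get(graphName);
    }
}
```

Skipping step 1 makes create() fail with the "already exists" error, and skipping create() leaves the old copy without the new template properties.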

Update Examples

  1. We migrated our Cassandra data to a new server with a new IP address:
Map<String, Object> map = new HashMap<String, Object>();
map.put("storage.backend", "cassandrathrift");
map.put("storage.hostname", "127.0.0.1");
map.put("graph.graphname", "graph1");
ConfigurationGraphManagement.getInstance().createConfiguration(new MapConfiguration(map));

def g1 = ConfiguredGraphFactory.open("graph1");

// Update configuration (reassign map rather than redeclaring it)
map = new HashMap<String, Object>();
map.put("storage.hostname", "10.0.0.1");
ConfigurationGraphManagement.getInstance().updateConfiguration("graph1", map);

// Close graph on all JanusGraph nodes
ConfiguredGraphFactory.close("graph1");

// We are now guaranteed to use the updated configuration
g1 = ConfiguredGraphFactory.open("graph1");
  2. We added an Elasticsearch node to our setup:
Map<String, Object> map = new HashMap<String, Object>();
map.put("storage.backend", "cassandrathrift");
map.put("storage.hostname", "127.0.0.1");
map.put("graph.graphname", "graph1");
ConfigurationGraphManagement.getInstance().createConfiguration(new MapConfiguration(map));

def g1 = ConfiguredGraphFactory.open("graph1");

// Update configuration (reassign map rather than redeclaring it)
map = new HashMap<String, Object>();
map.put("index.search.backend", "elasticsearch");
map.put("index.search.hostname", "127.0.0.1");
map.put("index.search.elasticsearch.transport-scheme", "http");
ConfigurationGraphManagement.getInstance().updateConfiguration("graph1", map);

// Close graph on all JanusGraph nodes
ConfiguredGraphFactory.close("graph1");

// We are now guaranteed to use the updated configuration
g1 = ConfiguredGraphFactory.open("graph1");
  3. Update a graph configuration that was created using a template configuration that has since been updated:
Map<String, Object> map = new HashMap<String, Object>();
map.put("storage.backend", "cassandrathrift");
map.put("storage.hostname", "127.0.0.1");
ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));

def g1 = ConfiguredGraphFactory.create("graph1");

// Update template configuration (reassign map rather than redeclaring it)
map = new HashMap<String, Object>();
map.put("index.search.backend", "elasticsearch");
map.put("index.search.hostname", "127.0.0.1");
map.put("index.search.elasticsearch.transport-scheme", "http");
ConfigurationGraphManagement.getInstance().updateTemplateConfiguration(new MapConfiguration(map));

// Remove Configuration
ConfigurationGraphManagement.getInstance().removeConfiguration("graph1");

// Close graph on all JanusGraph nodes
ConfiguredGraphFactory.close("graph1");

// Recreate
ConfiguredGraphFactory.create("graph1");
// Now this graph's configuration is guaranteed to be updated

JanusGraphManager

The JanusGraphManager is a Singleton adhering to the TinkerPop graphManager specifications.

In particular, the JanusGraphManager provides:

  1. a coordinated mechanism by which to instantiate graph references on a given JanusGraph node
  2. a graph reference tracker (or cache)

Any graph you create using the "graph.graphname" property will go through the JanusGraphManager and thus be instantiated in a coordinated fashion. The graph reference will also be placed in the graph cache on the JVM in question.

Thus, any graph you open using the "graph.graphname" property that has already been instantiated on the JVM in question will be retrieved from the graph cache.

This is why updates to your configurations require the extra steps described under Updating Configurations before they are guaranteed to take effect.
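One way to get both behaviors on a single JVM is a concurrent map keyed by graph name. The sketch below is hypothetical and is not the JanusGraphManager's actual implementation; it relies on the documented guarantee of java.util.concurrent.ConcurrentHashMap.computeIfAbsent that the mapping function is applied at most once per absent key, so concurrent opens of the same name share one instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-JVM graph reference tracker (not JanusGraphManager's real code).
class GraphReferenceTrackerSketch<G> {
    private final ConcurrentHashMap<String, G> graphs = new ConcurrentHashMap<>();
    // Builds a graph from its name; stands in for opening it from its configuration.
    private final Function<String, G> instantiate;

    GraphReferenceTrackerSketch(Function<String, G> instantiate) {
        this.instantiate = instantiate;
    }

    // Coordinated instantiation: build on the first open, return the cached reference afterwards.
    G openGraph(String graphName) {
        return graphs.computeIfAbsent(graphName, instantiate);
    }

    // Removing the reference lets the next open pick up a changed configuration.
    G removeGraph(String graphName) {
        return graphs.remove(graphName);
    }
}
```

The cache is what makes repeated opens cheap, and it is also why a graph must be removed before an updated configuration is read again.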

graph.graphname

This is a new configuration option that you can include in the configuration used to open a graph. Any configuration that includes this property causes the graph to be instantiated through the JanusGraphManager (as explained above).

For backwards compatibility, graphs that do not supply this parameter but are defined at server start in the graphs {} object of your .yaml file are bound through the JanusGraphManager under the key supplied for that graph. For example, if your .yaml graphs object looks like:

graphs: {
  graph1: conf/graph1.properties,
  graph2: conf/graph2.properties
}

but conf/graph1.properties and conf/graph2.properties do not include the property graph.graphname, then these graphs will be stored in the JanusGraphManager and thus bound in your gremlin script executions as graph1 and graph2, respectively.
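The binding rule can be sketched as follows. YamlGraphBindingSketch is hypothetical, each Properties object stands for a loaded .properties file, and the case where a file does define "graph.graphname" (the graph is then tracked under that value rather than its YAML key) is an assumption not spelled out above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: the name under which each YAML-defined graph is tracked.
class YamlGraphBindingSketch {
    static Map<String, Properties> bindings(Map<String, Properties> yamlGraphs) {
        Map<String, Properties> bound = new LinkedHashMap<>();
        for (Map.Entry<String, Properties> entry : yamlGraphs.entrySet()) {
            // Fall back to the YAML key when the file has no graph.graphname (backwards compatibility).
            String name = entry.getValue().getProperty("graph.graphname", entry.getKey());
            bound.put(name, entry.getValue());
        }
        return bound;
    }
}
```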

Important

For convenience, if the configuration used to open a graph specifies "graph.graphname" but does not specify the backend's storage directory, table name, or keyspace name, the relevant parameter is automatically set to the value of "graph.graphname". If you do supply one of those parameters, your value always takes precedence. If you supply neither "graph.graphname" nor the parameter, the parameter falls back to the configuration option's default value.

One special case is the storage.root configuration option. This is a new configuration option that specifies the base directory used by any backend that requires access to a local storage directory. If you supply this parameter, you must also supply the "graph.graphname" property, and the absolute storage directory will be "<storage.root>/<graph.graphname>", that is, the value of "graph.graphname" appended to the value of "storage.root".
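The defaulting rules above can be sketched as a function over a configuration map. This is a hypothetical illustration, not JanusGraph's implementation; in particular, the option names storage.directory, storage.cassandra.keyspace and storage.hbase.table and the backend-to-option mapping are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "graph.graphname" defaults. The backend-to-option
// mapping is an assumption, not taken from JanusGraph's source.
class GraphNameDefaultsSketch {
    private static final Map<String, String> NAME_OPTION_BY_BACKEND = new HashMap<>();
    static {
        NAME_OPTION_BY_BACKEND.put("berkeleyje", "storage.directory");
        NAME_OPTION_BY_BACKEND.put("cassandrathrift", "storage.cassandra.keyspace");
        NAME_OPTION_BY_BACKEND.put("hbase", "storage.hbase.table");
    }

    static Map<String, Object> applyDefaults(Map<String, Object> config) {
        Map<String, Object> resolved = new HashMap<>(config);
        Object graphName = config.get("graph.graphname");
        Object root = config.get("storage.root");
        if (root != null && graphName == null) {
            throw new IllegalArgumentException("storage.root requires graph.graphname");
        }
        if (graphName == null) {
            return resolved; // nothing to derive: each option keeps its own default
        }
        String option = NAME_OPTION_BY_BACKEND.get(String.valueOf(config.get("storage.backend")));
        if (option == null) {
            return resolved; // backend not covered by this sketch
        }
        String value = graphName.toString();
        if (root != null && option.equals("storage.directory")) {
            // <storage.root>/<graph.graphname>, ignoring trailing slashes on the root
            value = root.toString().replaceAll("/+$", "") + "/" + graphName;
        }
        resolved.putIfAbsent(option, value); // an explicitly supplied value always wins
        return resolved;
    }
}
```

Under this sketch, the BerkeleyJE template from the Use Cases below resolves a graph named graph1 to the directory /tmp/graphs/graph1.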

Use Cases

  1. Create a template configuration for my Cassandra backend such that each graph created using this configuration gets a unique keyspace equivalent to the String provided to the factory:
 Map<String, Object> map = new HashMap<String, Object>();
 map.put("storage.backend", "cassandrathrift"); 
 map.put("storage.hostname", "127.0.0.1"); 
 ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));
 
 def g1 = ConfiguredGraphFactory.create("graph1"); //keyspace === graph1
 def g2 = ConfiguredGraphFactory.create("graph2"); //keyspace === graph2
 def g3 = ConfiguredGraphFactory.create("graph3"); //keyspace === graph3
  2. Create a template configuration for my BerkeleyJE backend such that each graph created using this configuration gets a unique storage directory equivalent to the "<storage.root>/<graph.graphname>":
 Map<String, Object> map = new HashMap<String, Object>();
 map.put("storage.backend", "berkeleyje"); 
 map.put("storage.root", "/tmp/graphs"); 
 ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));
 
 def g1 = ConfiguredGraphFactory.create("graph1"); //storage directory === /tmp/graphs/graph1
 def g2 = ConfiguredGraphFactory.create("graph2"); //storage directory === /tmp/graphs/graph2
 def g3 = ConfiguredGraphFactory.create("graph3"); //storage directory === /tmp/graphs/graph3

Examples

gremlin> :> ConfiguredGraphFactory.open("graph");
Please create configuration for this graph using the ConfigurationGraphManagement API.

gremlin> :> ConfiguredGraphFactory.create("graph");
Please create a template Configuration using the ConfigurationGraphManagement API.

gremlin> :> Map<String, Object> map = new HashMap<String, Object>(); map.put("storage.backend", "cassandrathrift"); map.put("storage.hostname", "127.0.0.1"); map.put("GraphName", "graph1"); ConfigurationGraphManagement.getInstance().createConfiguration(new MapConfiguration(map));
Please include in your configuration the property "graph.graphname".

gremlin> :> Map<String, Object> map = new HashMap<String, Object>(); map.put("storage.backend", "cassandrathrift"); map.put("storage.hostname", "127.0.0.1"); map.put("graph.graphname", "graph1"); ConfigurationGraphManagement.getInstance().createConfiguration(new MapConfiguration(map));
==>null

gremlin> :> ConfiguredGraphFactory.open("graph1").vertices();

gremlin> :> Map<String, Object> map = new HashMap<String, Object>(); map.put("storage.backend", "cassandrathrift"); map.put("storage.hostname", "127.0.0.1"); map.put("graph.graphname", "graph1"); ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));
Your template configuration may not contain the property "graph.graphname".

gremlin> :> Map<String, Object> map = new HashMap<String, Object>(); map.put("storage.backend", "cassandrathrift"); map.put("storage.hostname", "127.0.0.1"); ConfigurationGraphManagement.getInstance().createTemplateConfiguration(new MapConfiguration(map));
==>null

// Each graph is now acting in unique keyspaces equivalent to the graphnames.
gremlin> :> def g1 = ConfiguredGraphFactory.open("graph1"); def g2 = ConfiguredGraphFactory.create("graph2"); def g3 = ConfiguredGraphFactory.create("graph3"); g2.addVertex(); l = []; l << g1.vertices().size(); l << g2.vertices().size(); l << g3.vertices().size(); l;
==>0
==>1
==>0

// After a graph is created, you must access it using .open()
gremlin> :> def g2 = ConfiguredGraphFactory.create("graph2"); g2.vertices().size();
Configuration for graph "graph2" already exists.

gremlin> :> def g2 = ConfiguredGraphFactory.open("graph2"); g2.vertices().size();
==>1