@aanastasiou
Created July 28, 2013 18:32
Generate a Cypher query to store a Python Networkx directed graph

Exporting a Networkx graph as a Cypher query

This little project defines a function that constructs a Cypher query which, when executed against a Neo4j database server, stores the graph on that server.

Background

  • A Graph is an abstract mathematical model composed of Nodes connected through Edges that can be used to describe complex systems composed of a set of parts (corresponding to nodes) and their connections (corresponding to edges).
  • Examples of graphs include road networks (junctions connected via roads) and electronic circuit networks (components and their connections).
  • Networkx is an excellent Python module for manipulating such Graph objects of any kind.
  • Neo4j is a graph database. It uses the Graph as a data model to store such objects to a data store.
  • Cypher is Neo4j's query language. It is a domain specific language that can be used to manipulate graph objects.

Objectives

Given a graph (G) write a function that creates the query to store the graph with all of its nodes, edges and attributes.

How is it done?

  • By traversing all nodes and edges and creating the corresponding parts of the Cypher query.
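The traversal can be sketched without Networkx at all. The example below is a minimal illustration rather than the function defined in this gist. It assumes that nodes arrive as (id, attributes) pairs and edges as (source, target, attributes) triples, which is the shape produced by G.nodes(data=True) and G.edges(data=True). It also swaps the random tags for counter-based variable names (n0, n1, ...) so that the output is reproducible. The names format_props and build_create are hypothetical.

```python
def format_props(items):
    """Render (key, value) pairs as a Cypher property map, quoting string values."""
    parts = []
    for key, value in items:
        if isinstance(value, str):
            parts.append("%s:'%s'" % (key, value))
        else:
            parts.append("%s:%s" % (key, value))
    return "{%s}" % ",".join(parts)


def build_create(nodes, edges):
    """Build a single create statement from node pairs and edge triples."""
    var_names = {}
    node_parts = []
    for index, (node_id, attrs) in enumerate(nodes):
        # Counter-based variable name (an assumption of this sketch).
        var_names[node_id] = "n%d" % index
        # Keep the original node id as the ID attribute, stored as a string.
        items = [("ID", str(node_id))] + sorted(attrs.items())
        node_parts.append("(%s %s)" % (var_names[node_id], format_props(items)))
    edge_parts = []
    for source, target, attrs in edges:
        props = " " + format_props(sorted(attrs.items())) if attrs else ""
        # Edges refer to the Cypher variable names, not the original ids.
        edge_parts.append(
            "(%s)-[:LINKED_TO%s]->(%s)" % (var_names[source], props, var_names[target])
        )
    return "create %s;" % ",".join(node_parts + edge_parts)


query = build_create(
    nodes=[(0, {"label": "ROOT"}), (1, {"cost": 3})],
    edges=[(0, 1, {"diameter": 5})],
)
print(query)
```

Nodes are declared first so that every edge can refer to a variable name that already exists in the same statement.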

Assumptions, Requirements, Caveats

  • graph2Cypher requires the random and networkx modules.
  • graph2Cypher_demo.py requires networkx and matplotlib.
  • The graph2Cypher function assumes that its (only) parameter IS A DIRECTED GRAPH.
  • Simply going through all nodes and edges and dumping their attributes is not practical for all graphs because the node-id used by Networkx might not be usable by Neo4j directly. The typical example is a graph whose Networkx node-ids are integers.
  • For this reason and just for the needs of constructing the Cypher query, the graph's nodes get relabeled on the fly.
  • Furthermore, certain assumptions are made about attribute names. Each node's original Networkx id is stored in the ID node attribute, while every edge gets the relationship type ":LINKED_TO" by default.
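The on-the-fly relabelling can be sketched as a mapping from each original node id to a generated tag. The version below is an illustration under assumptions, not the gist's code: the names random_tag and relabel are hypothetical, and it adds a redraw step so that two nodes never receive the same tag, since a tag made of two letters and two digits has only 67,600 possible values.

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"


def random_tag(rng):
    """Return a tag such as 'QX47': two letters followed by two digits."""
    letters = "".join(rng.choice(LETTERS) for _ in range(2))
    digits = "".join(rng.choice(DIGITS) for _ in range(2))
    return letters + digits


def relabel(node_ids, rng=None):
    """Map each original node id to a distinct generated Cypher identifier."""
    rng = rng or random.Random()
    used = set()
    mapping = {}
    for node_id in node_ids:
        tag = random_tag(rng)
        while tag in used:  # redraw if the tag was already handed out
            tag = random_tag(rng)
        used.add(tag)
        mapping[node_id] = tag
    return mapping


# Integer node ids, as in the typical problem case described above.
mapping = relabel(range(17), rng=random.Random(0))
for node_id, tag in sorted(mapping.items())[:3]:
    print(node_id, tag)
```

The redraw loop only terminates while unused tags remain, so a graph with more than 67,600 nodes would need longer tags.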

Use

  • Obviously, the function can be used in stand-alone mode to create the query that can then be sent to the neo4j database through something like the Python REST interface or the Neo4j-shell.

  • In the case of the Neo4j-shell, assuming that it is on your system path, you can simply do the following:

    python graph2Cypher_demo.py > aGraph.cypher  # This creates the text file with the Cypher query
    neo4j-shell -file aGraph.cypher  # This will execute the query within aGraph.cypher and store the graph to the database
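The same two steps can also be driven from Python. This is only a sketch under assumptions: save_query is a hypothetical helper, the query string is a hand-written placeholder, and neo4j-shell is assumed to be on the system path. The command is assembled but not executed here.

```python
import os
import tempfile


def save_query(query, path):
    """Write the Cypher query to a file and return the neo4j-shell command for it."""
    with open(path, "w") as handle:
        handle.write(query)
    # The -file option is the one used in the shell example above.
    return ["neo4j-shell", "-file", path]


# Placeholder query; in practice this would be the string returned by graph2Cypher.
query = "create (n0 {ID:'0'}),(n1 {ID:'1'}),(n0)-[:LINKED_TO]->(n1);\n"
path = os.path.join(tempfile.mkdtemp(), "aGraph.cypher")
command = save_query(query, path)
print(" ".join(command))
# To actually run it: subprocess.call(command)
```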

"""Defines a function that parses a Networkx graph and produces a Cypher query to store the graph in a Neo4j graph database
Athanasios Anastasiou 28/07/2013
"""
import random
import networkx

#Simple character lists
letDCT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numDCT = "0123456789"

def getRndTag(someLen, dct=letDCT):
    """Returns some random string of length someLen composed of the characters in the dct string"""
    return "".join([dct[random.randint(0,len(dct)-1)] for i in range(0,someLen)])

def graph2Cypher(aGraph):
    """Generates a Cypher query from a Networkx Graph"""
    nodeStatements = {}
    edgeStatements = []
    #Partially generate the node representations
    for aNode in aGraph.nodes(data = True):
        #Generate a node identifier for Cypher
        varName = getRndTag(2)+getRndTag(2,dct=numDCT)
        #Append the node's ID attribute so that the node-ID information used by Networkx is preserved.
        nodeItems = [("ID","%s" % aNode[0])]
        nodeItems.extend(aNode[1].items())
        #Create the key-value representation of the node's attributes taking care to add quotes when the value is of type string
        nodeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,nodeItems))
        #Store it to a dictionary indexed by the node-id.
        nodeStatements[aNode[0]] = [varName, "(%s %s)" % (varName, nodeAttributes)]
    #Generate the relationship representations
    for anEdge in aGraph.edges(data = True):
        edgeItems = anEdge[2].items()
        edgeAttributes = ""
        if len(edgeItems)>0:
            edgeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,edgeItems))
        #NOTE: Declare the links by their Cypher node-identifier rather than their Networkx node identifier
        edgeStatements.append("(%s)-[:LINKED_TO %s]->(%s)" % (nodeStatements[anEdge[0]][0], edgeAttributes, nodeStatements[anEdge[1]][0]))
    #Put both definitions together and return the create statement.
    return "create %s,%s;\n" % (",".join(map(lambda x:x[1][1],nodeStatements.items())),",".join(edgeStatements))

"""Creates a dummy tree graph example and from that, its Cypher representation"""
import networkx
import sys
import random
import graph2Cypher
#Create a DIRECTED network (In this case a simple binary tree (branching factor=2) having 17 nodes)
G = networkx.generators.full_rary_tree(2,17, create_using=networkx.DiGraph())
#Add some attributes to the nodes
#(Note: G.nodes[...] needs Networkx 2.0 or later; older releases use G.node[...])
for aNode in G.nodes():
    G.nodes[aNode]['label'] = graph2Cypher.getRndTag(5)
    G.nodes[aNode]['cost'] = random.randint(0,9)
#Add some attributes to the edges
#(Note: Here, 'diameter' could refer to pipe diameter, it's just a dummy name.)
for anEdge in G.edges():
    G[anEdge[0]][anEdge[1]].update({'diameter':random.randint(0,9)})
#Write the output to the standard output (this way the query could be piped if required)
sys.stdout.write(graph2Cypher.graph2Cypher(G))
@jexp

jexp commented Jul 29, 2013

Perhaps create a real github repo out of this?

Why do you generate a separate node identifier with random, rather than using the Networkx identifier directly (quoted with backticks)?

@aanastasiou
Author

Hello Michael

I guess I can, but I did not think that the extent of the contribution justified a repo of its own (perhaps if I gather more related Networkx + Neo4j code it could become a repo of its own).

Regarding the random tags:

The short answer: Yes you are right, this could be done too.

The long answer: Networkx does not really care what the user decides to attach as a "Node". This is perfectly valid:

import networkx

class someThing(object):
    def __init__(self,aParam):
        self.someField = aParam

G = networkx.DiGraph()

G.add_node(someThing('Blah1'), someNodeProperty = 1234)
list(G.nodes())[0] #Returns the instance of the class or the string representation if someThing has a __repr__

So, if you have full control over someThing, then you could put a __repr__ function in place that derives some sort of unique identifier from the object, which could then be used directly even in a Cypher query, escaped, exactly as you suggest.

If you don't have full control over it, you can possibly derive a subclass (and add the __repr__) OR do the trick with generating identifiers.

I did not want to restrict the use of the function too much. If you remove the "ID" part, the node's id would not participate at all.

But thanks anyway, this can be added to the next iteration along with:

  • Pickling the node object properly so that it can be instantiated from the database (if required)
  • Recognising if the graph is directed / undirected and reflecting this in how the relationships are established in Cypher
  • Recommending the use of __repr__ for identifiers and optionally reverting to using randomly generated ones.
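The last point could look roughly like the sketch below. It is only an illustration and not part of the gist: cypher_identifier is a hypothetical helper, and it assumes that Cypher accepts a backtick-quoted name as an identifier, with a literal backtick inside the name written as two backticks.

```python
def cypher_identifier(node, fallback):
    """Return a backtick-quoted identifier built from repr(node).

    When the node's class has no custom __repr__, return the fallback tag
    instead, because the default repr embeds a memory address.
    """
    if type(node).__repr__ is object.__repr__:
        return fallback
    return "`%s`" % repr(node).replace("`", "``")


class Plain(object):
    """A node class without its own __repr__."""


class Named(object):
    """A node class whose __repr__ yields a stable identifier."""

    def __repr__(self):
        return "thing-1"


print(cypher_identifier(Named(), "AB12"))  # uses the repr, quoted with backticks
print(cypher_identifier(Plain(), "AB12"))  # reverts to the generated tag
```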

@ducky427

I've written something similar to this here.

@Gijs-Koot

Hey, you've mixed up aGraph and G as function arguments, right?
