jonathanmorgan/README.MD

## README.MD

      
Exporting a Networkx graph as a Cypher query

This small project defines a function that constructs a Cypher query which, when executed against a Neo4j
database server, stores a Networkx graph on that server.

Background


A graph is an abstract mathematical model composed of nodes connected by edges. It can describe a complex system as a set of parts (the nodes) and the connections between them (the edges).
Examples of graphs include road networks (junctions connected by roads) and electronic circuits (components and their connections).
Networkx is a Python module for creating and manipulating such graph objects.
Neo4j is a graph database: it uses the graph as its data model for storing such objects.
Cypher is Neo4j's query language, a domain-specific language for manipulating graph objects.

Objectives

Given a graph (G), write a function that creates the query to store the graph with all of its nodes, edges and attributes.

How is it done?


By traversing all nodes and edges and creating the corresponding parts of the Cypher query.
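
The traversal can be sketched without Networkx by using plain tuples shaped like the items that `G.nodes(data=True)` and `G.edges(data=True)` yield. This is only an illustration: the names `sketch_create` and `format_value`, and the counter-based variable names `n0`, `n1`, are assumptions made for this sketch (the project itself uses random tags), and string values are quoted without any escaping.

```python
def format_value(value):
    """Render a Python value as a Cypher literal: strings quoted, the rest as-is."""
    if isinstance(value, str):
        return "'%s'" % value
    return str(value)


def sketch_create(nodes, edges):
    """Build one create statement from (id, attrs) and (source, target, attrs) tuples."""
    var_of = {}  # original node id -> Cypher variable name
    parts = []
    # First pass: one node pattern per node, keeping the original id in ID
    for index, (node_id, attrs) in enumerate(nodes):
        var_of[node_id] = "n%d" % index
        items = [("ID", str(node_id))] + list(attrs.items())
        body = ",".join("%s:%s" % (k, format_value(v)) for k, v in items)
        parts.append("(%s {%s})" % (var_of[node_id], body))
    # Second pass: one relationship pattern per edge, referring to the variables
    for source, target, attrs in edges:
        body = ",".join("%s:%s" % (k, format_value(v)) for k, v in attrs.items())
        rel = ":LINKED_TO {%s}" % body if body else ":LINKED_TO"
        parts.append("(%s)-[%s]->(%s)" % (var_of[source], rel, var_of[target]))
    return "create %s;" % ",".join(parts)


nodes = [(0, {"label": "ROOT", "cost": 3}), (1, {"label": "LEAF", "cost": 7})]
edges = [(0, 1, {"diameter": 5})]
query = sketch_create(nodes, edges)
print(query)
```

With the two-node input above, the sketch prints a single statement that declares both nodes first and then the relationship between them.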

Assumptions, Requirements, Caveats


graph2Cypher requires the random and networkx modules.
graph2Cypher_demo.py requires networkx and matplotlib.
The graph2Cypher function assumes that its only parameter is a directed graph (for example a networkx DiGraph).
Simply going through all nodes and edges and dumping their attributes is not practical for every graph, because the node-id used by Networkx might not be directly usable by Neo4j. The typical example is a graph whose Networkx node-ids are integers.
For this reason, and only for the purpose of constructing the Cypher query, the graph's nodes are relabeled on the fly.
Furthermore, certain assumptions are made about attribute names: each node's original Networkx id is stored in the ID node attribute, and every edge gets the relationship type :LINKED_TO by default.
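
One point worth illustrating about the relabeling: an unquoted Cypher variable name is expected to start with a letter, which is why integer node-ids cannot be used as they are. The random tags (two letters followed by two digits) satisfy that rule, but two nodes could in principle draw the same tag. The sketch below shows one possible way to guarantee distinct tags; `random_tag` and `unique_var_names` are hypothetical helpers and are not part of graph2Cypher.

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"


def random_tag(length, alphabet=LETTERS, rng=random):
    """Return a random string of the given length drawn from alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def unique_var_names(node_ids, rng=random):
    """Map every node id to a distinct tag such as 'QX42'.

    Only 26*26*10*10 = 67600 distinct tags exist, so this sketch
    assumes the graph has fewer nodes than that.
    """
    used = set()
    names = {}
    for node_id in node_ids:
        tag = random_tag(2, LETTERS, rng) + random_tag(2, DIGITS, rng)
        while tag in used:  # redraw on the rare collision
            tag = random_tag(2, LETTERS, rng) + random_tag(2, DIGITS, rng)
        used.add(tag)
        names[node_id] = tag
    return names


# A seeded generator makes the run repeatable
names = unique_var_names(range(17), random.Random(0))
print(names)
```

A simple counter (`n0`, `n1`, ...) would give the same guarantee with less code. The random form is kept here only because it mirrors the tags the project generates.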

Use


The function can be used on its own to create the query, which can then be sent to the Neo4j database through something like the Python REST interface or the neo4j-shell.

In the case of the neo4j-shell, assuming that it is on your system path, you can simply do the following:

python graph2Cypher_demo.py > aGraph.cypher   #Creates the text file containing the Cypher query
neo4j-shell -file aGraph.cypher   #Executes the query in aGraph.cypher and stores the graph in the database
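
If shell redirection is inconvenient, the query file can also be written from Python before it is handed to neo4j-shell. This is a minimal sketch: `save_query` is a hypothetical helper, the query string is a hand-written placeholder standing in for the value returned by graph2Cypher, and the temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path


def save_query(query, path):
    """Write the Cypher text to path and return the path as a Path object."""
    path = Path(path)
    path.write_text(query, encoding="utf-8")
    return path


# Placeholder query; in practice this would be graph2Cypher.graph2Cypher(G)
query = "create (AB12 {ID:'0'}),(CD34 {ID:'1'}),(AB12)-[:LINKED_TO ]->(CD34);\n"
target = save_query(query, Path(tempfile.gettempdir()) / "aGraph.cypher")
print(target.read_text(encoding="utf-8"))
```

The resulting file can then be passed to `neo4j-shell -file` exactly as in the commands above.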


## graph2Cypher.py
"""Defines a function that parses a Networkx graph and produces a Cypher query to store the graph in a Neo4j graph database
Athanasios Anastasiou 28/07/2013
"""

import random
import networkx

#Simple character lists
letDCT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numDCT = "0123456789"

def getRndTag(someLen, dct=letDCT):
    """Returns some random string of length someLen composed of the characters in the dct string"""
    return "".join([dct[random.randint(0,len(dct)-1)] for i in range(0,someLen)])

def graph2Cypher(aGraph):
    """Generates a Cypher query from a Networkx Graph"""
    nodeStatements = {}
    edgeStatements = []

    #Partially generate the node representations
    for aNode in aGraph.nodes(data = True):
        #Generate a node identifier for Cypher
        varName = getRndTag(2)+getRndTag(2,dct=numDCT)
        #Append the node's ID attribute so that the node-ID information used by Networkx is preserved.
        nodeItems = [("ID","%s" % aNode[0])]
        nodeItems.extend(aNode[1].items())
        #Create the key-value representation of the node's attributes taking care to add quotes when the value is of type string
        nodeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,nodeItems))
        #Store it to a dictionary indexed by the node-id.
        nodeStatements[aNode[0]] = [varName, "(%s %s)" % (varName, nodeAttributes)]

    #Generate the relationship representations
    for anEdge in aGraph.edges(data = True):
        edgeItems = anEdge[2].items()
        edgeAttributes = ""
        if len(edgeItems)>0:
            edgeAttributes = "{%s}" % ",".join(map(lambda x:"%s:%s" %(x[0],x[1]) if not type(x[1])==str else "%s:'%s'" %(x[0],x[1]) ,edgeItems))
        #NOTE: Declare the links by their Cypher node-identifier rather than their Networkx node identifier
        edgeStatements.append("(%s)-[:LINKED_TO %s]->(%s)" % (nodeStatements[anEdge[0]][0], edgeAttributes, nodeStatements[anEdge[1]][0]))

    #Put both definitions together and return the create statement.
    #(Joining one combined list avoids a dangling comma when the graph has no edges.)
    return "create %s;\n" % ",".join([x[1] for x in nodeStatements.values()] + edgeStatements)

## graph2Cypher_demo.py
"""Creates a dummy tree graph example and from that, its Cypher representation"""

import networkx
import sys
import random
import graph2Cypher


#Create a DIRECTED network (In this case a simple binary tree (branching factor=2) having 17 nodes)
G =  networkx.generators.full_rary_tree(2,17, create_using=networkx.DiGraph())

#Add some attributes to the nodes
for aNode in G.nodes():
    #G.nodes[...] is the accessor used by Networkx 2.x and later (the older G.node[...] was removed)
    G.nodes[aNode]['label'] = graph2Cypher.getRndTag(5)
    G.nodes[aNode]['cost'] = random.randint(0,9)

#Add some attributes to the edges
#(Note: Here, 'diameter' could refer to pipe diameter, it's just a dummy name.)
for anEdge in G.edges():
    #G[u][v] returns the edge's attribute dictionary (the older G.edge[u][v] was removed in Networkx 2.0)
    G[anEdge[0]][anEdge[1]].update({'diameter':random.randint(0,9)})

#Write the output to the standard output (this way the query could be piped if required)
sys.stdout.write(graph2Cypher.graph2Cypher(G))