

@jonathanmorgan
Forked from aanastasiou/README.MD
Created July 1, 2016 12:48
Generate a Cypher query to store a Python Networkx directed graph

Exporting a Networkx graph as a Cypher query

This small project defines a function that constructs a Cypher query which, when executed against a Neo4j database server, stores the graph on that server.

Background

  • A Graph is an abstract mathematical model made of Nodes connected through Edges. It can describe complex systems consisting of a set of parts (the nodes) and their connections (the edges).
  • Examples of graphs include road networks (junctions connected via roads) and electronic circuits (components and their connections), among many others.
  • Networkx is an excellent Python module for manipulating such Graph objects of any kind.
  • Neo4j is a graph database: it uses the Graph as its data model for storing such objects.
  • Cypher is Neo4j's query language. It is a domain specific language that can be used to manipulate graph objects.

Objectives

Given a graph (G), write a function that creates the query to store the graph with all of its nodes, edges and attributes.

How is it done?

  • By traversing all nodes and edges and creating the corresponding parts of the Cypher query.
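The traversal can be sketched without Networkx itself. The minimal sketch below assumes the graph is supplied as the same (node, attributes) pairs and (source, target, attributes) triples that G.nodes(data=True) and G.edges(data=True) yield, and that node ids and attribute values are numbers, so nothing needs quoting. The function name build_create_query is hypothetical, and it uses sequential variable names (n0, n1, ...) instead of graph2Cypher's random tags.

```python
def build_create_query(nodes, edges):
    """Build a single Cypher CREATE statement: every node first, then every edge.

    nodes: iterable of (node_id, attribute_dict) pairs, as from G.nodes(data=True)
    edges: iterable of (source, target, attribute_dict) triples, as from G.edges(data=True)
    Assumes node ids and attribute values are numbers, so nothing needs quoting.
    """

    def props(attrs):
        # Render attributes as " {k:v,...}", or nothing at all when there are none
        if not attrs:
            return ""
        return " {%s}" % ",".join("%s:%s" % (k, v) for k, v in attrs.items())

    var_names = {}  # Networkx node id -> Cypher variable name (n0, n1, ...)
    patterns = []
    for index, (node_id, attrs) in enumerate(nodes):
        var_names[node_id] = "n%d" % index
        # Keep the original node id as an ID property, as graph2Cypher does
        patterns.append("(%s%s)" % (var_names[node_id], props({"ID": node_id, **attrs})))
    for source, target, attrs in edges:
        # Edges refer to the Cypher variables declared above, not to the Networkx ids
        patterns.append("(%s)-[:LINKED_TO%s]->(%s)" % (var_names[source], props(attrs), var_names[target]))
    return "create %s;" % ",".join(patterns)


query = build_create_query([(0, {"cost": 3}), (1, {"cost": 5})], [(0, 1, {"diameter": 2})])
print(query)
# create (n0 {ID:0,cost:3}),(n1 {ID:1,cost:5}),(n0)-[:LINKED_TO {diameter:2}]->(n1);
```

Declaring all nodes before any edge means every relationship pattern can refer to variables that are already bound in the same statement.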

Assumptions, Requirements, Caveats

  • graph2Cypher requires the random and networkx modules.
  • graph2Cypher_demo.py requires networkx and matplotlib.
  • The graph2Cypher function assumes that its (only) parameter IS A DIRECTED GRAPH.
  • Simply going through all nodes and edges and dumping their attributes does not work for every graph, because the node-id used by Networkx might not be usable directly as a Cypher identifier. The typical example is a graph whose Networkx node-ids are integers, since a Cypher identifier cannot start with a digit.
  • For this reason, and only for the purpose of constructing the Cypher query, the graph's nodes are relabeled on the fly with random identifiers (two letters followed by two digits).
  • Furthermore, certain assumptions are made about attribute names. Each node's Networkx id is stored in its ID node attribute, and every edge is given the relationship type :LINKED_TO.
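Two of the points above can be made concrete. Integer node ids are not valid unquoted Cypher variable names, and graph2Cypher's random tags (two letters followed by two digits) come from only 26·26·10·10 = 67,600 values, so with enough nodes two tags may coincide (the birthday problem). A repeated tag would bind two different nodes to the same variable within one create statement, which Neo4j will typically reject. The sketch below quantifies both; the helper names is_plain_identifier and collision_probability are hypothetical, and the identifier pattern is a deliberately conservative assumption rather than the full Cypher naming grammar.

```python
import re

# Conservative check for a name usable as an unquoted Cypher variable:
# an ASCII letter followed by ASCII letters, digits or underscores.
# (Cypher accepts a somewhat wider set; this pattern is an assumption for illustration.)
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_plain_identifier(name):
    """True if str(name) matches the conservative identifier pattern above."""
    return PLAIN_IDENTIFIER.fullmatch(str(name)) is not None


# graph2Cypher's tags are two letters (A-Z) followed by two digits (0-9)
TAG_SPACE = 26 * 26 * 10 * 10  # 67,600 distinct tags


def collision_probability(n_nodes, space=TAG_SPACE):
    """Chance that at least two of n_nodes independently drawn random tags coincide."""
    p_all_distinct = 1.0
    for i in range(n_nodes):
        # The i-th tag must avoid the i tags already drawn
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


print(is_plain_identifier(0))       # False: an integer id cannot serve as a variable name
print(is_plain_identifier("AB12"))  # True
print(collision_probability(17))    # about 0.002 for the demo's 17 nodes
print(collision_probability(306))   # about 0.5
```

For the demo's small tree a clash is unlikely, but at a few hundred nodes it becomes roughly a coin toss, which is why a generator of this kind either needs to check for duplicates or use a deterministic naming scheme.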

Use

  • The function can be used stand-alone to create the query, which can then be sent to the Neo4j database through, for example, the Python REST interface or the Neo4j-shell.

  • In the case of the Neo4j-shell, assuming it is on your system path, you can simply do the following:

    python graph2Cypher_demo.py > aGraph.cypher    #This creates the text file with the Cypher query
    neo4j-shell -file aGraph.cypher                #This executes the query within aGraph.cypher and stores the graph in the database

"""Defines a function that parses a Networkx graph and produces a Cypher query to store the graph in a Neo4j graph database
Athanasios Anastasiou 28/07/2013
"""
import random
import networkx

#Simple character lists
letDCT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
numDCT = "0123456789"

def getRndTag(someLen, dct=letDCT):
    """Returns some random string of length someLen composed of the characters in the dct string"""
    return "".join(random.choice(dct) for i in range(someLen))

def graph2Cypher(aGraph):
    """Generates a Cypher query from a Networkx (directed) Graph"""
    nodeStatements = {}
    edgeStatements = []
    usedNames = set()

    def toCypherMap(items):
        #Create the key-value representation of the attributes, taking care to quote string values
        #(and to escape backslashes and single quotes inside them)
        return "{%s}" % ",".join(
            "%s:'%s'" % (k, v.replace("\\", "\\\\").replace("'", "\\'")) if isinstance(v, str) else "%s:%s" % (k, v)
            for k, v in items)

    #Partially generate the node representations
    for nodeId, nodeData in aGraph.nodes(data=True):
        #Generate a node identifier for Cypher (two letters, two digits), redrawing if the tag is already taken
        varName = getRndTag(2) + getRndTag(2, dct=numDCT)
        while varName in usedNames:
            varName = getRndTag(2) + getRndTag(2, dct=numDCT)
        usedNames.add(varName)
        #Prepend the node's ID attribute so that the node-ID information used by Networkx is preserved.
        nodeItems = [("ID", "%s" % nodeId)]
        nodeItems.extend(nodeData.items())
        #Store it to a dictionary indexed by the node-id.
        nodeStatements[nodeId] = [varName, "(%s %s)" % (varName, toCypherMap(nodeItems))]
    #Generate the relationship representations
    for source, target, edgeData in aGraph.edges(data=True):
        edgeAttributes = toCypherMap(edgeData.items()) if edgeData else ""
        #NOTE: Declare the links by their Cypher node-identifier rather than their Networkx node identifier
        edgeStatements.append("(%s)-[:LINKED_TO %s]->(%s)" % (nodeStatements[source][0], edgeAttributes, nodeStatements[target][0]))
    #Put both definitions together and return the create statement
    #(joining a single list avoids a dangling comma when the graph has no edges)
    return "create %s;\n" % ",".join([statement for _, statement in nodeStatements.values()] + edgeStatements)
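graph2Cypher writes each node's and edge's attributes inline as a Cypher map, quoting string values. A standalone sketch of that rendering step is shown below with two refinements that are assumptions of this sketch rather than part of the original: backslashes and single quotes inside strings are escaped, and Python booleans are written as Cypher's true/false. The name to_cypher_map is hypothetical; keys are assumed to be plain identifiers, and values other than strings, booleans and numbers (for example None) are not handled.

```python
def to_cypher_map(items):
    """Render (key, value) pairs as a Cypher map literal.

    Strings are single-quoted with backslashes and single quotes escaped;
    booleans become true/false; other values (numbers) are written as-is.
    """
    parts = []
    for key, value in items:
        if isinstance(value, bool):  # checked before numbers: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            rendered = "'%s'" % escaped
        else:
            rendered = str(value)
        parts.append("%s:%s" % (key, rendered))
    return "{%s}" % ",".join(parts)


print(to_cypher_map([("ID", "3"), ("label", "O'Neil"), ("cost", 7), ("active", True)]))
# {ID:'3',label:'O\'Neil',cost:7,active:true}
```

Escaping matters because the query is assembled as plain text: an unescaped quote in a label would end the string literal early and make the whole create statement invalid.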
"""Creates a dummy tree graph example and from that, its Cypher representation"""
import networkx
import sys
import random
import graph2Cypher

#Create a DIRECTED network (In this case a simple binary tree (branching factor=2) having 17 nodes)
G = networkx.generators.full_rary_tree(2, 17, create_using=networkx.DiGraph())
#Add some attributes to the nodes
#(G.nodes[...] and G.edges[...] replace the older G.node and G.edge accessors, which current Networkx releases no longer provide)
for aNode in G.nodes():
    G.nodes[aNode]['label'] = graph2Cypher.getRndTag(5)
    G.nodes[aNode]['cost'] = random.randint(0, 9)
#Add some attributes to the edges
#(Note: Here, 'diameter' could refer to pipe diameter, it's just a dummy name.)
for source, target in G.edges():
    G.edges[source, target]['diameter'] = random.randint(0, 9)
#Write the output to the standard output (this way the query could be piped if required)
sys.stdout.write(graph2Cypher.graph2Cypher(G))
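The demo builds its graph with full_rary_tree(2, 17, create_using=networkx.DiGraph()). Assuming Networkx numbers the tree breadth-first (root 0, with the children of node k being r·k+1 to r·k+r), the edge list the demo should produce can be reproduced without Networkx. It shows that every edge points from parent to child and that a 17-node tree has 16 edges, so the generated create statement should contain 17 node patterns and 16 :LINKED_TO relationships. The helper rary_tree_edges is hypothetical.

```python
def rary_tree_edges(r, n):
    """Parent->child edges of a full r-ary tree on nodes 0..n-1, numbered breadth-first."""
    # With breadth-first numbering, node i (i >= 1) is a child of node (i - 1) // r
    return [((i - 1) // r, i) for i in range(1, n)]


edges = rary_tree_edges(2, 17)
print(len(edges))  # 16
print(edges[:4])   # [(0, 1), (0, 2), (1, 3), (1, 4)]
```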