@Orbifold
Last active March 26, 2020 15:46
Untitled.ipynb
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# CSV generator for yFiles ETL\n",
"\n",
"The code below produces a CSV file you can use with the ETL designer.\n",
"The graph in the CSV is generated with the [Barabási-Albert model](https://en.wikipedia.org/wiki/Barab%C3%A1si%E2%80%93Albert_model), but any of [the graph generators in NetworkX](https://networkx.github.io/documentation/networkx-1.9.1/reference/generators.html) will do.\n",
"\n",
"The [Faker](https://github.com/joke2k/faker) package is used to generate random data.\n",
"\n",
"You need Python (v3.6+) installed. The following commands install the required packages:\n",
"\n",
"`pip install networkx`\n",
"\n",
"`pip install faker`\n"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"import copy\n",
"import csv\n",
"\n",
"import networkx as nx\n",
"from faker import Faker\n",
"\n",
"# Single Faker instance used to generate the random names\n",
"faker = Faker()\n"
],
"outputs": [],
"execution_count": 16,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"execution": {
"iopub.status.busy": "2020-03-26T15:32:37.332Z",
"iopub.execute_input": "2020-03-26T15:32:37.335Z",
"iopub.status.idle": "2020-03-26T15:32:37.392Z",
"shell.execute_reply": "2020-03-26T15:32:37.394Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"The following generates a random graph:"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
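{
"cell_type": "code",
"source": [
"# Number of nodes in the graph\n",
"N = 50\n",
"# Barabasi-Albert graph: each new node attaches to 5 existing nodes\n",
"ba = nx.barabasi_albert_graph(N, 5)"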
],
"outputs": [],
"execution_count": 6,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"execution": {
"iopub.status.busy": "2020-03-26T15:24:31.317Z",
"iopub.execute_input": "2020-03-26T15:24:31.321Z",
"iopub.status.idle": "2020-03-26T15:24:31.326Z",
"shell.execute_reply": "2020-03-26T15:24:31.330Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"The generated graph contains only structure, so we attach some random data to each node:"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"def createNode(i):\n",
"    # Build the record for node i with a random first and last name\n",
"    return {\n",
"        \"id\": str(i),\n",
"        \"firstName\": faker.first_name(),\n",
"        \"lastName\": faker.last_name()\n",
"    }\n",
"\n",
"\n",
"nodes = [createNode(i) for i in range(N)]\n"
],
"outputs": [],
"execution_count": 21,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"execution": {
"iopub.status.busy": "2020-03-26T15:35:21.584Z",
"iopub.execute_input": "2020-03-26T15:35:21.588Z",
"iopub.status.idle": "2020-03-26T15:35:21.593Z",
"shell.execute_reply": "2020-03-26T15:35:21.598Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"Storing a graph in a single table inevitably denormalizes the data. Every row describes one edge, so the source node's attributes are repeated for each edge that starts at that node:"
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"rows = []\n",
"for sourceId, targetId in ba.edges:\n",
"    # Copy the source node's record so the shared dict in nodes is not mutated\n",
"    obj = copy.copy(nodes[sourceId])\n",
"    obj[\"target\"] = str(targetId)\n",
"    rows.append(obj)"
],
"outputs": [],
"execution_count": 22,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"execution": {
"iopub.status.busy": "2020-03-26T15:35:23.974Z",
"iopub.execute_input": "2020-03-26T15:35:23.977Z",
"iopub.status.idle": "2020-03-26T15:35:23.982Z",
"shell.execute_reply": "2020-03-26T15:35:23.984Z"
}
}
},
{
"cell_type": "code",
"source": [
"# newline=\"\" lets the csv module control line endings (avoids blank rows on Windows)\n",
"with open(\"/data/barabsi.csv\", \"wt\", newline=\"\") as f:\n",
"    w = csv.writer(f)\n",
"\n",
"    # Write the CSV header; remove this line if you don't need it\n",
"    w.writerow([\"id\", \"firstName\", \"lastName\", \"target\"])\n",
"\n",
"    # One row per edge: source node attributes followed by the target id\n",
"    for x in rows:\n",
"        w.writerow([x[\"id\"], x[\"firstName\"], x[\"lastName\"], x[\"target\"]])"
],
"outputs": [],
"execution_count": 23,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"execution": {
"iopub.status.busy": "2020-03-26T15:35:26.279Z",
"iopub.execute_input": "2020-03-26T15:35:26.281Z",
"iopub.status.idle": "2020-03-26T15:35:26.285Z",
"shell.execute_reply": "2020-03-26T15:35:26.287Z"
}
}
},
{
"cell_type": "code",
"source": [],
"outputs": [],
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
}
}
],
"metadata": {
"kernel_info": {
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.7.2",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"kernelspec": {
"argv": [
"/Users/swa/conda/bin/python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
],
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}