@quaquel
Created September 3, 2014 12:42
chase data community detection
{
"metadata": {
"name": "",
"signature": "sha256:8ff808735cea661eeaf27f457e5633a14d4cb5248896790d9217821d621b9873"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Importing the network is easy: two lines give us a weighted graph representation of the TSV file."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import networkx as nx\n",
"\n",
"fn = \"../Jan_TextMining/src/gtm_2014/data/ALLSEEDS_coauthorship_network.tsv\"\n",
"G = nx.read_edgelist(fn, create_using=nx.Graph(), data=(('weight',float),))\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# I have not properly installed the community library, so it is not on the Python path by default.\n",
"# I add its directory to the path and can then successfully import the community detection library.\n",
"\n",
"import sys\n",
"sys.path.append('/Domain/tudelft.net/Users/jhkwakkel/Documents/workspace/EMAProjects/Jan_TextMining/src/gtm_2014/')\n",
"del sys\n",
"import community"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dendo = community.generate_dendrogram(G)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(dendo)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"3"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The algorithm finds 3 levels of community structure in the network. Let's investigate the modularity score at each of these levels."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# compare the modularity of the partition at every level of the dendrogram\n",
"for i in range(len(dendo)):\n",
"    partition = community.partition_at_level(dendo, i)\n",
"    modularity = community.modularity(partition, G)\n",
"    print(\"level {}: {}\".format(i, modularity))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"level 0: 0.369598090194\n",
"level 1: 0.391273331113"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"level 2: 0.391710185879"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a modularity between roughly 0.37 and 0.39 at every level, the algorithm finds strong evidence of community structure in the data. The difference between level 1 and level 2 is minor (0.3913 versus 0.3917). This was also the case in my data. It might be interesting to investigate the community structure at both level 0 and level 2. Below, I demonstrate how we can do this for level 2."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from collections import defaultdict\n",
"\n",
"# invert the node -> community mapping into community -> list of nodes\n",
"partition = community.partition_at_level(dendo, 2)\n",
"communities = defaultdict(list)\n",
"for key, value in partition.items():\n",
"    communities[value].append(key)\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 34
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for key, value in communities.items():\n",
"    print(\"community {}, nr. of items {}\".format(key, len(value)))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"community 0, nr. of items 192\n",
"community 1, nr. of items 309\n",
"community 2, nr. of items 359\n",
"community 3, nr. of items 131\n",
"community 4, nr. of items 74\n",
"community 5, nr. of items 102\n",
"community 6, nr. of items 353\n",
"community 7, nr. of items 60\n",
"community 8, nr. of items 2\n",
"community 9, nr. of items 5\n",
"community 10, nr. of items 14\n",
"community 11, nr. of items 2\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, there are 12 communities at level 2 (numbered 0 to 11). Three of these are really small, containing 2, 2, and 5 nodes. The other communities are more substantial. \n",
"\n",
"Given the geolocations of the nodes, visualization on a globe is straightforward. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
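
The notebook compares dendrogram levels by their modularity score and then groups nodes by community. As a rough illustration of what that score measures, here is a minimal pure-Python sketch of weighted modularity on a toy graph. It is not the `community` library's implementation; the function names `modularity` and `group_by_community`, the toy edge list, and the toy partition are all hypothetical and chosen only for this example. It assumes an undirected graph with each edge listed once.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Modularity of `partition` for an undirected, weighted edge list.

    edges: iterable of (u, v, weight) tuples, each edge listed once.
    partition: dict mapping node -> community id.
    Illustrative sketch only, not the python-louvain implementation.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        total_weight += w
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    if total_weight == 0:
        raise ValueError("graph has no edges")
    # Q = sum over communities of: internal_c / m - (degree_c / 2m)^2
    return sum(
        internal[c] / total_weight - (degree[c] / (2.0 * total_weight)) ** 2
        for c in degree
    )


def group_by_community(partition):
    """Invert node -> community into community -> list of nodes."""
    communities = defaultdict(list)
    for node, community_id in partition.items():
        communities[community_id].append(node)
    return dict(communities)


# Toy data (hypothetical): two triangles joined by a single edge.
toy_edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

score = modularity(toy_edges, toy_partition)
sizes = {c: len(nodes) for c, nodes in group_by_community(toy_partition).items()}
print("modularity: {:.4f}".format(score))
print("community sizes: {}".format(sizes))
```

On this toy graph, splitting along the two triangles gives a modularity of 5/14 (about 0.357), while putting every node in one community gives 0. The notebook's level-by-level comparison reads the same way: a higher score means more edge weight falls inside communities than the degree distribution alone would suggest.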