This gist provides a D3.js friendly "Coauthorship in Network Science" graph as a JSON file. It is used in the example at:
🔗 https://bl.ocks.org/sjengle/f6f522f3969752b384cfec5449eacd98
Source: M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," preprint physics/0605087 (2006). [Data] [Paper]
The original dataset was retrieved from http://www-personal.umich.edu/~mejn/netdata/ under the "Data sets" heading, cited as:
Coauthorships in network science: coauthorship network of scientists working on network theory and experiment, as compiled by M. Newman in May 2006. A figure depicting the largest component of this network can be found here. These data can be cited as M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).
The original dataset was processed in Python using the NetworkX package: the original GML file was read, filtered, and converted into JSON. Specifically, the degree and closeness centrality of each node were calculated and stored as node attributes. The edge weights were already present in the original file.
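For reference, the node-link JSON that NetworkX's json_graph.node_link_data produces (and that the D3 example consumes) has roughly the following shape. This is a minimal hand-built sketch using only the standard library; the names and numbers are illustrative, not taken from the actual dataset:

```python
import json

# A minimal sketch of the node-link format emitted by
# json_graph.node_link_data(). The names and values below are
# illustrative placeholders, not entries from netscience.json.
graph = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    # each node carries the computed "degree" and "closeness" attributes
    "nodes": [
        {"id": "AUTHOR, A", "degree": 2, "closeness": 0.5},
        {"id": "AUTHOR, B", "degree": 2, "closeness": 0.5},
        {"id": "AUTHOR, C", "degree": 2, "closeness": 0.5},
    ],
    # "source"/"target" reference node ids; "value" carries the edge weight
    "links": [
        {"source": "AUTHOR, A", "target": "AUTHOR, B", "value": 1.0},
        {"source": "AUTHOR, B", "target": "AUTHOR, C", "value": 0.5},
        {"source": "AUTHOR, A", "target": "AUTHOR, C", "value": 0.5},
    ],
}

print(json.dumps(graph, indent=2))
```

D3's force layout reads the "nodes" and "links" arrays directly, which is what makes this serialization convenient for the linked example.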
The entire network can be found in the netscience.json file. The largest connected component can be found in the netscience-largest.json file, and the second largest connected component in the netscience-second.json file. Finally, the netscience-filtered.json file contains all components with 5 or more nodes.
The entire script is as follows:
import json
import networkx as nx
from networkx.readwrite import json_graph

# must make sure "[" is after label, not on a new line, in the gml file
G = nx.read_gml('netscience.gml')

# calculate some node attributes
# (NetworkX 2.x argument order: set_node_attributes(G, values, name))
nx.set_node_attributes(G, dict(G.degree()), 'degree')
nx.set_node_attributes(G, nx.closeness_centrality(G), 'closeness')

# write entire graph to file
with open('netscience.json', 'w') as f:
    json.dump(json_graph.node_link_data(G), f, indent=2)

# find all of the connected components (sorted by largest first)
connected = sorted(nx.connected_components(G), key=len, reverse=True)

# pull out the largest connected component
largest = G.subgraph(connected[0])
with open('netscience-largest.json', 'w') as f:
    json.dump(json_graph.node_link_data(largest), f, indent=2)

# pull out the second largest connected component
second = G.subgraph(connected[1])
with open('netscience-second.json', 'w') as f:
    json.dump(json_graph.node_link_data(second), f, indent=2)

# filter for all components with 5 or more nodes, then flatten
# the list of node sets into a single list of nodes
filtered = [component for component in connected if len(component) >= 5]
filtered = [node for component in filtered for node in component]
filtered = G.subgraph(filtered)
with open('netscience-filtered.json', 'w') as f:
    json.dump(json_graph.node_link_data(filtered), f, indent=2)

print("Done")
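As a sanity check, the generated node-link JSON can be loaded back and its component sizes recomputed with nothing but the standard library. The sketch below runs a breadth-first search over a small in-memory stand-in for one of the output files (the real netscience.json has the same "nodes"/"links" shape, just far more entries):

```python
import json
from collections import defaultdict, deque

# Stand-in for json.load(open('netscience.json')); a toy graph with
# two components (sizes 3 and 2) in the same node-link shape.
data = json.loads("""{
  "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}, {"id": "e"}],
  "links": [{"source": "a", "target": "b"},
            {"source": "b", "target": "c"},
            {"source": "d", "target": "e"}]
}""")

# build an undirected adjacency list from the links
adj = defaultdict(set)
for link in data["links"]:
    adj[link["source"]].add(link["target"])
    adj[link["target"]].add(link["source"])

def component_sizes(nodes, adj):
    """Return connected component sizes, largest first, via BFS."""
    seen = set()
    sizes = []
    for node in nodes:
        if node in seen:
            continue
        queue = deque([node])
        seen.add(node)
        size = 0
        while queue:
            current = queue.popleft()
            size += 1
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)

sizes = component_sizes([n["id"] for n in data["nodes"]], adj)
print(sizes)  # [3, 2] for the toy graph above
```

Running the same check against the real files should confirm that netscience-largest.json holds exactly one component, and that every component in netscience-filtered.json has at least 5 nodes.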