@talesa
Last active April 4, 2019 22:21
Code for Ewa Siwicka 04/04/2019
import csv
import itertools

import numpy as np

filename = 'Network-publishing-database2_short2.csv'

# Read every row of the CSV, dropping empty cells.
with open(filename, newline='') as csvfile:
    rows = csv.reader(csvfile)
    rows = [[i for i in row if i != ''] for row in rows]


def flatten(l):
    return [item for sublist in l for item in sublist]


# Assign an integer id to every distinct name.
id_to_name = list(set(flatten(rows)))
name_to_id = {name: id for id, name in enumerate(id_to_name)}

# m[i, j] counts how many times names i and j are paired within a row.
N = len(id_to_name)
m = np.zeros((N, N), dtype=int)
for row in rows:
    for a, b in itertools.combinations(row, 2):
        m[name_to_id[a], name_to_id[b]] += 1
        m[name_to_id[b], name_to_id[a]] += 1

# Write each pair once per co-occurrence, walking the lower triangle of m.
with open('pairs_output.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile)
    for i in range(N):
        for j in range(i + 1):
            for _ in range(m[i, j]):
                spamwriter.writerow([id_to_name[i], id_to_name[j]])
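To see what the script computes without needing the CSV file, here is a minimal sketch on made-up data. It is not the gist's own method: it swaps the NumPy matrix for a `collections.Counter` keyed by sorted name pairs, and the helper `count_pairs` and the names in `toy_rows` are hypothetical. It also assumes a name does not repeat within a row.

```python
import itertools
from collections import Counter


def count_pairs(rows):
    """Count how often each unordered pair of names shares a row.

    Hypothetical helper for illustration; assumes no name repeats in a row.
    """
    counts = Counter()
    for row in rows:
        names = [name for name in row if name != '']  # drop empty cells
        for a, b in itertools.combinations(names, 2):
            counts[tuple(sorted((a, b)))] += 1  # order-independent key
    return counts


# Made-up rows standing in for the CSV contents.
toy_rows = [
    ['Ann', 'Bob', 'Cat'],
    ['Ann', 'Bob', ''],
    ['Dan'],
]

pair_counts = count_pairs(toy_rows)
for (a, b), n in sorted(pair_counts.items()):
    print(a, b, n)
```

On this toy input, Ann and Bob share two rows, so the script above would write that pair twice to `pairs_output.csv`; a row with a single name such as Dan contributes no pairs.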