@thinrhino
Created May 5, 2014 13:07
Plot Zipf's law
from collections import defaultdict
import matplotlib.pyplot as plt
r_data = []
# read the relevant data: the word is the second space-separated field, its count the third
with open('<data_file>', 'r') as data:
    for line in data:
        words = line.split(' ')
        wc = (words[1], int(words[2].strip()))
        r_data.append(wc)
# sum the counts of each word across all lines
d = defaultdict(int)
for k, v in r_data:
    d[k] += v
# sort the list of frequencies in decreasing order
# (sorted() also works on Python 3, where dict.values() is a view with no .sort())
freqs = sorted(d.values(), reverse=True)
# enumerate the ranks and frequencies
rf = [(r+1, f) for r, f in enumerate(freqs)]
rs, fs = zip(*rf)
plt.clf()
plt.xscale('log')
plt.yscale('log')
plt.title('Zipf plot')
plt.xlabel('rank')
plt.ylabel('frequency')
plt.plot(rs, fs, 'r-')
plt.show()