@tammoippen
Last active September 19, 2017 12:20
Compute the inverse document frequency
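from __future__ import division  # true division on Python 2

from collections import Counter
import math


def idf(documents):
    """Map each word to log(N / df), where N is the number of documents
    and df is the number of documents that contain the word."""
    # Document frequency: count each word at most once per document.
    freq = Counter()
    for doc in documents:
        words = set(doc.split())
        freq.update(words)
    res = dict()
    for w, f in freq.items():
        res[w] = math.log(len(documents) / f)
    return res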
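
A minimal usage sketch, assuming `documents` is a list of whitespace-separated strings (the function calls `len(documents)`, so a generator would not work). The three-sentence corpus is made up for illustration, and `idf` is repeated in compact form only so the block runs on its own.

```python
from __future__ import division
from collections import Counter
import math


def idf(documents):
    # Same computation as the gist, repeated so this sketch is self-contained.
    freq = Counter()
    for doc in documents:
        freq.update(set(doc.split()))
    return {w: math.log(len(documents) / f) for w, f in freq.items()}


# Hypothetical corpus, not part of the original gist.
corpus = [
    "the cat sat",
    "the dog sat",
    "the bird flew",
]

scores = idf(corpus)

# "the" occurs in all 3 documents: log(3 / 3) = 0.0
# "sat" occurs in 2 documents:     log(3 / 2)
# "cat" occurs in 1 document:      log(3 / 1)
for word in sorted(scores, key=scores.get):
    print("{}: {:.3f}".format(word, scores[word]))
```

Words that appear in every document score 0, and rarer words score higher. Tokenization is a plain `str.split()`, so case and punctuation are not normalized: `"Cat"` and `"cat,"` would count as different words.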