Skip to content

Instantly share code, notes, and snippets.

@wut0n9
Created January 9, 2019 11:48
Show Gist options
  • Save wut0n9/a0cefc823832e3f7aaa45ece08ceace6 to your computer and use it in GitHub Desktop.
Save wut0n9/a0cefc823832e3f7aaa45ece08ceace6 to your computer and use it in GitHub Desktop.
"""Calculates frequencies of terms in documents and in corpus. Also computes inverse document frequencies."""
for document in self.corpus:
frequencies = {}
self.doc_len.append(len(document))
for word in document:
if word not in frequencies:
frequencies[word] = 0
frequencies[word] += 1
self.f.append(frequencies)
for word, freq in iteritems(frequencies):
if word not in self.df:
self.df[word] = 0
self.df[word] += 1
for word, freq in iteritems(self.df):
self.idf[word] = math.log(self.corpus_size - freq + 0.5) - math.log(freq + 0.5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment