@Ken-Kuroki
Created August 26, 2019 02:39
Calculate TF-IDF from a count matrix
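For reference, the quantities the code computes can be written out as follows. The notation is introduced here and is not from the original: $c_{dt}$ is the count of term $t$ in document $d$, $n$ is the number of documents, and $\mathrm{df}_t$ is the number of documents with $c_{dt} > 0$.

```latex
\mathrm{tf}_{dt} = \frac{c_{dt}}{\sum_{t'} c_{dt'}}, \qquad
\mathrm{idf}_t = \ln\frac{1 + n}{1 + \mathrm{df}_t} + 1, \qquad
v_{dt} = \frac{\mathrm{tf}_{dt}\,\mathrm{idf}_t}{\sqrt{\sum_{t'} \left(\mathrm{tf}_{dt'}\,\mathrm{idf}_{t'}\right)^2}}
```

Because the denominator of $\mathrm{tf}_{dt}$ is the same for every term in a row, it cancels in $v_{dt}$. The result therefore equals the l2-normalized product of raw counts and idf, which is what scikit-learn computes.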
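import numpy as np
from sklearn.preprocessing import normalize

def tf_idf(X):
    # Corresponds to sklearn.feature_extraction.text.TfidfVectorizer (or
    # TfidfTransformer) with smooth_idf=True and norm="l2".
    # X: dense array-like of raw counts, shape (n_documents, n_terms).
    X = np.asarray(X)
    tf = normalize(X, norm="l1", axis=1)  # per-document term frequency
    N = X.shape[0]  # number of documents
    df = np.count_nonzero(X, axis=0)  # number of documents containing each term
    idf = np.log((N + 1) / (df + 1)) + 1  # smoothed inverse document frequency
    # The l1 scaling of tf cancels under the final l2 row normalization,
    # so the result matches sklearn, which uses raw counts as tf.
    return normalize(tf * idf, norm="l2")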
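One way to check the claimed correspondence is to compare the same steps against scikit-learn's `TfidfTransformer` on a small count matrix. This is a sketch: `counts` is made-up example data, and the steps of `tf_idf` are inlined so the snippet runs on its own.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.preprocessing import normalize

# Hypothetical example data: 3 documents x 4 terms of raw counts.
counts = np.array([[3, 0, 1, 0],
                   [2, 0, 0, 1],
                   [0, 4, 1, 1]])

# Same steps as tf_idf(): l1 term frequency, smoothed idf, l2 row norm.
n_docs = counts.shape[0]
df = np.count_nonzero(counts, axis=0)
idf = np.log((n_docs + 1) / (df + 1)) + 1
manual = normalize(normalize(counts, norm="l1", axis=1) * idf, norm="l2")

# Reference: scikit-learn's transformer with the matching settings.
ref = TfidfTransformer(norm="l2", smooth_idf=True, sublinear_tf=False)
expected = ref.fit_transform(counts).toarray()  # returns a sparse matrix

print(np.allclose(manual, expected))  # prints True
print(np.allclose(idf, ref.idf_))     # prints True
```

`TfidfTransformer` returns a sparse matrix, hence the `.toarray()` call before comparing with the dense result.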