@milespossing
Created November 30, 2021 00:18
Model interpretation using vectorized logistic regression (word list coefficients provided)
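import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.feature_extraction.text import CountVectorizer

all_data = pd.read_table('alldata.tsv')
# NOTE: all_vocab is never defined in this snippet; it is assumed to be a
# DataFrame with 'token' and 'coef' columns that was loaded beforehand.
# Map each token to its logistic regression coefficient (avoid shadowing the built-in dict).
coef_by_token = dict(zip(all_vocab['token'], all_vocab['coef']))
# With a fixed vocabulary, CountVectorizer ignores min_df and max_df.
vectorizer = CountVectorizer(vocabulary=all_vocab['token'].values, max_df=0.3, ngram_range=(1, 2), min_df=20)
cur_vocab = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2

def get_tokens(review):
    """Return the coefficient and token of every vocabulary entry found in the review."""
    a = vectorizer.transform([review])
    nz = np.nonzero(a)[1]  # column indices of the tokens that occur in the review
    return all_vocab.iloc[nz][['coef', 'token']]

review = all_data['review'].values[5]
print(review)
o = get_tokens(review)
sns.barplot(x="coef", y="token", data=o);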
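To see what the plot is summarizing without `alldata.tsv` or the coefficient table, here is a minimal standard-library sketch of the same idea. Everything in it is hypothetical: the `TOY_COEFS` values, the example sentence, and the helper names `ngram_counts` and `token_contributions` do not come from the gist. It mimics CountVectorizer's default lowercasing and token pattern with `ngram_range=(1, 2)`. Unlike `get_tokens`, it multiplies each coefficient by the token count, because on count features each token adds count × coefficient to the logit (the intercept is left out).

```python
import re
from collections import Counter

# Hypothetical toy coefficients; the gist's real values live in all_vocab.
TOY_COEFS = {"great": 1.2, "boring": -1.5, "not great": -0.9, "plot": 0.1}

# Same default token pattern as CountVectorizer: words of two or more characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def ngram_counts(text, max_n=2):
    """Count unigrams up to max_n-grams in lowercased text."""
    words = TOKEN_RE.findall(text.lower())
    grams = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams[" ".join(words[i:i + n])] += 1
    return grams

def token_contributions(text, coefs):
    """Return count * coefficient for each vocabulary token present in text."""
    counts = ngram_counts(text)
    return {tok: counts[tok] * coef for tok, coef in coefs.items() if counts[tok] > 0}

contribs = token_contributions("The plot was not great, just boring. Boring!", TOY_COEFS)
score = sum(contribs.values())  # linear score before the intercept and sigmoid
for tok, value in sorted(contribs.items(), key=lambda kv: kv[1]):
    print(f"{tok:>10s} {value:+.2f}")
print("score:", round(score, 2))
```

Tokens with negative contributions push the prediction toward the negative class and positive ones toward the positive class, which is the reading the bar plot is meant to support.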