@joshua-taylor
Created October 10, 2020 11:20
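import pickle

import numpy as np
from tqdm import tqdm

# Assumes tok_text (tokenized documents), bm25 (a BM25 object fitted on tok_text that
# exposes idf, k1, b, doc_freqs, doc_len and avgdl) and ft_model (gensim FastText) exist.
weighted_doc_vects = []
for i, doc in enumerate(tqdm(tok_text)):
    doc_vector = []
    for word in doc:
        vector = ft_model.wv[word]  # use .wv; plain ft_model[word] fails in gensim 4+
        tf = bm25.doc_freqs[i][word]  # frequency of word in document i
        # BM25 term weight: idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))
        weight = (bm25.idf[word] * (bm25.k1 + 1.0) * tf) / (
            bm25.k1 * (1.0 - bm25.b + bm25.b * (bm25.doc_len[i] / bm25.avgdl)) + tf
        )
        doc_vector.append(vector * weight)
    # Average the weighted word vectors into a single document vector
    weighted_doc_vects.append(np.mean(doc_vector, axis=0))
with open("weighted_doc_vects.p", "wb") as f:
    pickle.dump(weighted_doc_vects, f)  # save the results to disk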
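The per-word weight used in the loop is the standard BM25 term score. As a minimal sketch, it can be pulled out into a standalone function to check it by hand. The name `bm25_term_weight` is hypothetical (not part of the gist), and the `k1` and `b` defaults are only illustrative values.

```python
def bm25_term_weight(tf, idf, doc_len, avgdl, k1=1.5, b=0.75):
    """BM25 score contribution of one term in one document.

    tf      -- how often the term occurs in the document
    idf     -- inverse document frequency of the term
    doc_len -- number of tokens in the document
    avgdl   -- average document length in the corpus
    """
    # Documents longer than average get a larger denominator, so a lower weight
    length_norm = 1.0 - b + b * (doc_len / avgdl)
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# A term seen twice in an average-length document, with idf = 1.0
w_avg = bm25_term_weight(tf=2, idf=1.0, doc_len=100, avgdl=100)
# The same term in a document twice as long as average
w_long = bm25_term_weight(tf=2, idf=1.0, doc_len=200, avgdl=100)
```

Here `w_long` comes out smaller than `w_avg`, which is the length normalization that the `b` parameter controls.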
@bigfoot504

It looks like this may need to be updated to replace ft_model[word] with ft_model.wv[word].
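If the snippet has to run against both kinds of object, one possible approach is a small lookup helper that uses `.wv` when it is present and otherwise indexes the object directly. This is only a sketch: `word_vector` is a hypothetical helper, and the stand-in classes below exist purely so the example runs without gensim installed.

```python
def word_vector(model, word):
    """Return the vector for `word`, whether or not `model` wraps its vectors in `.wv`."""
    vectors = getattr(model, "wv", model)  # fall back to the object itself
    return vectors[word]


# Stand-ins for illustration only (not gensim classes)
class FakeVectors(dict):
    """Behaves like a word -> vector mapping."""


class FakeModel:
    """Behaves like a model that exposes its vectors as `.wv`."""

    def __init__(self, vectors):
        self.wv = vectors


kv = FakeVectors(cat=[1.0, 2.0])
from_model = word_vector(FakeModel(kv), "cat")  # goes through .wv
from_vectors = word_vector(kv, "cat")           # indexes the mapping directly
```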
