@ravishchawla
Created March 20, 2020 16:52
quora_check_coverage
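def check_coverage(text, embeddings_dict):
    """Report how much of `text` (an iterable of sentences) has an embedding."""
    known_words, unknown_words = {}, {}
    total_known, total_unknown = 0, 0
    for sentence in text:
        # split() with no argument also skips empty tokens from repeated spaces
        for word in sentence.split():
            if word in known_words:
                total_known += 1
            elif word in embeddings_dict:
                known_words[word] = embeddings_dict[word]
                total_known += 1
            else:
                unknown_words[word] = None
                total_unknown += 1
    # Vocabulary coverage: share of distinct words in the text that have an embedding
    vocab_size = len(known_words) + len(unknown_words)
    print('Total coverage of Vocabulary %.2f' % (len(known_words) / max(vocab_size, 1)))
    # Dataset coverage: share of all word occurrences that have an embedding
    print('Total coverage of Dataset %.2f' % (total_known / max(total_known + total_unknown, 1)))
    return known_words, unknown_words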
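
The two numbers reported here can also be returned instead of printed, which makes them easier to check on a small input. The sketch below is a variation, not the gist author's code: `coverage_ratios`, `toy_embeddings`, and `toy_text` are hypothetical names and data invented for illustration. It assumes that "vocabulary coverage" means the share of distinct words in the text that have an embedding, and it uses `collections.Counter` so that the unknown words come back with their frequencies.

```python
from collections import Counter


def coverage_ratios(text, embeddings_dict):
    """Return (vocabulary coverage, dataset coverage, unknown-word counts).

    Hypothetical helper: same idea as check_coverage, but it returns the
    ratios instead of printing them.
    """
    # Count every whitespace-separated token across all sentences
    counts = Counter(word for sentence in text for word in sentence.split())
    # Keep only the tokens that have no embedding
    unknown = Counter({w: c for w, c in counts.items() if w not in embeddings_dict})
    total = sum(counts.values())
    known_total = total - sum(unknown.values())
    # Guard against empty input to avoid dividing by zero
    vocab_cov = (len(counts) - len(unknown)) / len(counts) if counts else 0.0
    data_cov = known_total / total if total else 0.0
    return vocab_cov, data_cov, unknown


# Toy data, invented for illustration only
toy_embeddings = {'the': [0.1, 0.2], 'cat': [0.3, 0.4], 'sat': [0.5, 0.6]}
toy_text = ['the cat sat', 'the dog sat', 'the cat purred']

vocab_cov, data_cov, unknown = coverage_ratios(toy_text, toy_embeddings)
print('Vocabulary coverage %.2f' % vocab_cov)  # 3 of 5 distinct words: 0.60
print('Dataset coverage %.2f' % data_cov)      # 7 of 9 occurrences: 0.78
print(unknown.most_common(2))                  # most frequent words without an embedding
```

Sorting the unknown words by frequency is usually the useful part: the most common misses tend to point at a preprocessing problem (punctuation, casing, contractions) rather than at genuinely rare words.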