Gist: aravindpai/c02310a69b45c8193b34e078e577e7ec (last active January 27, 2020 13:18)
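# Assumes TEXT and LABEL are legacy torchtext Field objects and train_data
# is the training Dataset built from them in an earlier step.

# Build the vocabulary from the training data, keeping only tokens that
# occur at least 3 times, and initialize their embeddings from the
# pretrained 100-dimensional GloVe vectors (glove.6B.100d)
TEXT.build_vocab(train_data, min_freq=3, vectors="glove.6B.100d")
LABEL.build_vocab(train_data)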
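To show what the vocabulary build step and `min_freq=3` amount to, here is a minimal pure-Python sketch. It is not torchtext's implementation: the function `toy_build_vocab`, the toy sentences, and the ordering rule (special tokens first, then descending frequency with alphabetical tie-breaking) are hypothetical choices made for illustration.

```python
from collections import Counter


def toy_build_vocab(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    """Hypothetical vocabulary builder illustrating the idea behind build_vocab.

    Counts every token, drops tokens seen fewer than `min_freq` times, and
    returns (freqs, itos, stoi). Special tokens come first; the remaining
    tokens are ordered by descending frequency, ties broken alphabetically.
    """
    # Frequency of every token across all training texts (before filtering)
    freqs = Counter(tok for text in tokenized_texts for tok in text)
    # Keep only tokens that meet the frequency threshold
    kept = [(tok, n) for tok, n in freqs.items()
            if n >= min_freq and tok not in specials]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    # itos: index -> string, stoi: string -> index
    itos = list(specials) + [tok for tok, _ in kept]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return freqs, itos, stoi


train_texts = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "thin"],
    ["the", "movie", "was", "long"],
]
freqs, itos, stoi = toy_build_vocab(train_texts, min_freq=2)
print(itos)       # ['<unk>', '<pad>', 'the', 'was', 'movie']
print(len(itos))  # 5
```

Tokens seen only once ("great", "plot", "thin", "long") are dropped by the threshold, which is why raising `min_freq` shrinks the vocabulary.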
# No. of unique tokens in text (typically includes special tokens such as <unk> and <pad>)
print("Size of TEXT vocabulary:", len(TEXT.vocab))

# No. of unique tokens in label
print("Size of LABEL vocabulary:", len(LABEL.vocab))

# 10 most commonly used words with their counts
print(TEXT.vocab.freqs.most_common(10))

# Word dictionary: maps each token to its integer index
print(TEXT.vocab.stoi)
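The `stoi` dictionary is what turns tokens into the integer ids fed to the model, and the pretrained GloVe vectors are stored as one row per vocabulary index. The plain-Python sketch below illustrates both ideas under stated assumptions: `numericalize`, `align_vectors`, and the 3-dimensional `pretrained` dictionary are hypothetical stand-ins (the real glove.6B.100d vectors have 100 dimensions), and mapping unknown tokens to index 0 and giving words without a pretrained vector a zero row are assumed defaults, not a description of torchtext internals.

```python
def numericalize(tokens, stoi, unk_index=0):
    """Hypothetical helper: map tokens to indices, using unk_index for unknown tokens."""
    return [stoi.get(tok, unk_index) for tok in tokens]


def align_vectors(itos, pretrained, dim):
    """Hypothetical helper: build one embedding row per vocabulary entry.

    Row i holds the pretrained vector for itos[i] if one exists,
    otherwise a zero vector of length `dim` (assumed default).
    """
    return [list(pretrained.get(tok, [0.0] * dim)) for tok in itos]


itos = ["<unk>", "<pad>", "the", "was", "movie"]
stoi = {tok: i for i, tok in enumerate(itos)}

# Hypothetical 3-d "pretrained" vectors standing in for glove.6B.100d
pretrained = {"the": [0.1, 0.2, 0.3], "movie": [0.5, -0.1, 0.0]}

print(numericalize(["the", "movie", "was", "boring"], stoi))  # [2, 4, 3, 0]
matrix = align_vectors(itos, pretrained, dim=3)
print(matrix[4])  # [0.5, -0.1, 0.0]
print(matrix[3])  # [0.0, 0.0, 0.0]
```

Because row `i` of the embedding matrix corresponds to `itos[i]`, the ids produced from `stoi` can be used directly to look up each token's vector.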