@Yuktha-Majella
Created August 7, 2021 16:03
Create dictionary from text file in Gensim
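from gensim.utils import simple_preprocess
from gensim import corpora

# Read the whole file; the with-block closes the handle automatically.
with open('sample_text.txt', encoding='utf-8') as text2:
    raw_text = text2.read()

# Treat each period-delimited chunk as one document and tokenize it.
# deacc=True strips accent marks from the tokens.
tokens2 = []
for line in raw_text.split('.'):
    tokens2.append(simple_preprocess(line, deacc=True))

# Map every unique token to an integer id.
g_dict2 = corpora.Dictionary(tokens2)
print("The dictionary has: " + str(len(g_dict2)) + " tokens\n")
print(g_dict2.token2id)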
NLP is a branch of data science that consists of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner. By utilizing NLP and its components, one can organize massive chunks of text data, perform numerous automated tasks, and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.
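
To make the token-to-id mapping printed by the snippet above less opaque, here is a rough standard-library sketch of the same idea: split text on periods, tokenize each chunk, and give every new token the next integer id. It is an approximation and not Gensim's implementation. The helper names `tokenize` and `build_token2id` are hypothetical, the length limits mirror the default `min_len=2` and `max_len=15` of `simple_preprocess`, and the ids it assigns may be ordered differently from those in `corpora.Dictionary`.

```python
import re
import unicodedata


def tokenize(sentence, min_len=2, max_len=15):
    """Hypothetical stand-in for simple_preprocess(sentence, deacc=True)."""
    # Decompose accented characters and drop the combining marks.
    stripped = "".join(
        ch for ch in unicodedata.normalize("NFD", sentence)
        if unicodedata.category(ch) != "Mn"
    )
    # Lowercase and keep runs of letters only.
    words = re.findall(r"[^\W\d_]+", stripped.lower())
    # Discard tokens that are too short or too long.
    return [w for w in words if min_len <= len(w) <= max_len]


def build_token2id(documents):
    """Assign ids in order of first appearance (an assumption of this sketch)."""
    token2id = {}
    for doc in documents:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id


text = "NLP is a branch of data science. NLP helps organize text data."
# Skip the empty chunk that follows the final period.
docs = [tokenize(chunk) for chunk in text.split(".") if chunk.strip()]
token2id = build_token2id(docs)
print("The dictionary has: " + str(len(token2id)) + " tokens")
print(token2id)
```

Repeated tokens such as "nlp" and "data" keep the id they received on first sight, which is why the dictionary is smaller than the total word count.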