@dipanjanS
Created March 13, 2018 17:53
from keras.preprocessing import text

# norm_bible is assumed to be the preprocessed corpus:
# a list of normalized text strings, one per document.
tokenizer = text.Tokenizer()
tokenizer.fit_on_texts(norm_bible)

# Build the word <-> integer id mappings from the fitted tokenizer
word2id = tokenizer.word_index
id2word = {v: k for k, v in word2id.items()}

# Reserve index 0 for padding, hence the +1
vocab_size = len(word2id) + 1
embed_size = 100

# Convert each document into a sequence of word ids
wids = [[word2id[w] for w in text.text_to_word_sequence(doc)] for doc in norm_bible]

print('Vocabulary Size:', vocab_size)
print('Vocabulary Sample:', list(word2id.items())[:10])
@faizaantrialx

norm_bible is not defined

@tararawat

norm_bible is not defined

@ganiboy

ganiboy commented May 30, 2019

Is norm_bible a list of strings?

@rineesh-iner

Is norm_bible a list of strings?

I think norm_bible is the corpus, read in as a list of strings, e.g.:

norm_bible = open('corona.txt', 'r').read().splitlines()
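
A bare open() handle would also iterate for fit_on_texts, but it gets exhausted before the second pass that builds wids, so materializing the lines into a list as above is safer. Since the norm prefix implies the text has already been normalized, a slightly fuller sketch under that assumption (the filename comes from the comment above and normalize_document is a hypothetical helper, not part of the original gist):

import re

# Hypothetical helper: lowercase, strip non-letters, collapse whitespace.
def normalize_document(doc):
    doc = doc.lower()
    doc = re.sub(r'[^a-z\s]', ' ', doc)
    return re.sub(r'\s+', ' ', doc).strip()

with open('corona.txt', 'r') as f:
    raw_docs = f.read().splitlines()

# Keep only non-empty normalized lines as the corpus
norm_bible = [normalize_document(d) for d in raw_docs if d.strip()]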
