@ashunigion
Created June 11, 2019 15:20
Tokenizing the reviews for sentiment analysis
# feel free to use this import
from collections import Counter

# `words` (a list of every word in the corpus) and `reviews_split` (a list of
# review strings) are expected to be defined before this snippet runs

## Build a dictionary that maps words to integers
# most_common() sorts by descending count, so frequent words get the smallest integers
word_counts = Counter(words).most_common()
vocab_to_int = {}
for i, (word, _count) in enumerate(word_counts, start=1):  # indices start at 1
    vocab_to_int[word] = i

## use the dict to tokenize each review in reviews_split
## store the tokenized reviews in reviews_ints
reviews_ints = []
for review in reviews_split:
    num_list = []
    for word in review.split():
        num_list.append(vocab_to_int[word])
    reviews_ints.append(num_list)
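
The snippet above depends on `words` and `reviews_split` from earlier code that is not part of this gist. As a minimal, self-contained sketch of the same idea, the example below substitutes a tiny made-up corpus (the three sample reviews are hypothetical, chosen only for illustration) and builds the same frequency-ranked mapping with a dict comprehension.

```python
from collections import Counter

# Hypothetical sample data standing in for the real review corpus
reviews_split = ["the movie was great", "the plot was bad", "great great movie"]
words = " ".join(reviews_split).split()

# Rank words by frequency: the most common word maps to 1, the next to 2, ...
counts = Counter(words)
vocab_to_int = {
    word: i for i, (word, _count) in enumerate(counts.most_common(), start=1)
}

# Replace every word in every review with its integer index
reviews_ints = [
    [vocab_to_int[word] for word in review.split()] for review in reviews_split
]

print(vocab_to_int)
print(reviews_ints)
```

Here "great" appears three times, more than any other word, so it maps to 1. A word that is missing from `vocab_to_int` would raise a `KeyError`, so this approach assumes the vocabulary was built from the same text that is being tokenized.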