This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Removing extra spaces | |
df['cleaned']=df['cleaned'].apply(lambda x: re.sub(' +',' ',x)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
for index,text in enumerate(df['cleaned'][35:40]): | |
print('Review %d:\n'%(index+1),text) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Importing spacy | |
import spacy | |
# Loading model | |
nlp = spacy.load('en_core_web_sm',disable=['parser', 'ner']) | |
# Lemmatization with stopwords removal | |
df['lemmatized']=df['cleaned'].apply(lambda x: ' '.join([token.lemma_ for token in list(nlp(x)) if (token.is_stop==False)])) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
df_grouped=df[['name','lemmatized']].groupby(by='name').agg(lambda x:' '.join(x)) | |
df_grouped.head() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Importing wordcloud for plotting word clouds and textwrap for wrapping longer text | |
from wordcloud import WordCloud | |
from textwrap import wrap | |
# Function for generating word clouds | |
def generate_wordcloud(data,title): | |
wc = WordCloud(width=400, height=330, max_words=150,colormap="Dark2").generate_from_frequencies(data) | |
plt.figure(figsize=(10,8)) | |
plt.imshow(wc, interpolation='bilinear') | |
plt.axis("off") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from textblob import TextBlob | |
df['polarity']=df['lemmatized'].apply(lambda x:TextBlob(x).sentiment.polarity) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
print("3 Random Reviews with Highest Polarity:") | |
for index,review in enumerate(df.iloc[df['polarity'].sort_values(ascending=False)[:3].index]['reviews.text']): | |
print('Review {}:\n'.format(index+1),review) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
print("3 Random Reviews with Lowest Polarity:") | |
for index,review in enumerate(df.iloc[df['polarity'].sort_values(ascending=True)[:3].index]['reviews.text']): | |
print('Review {}:\n'.format(index+1),review) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import textstat | |
df['dale_chall_score']=df['reviews.text'].apply(lambda x: textstat.dale_chall_readability_score(x)) | |
df['flesh_reading_ease']=df['reviews.text'].apply(lambda x: textstat.flesch_reading_ease(x)) | |
df['gunning_fog']=df['reviews.text'].apply(lambda x: textstat.gunning_fog(x)) | |
print('Dale Chall Score of upvoted reviews=>',df[df['reviews.numHelpful']>1]['dale_chall_score'].mean()) | |
print('Dale Chall Score of not upvoted reviews=>',df[df['reviews.numHelpful']<=1]['dale_chall_score'].mean()) | |
print('Flesch Reading Score of upvoted reviews=>',df[df['reviews.numHelpful']>1]['flesh_reading_ease'].mean()) | |
print('Flesch Reading Score of not upvoted reviews=>',df[df['reviews.numHelpful']<=1]['flesh_reading_ease'].mean()) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
df['text_standard']=df['reviews.text'].apply(lambda x: textstat.text_standard(x)) | |
print('Text Standard of upvoted reviews=>',df[df['reviews.numHelpful']>1]['text_standard'].mode()) | |
print('Text Standard of not upvoted reviews=>',df[df['reviews.numHelpful']<=1]['text_standard'].mode()) |