Nithilaa Umasankar Nithilaa

  • Coimbatore
@Nithilaa
Nithilaa / lower_case.py
Created June 10, 2021 03:31
Convert the text to lowercase
# .str.lower() is the vectorized equivalent of map(lambda x: x.lower())
%time df2['Review_Processed'] = df2['Review_Processed'].str.lower()
Nithilaa / remove_punc_unicode.py
Last active June 10, 2021 03:32
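Remove punctuation and non-ASCII characters
import re
# Replace each run of non-ASCII characters with a single space
%time df2['Review_Processed'] = df2['Review_Processed'].map(lambda x: re.sub(r'[^\x00-\x7F]+', ' ', x))
# Delete everything that is not a word character or whitespace (underscores count as word characters and are kept)
%time df2['Review_Processed'] = df2['Review_Processed'].map(lambda x: re.sub(r'[^\w\s]', '', x))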
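To see what these two substitutions do to a single string, here is a minimal sketch outside pandas. The helper name `strip_non_ascii_and_punct` and the sample sentence are made up for illustration; only the two regular expressions come from the gist.

```python
import re

def strip_non_ascii_and_punct(text: str) -> str:
    """Apply the gist's two regex steps to one string (hypothetical helper)."""
    # Step 1: replace each run of non-ASCII characters with a single space.
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    # Step 2: delete anything that is not a word character or whitespace.
    return re.sub(r'[^\w\s]', '', text)

# Made-up sample containing accents, punctuation, digits, and an underscore.
sample = "Café was great!!! 10/10, would_visit again…"
cleaned = strip_non_ascii_and_punct(sample)
print(cleaned.split())
```

Two side effects are worth noticing: punctuation is deleted rather than replaced, so `10/10` collapses to `1010`, and the underscore in `would_visit` survives because `\w` matches it.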
Nithilaa / remove_stopwords.py
Created June 10, 2021 03:33
Remove stopwords
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
# Needs the NLTK 'stopwords' corpus (nltk.download('stopwords')); a set makes each membership test O(1)
stop_words = set(stopwords.words('english'))
%time df2['Review_Processed'] = df2['Review_Processed'].map(lambda x: ' '.join([w for w in x.split() if w not in stop_words]))
Nithilaa / lemmatize.py
Created June 10, 2021 03:34
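Lemmatization
# Needs the NLTK 'wordnet' corpus (nltk.download('wordnet')); lemmatize() treats each word as a noun by default
lemmer = WordNetLemmatizer()
%time df2['Review_Processed'] = df2['Review_Processed'].map(lambda x: ' '.join([lemmer.lemmatize(w) for w in x.split() if w not in stop_words]))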
Nithilaa / countvec_import.py
Created June 10, 2021 03:37
Import CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
Nithilaa / countvec_obj.py
Created June 10, 2021 03:38
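Initialize the CountVectorizer object
# no_features must be defined beforehand (its value is not shown in these gists); ngram_range must be a tuple
tf_vectorizer = CountVectorizer(min_df=.015, max_df=.8, max_features=no_features, ngram_range=(1, 3))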
Nithilaa / fit_transform.py
Last active July 26, 2021 07:38
Fit the vectorizer and transform the reviews
%time features = tf_vectorizer.fit_transform(df['user_review'])
Nithilaa / return_matrix.py
Last active July 26, 2021 07:40
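Return a DataFrame of counts: each row is a review, each column is a feature name from the vectorizer (a single word or an n-gram of up to three words), and each cell is the number of times that feature occurs in the review.
import pandas as pd
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (available since 1.0)
features_df = pd.DataFrame(features.toarray(), columns=tf_vectorizer.get_feature_names_out())
# concat aligns on the index, so df needs the default 0..n-1 index to line up with features_df
df = pd.concat([features_df, df['user_suggestion']], axis=1)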
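The shape of that count table can be illustrated with the standard library alone. This is only a sketch of the idea, not CountVectorizer itself: it counts single words split on whitespace and ignores `min_df`, `max_df`, `max_features`, and n-grams. The two documents are made up for illustration.

```python
from collections import Counter

# Two made-up documents standing in for the review column.
docs = ["good game good fun", "bad game"]

# One Counter per document: word -> number of occurrences.
counts = [Counter(doc.split()) for doc in docs]

# Column names: the sorted vocabulary across all documents.
vocab = sorted(set().union(*counts))

# One row per document, one column per vocabulary word.
matrix = [[c[word] for word in vocab] for c in counts]

print(vocab)
for row in matrix:
    print(row)
```

Each row here corresponds to a row of `features_df`, and `vocab` plays the role of the feature names used as column labels.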
Nithilaa / drop_num_cols.py
Created June 10, 2021 03:44
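Drop numeric columns
df_tf_m_columns = df_tf_m.columns
df_tf_m_columns
# Keep purely alphabetic column names; str.isalpha() also rejects names containing digits, spaces, or underscores,
# so multi-word n-gram columns are dropped along with the numeric ones
res = [sub for sub in df_tf_m_columns if sub.isalpha()]
# 'Flag_1' fails isalpha(), so add it back explicitly
res.append('Flag_1')
df_tf_m = df_tf_m.drop(columns=[col for col in df_tf_m.columns if col not in res])
df_tf_m.head()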
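The effect of the `isalpha()` filter is easiest to check on a plain list of names. The column names below are hypothetical; only `Flag_1` and the filtering logic come from the gist.

```python
# Hypothetical column names: single words, a number, an n-gram, a mixed token, and the label column.
columns = ['good', 'game', '10', 'good game', 'level2', 'Flag_1']

# Same filter as the gist: keep names made up only of letters.
keep = [c for c in columns if c.isalpha()]

# 'Flag_1' contains a digit and an underscore, so it has to be added back by hand.
keep.append('Flag_1')

# These are the columns the drop() call would remove.
dropped = [c for c in columns if c not in keep]

print(keep)
print(dropped)
```

In this sketch the two-word name `'good game'` is dropped along with `'10'` and `'level2'`, because a space is not an alphabetic character. If n-gram columns should be kept, a different test would be needed.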