@srang992
Created April 16, 2022 09:10
function for cleaning data
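import re

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize


def clean_desc(s):
    """Lowercase the text and replace every non-letter character with a space."""
    s = str(s).lower()
    return re.sub(r'[^a-zA-Z]', ' ', s)


# netflix_data_copy is assumed to be a copy of the main data; all preprocessing is done on it
netflix_data_copy['clean_desc'] = netflix_data_copy['description'].apply(clean_desc)

# tokenize the cleaned text so stopwords can be removed word by word
netflix_data_copy['clean_desc'] = netflix_data_copy['clean_desc'].apply(word_tokenize)

# build the stopword set once instead of once per word
# (requires the NLTK 'stopwords' corpus and the tokenizer data to be downloaded)
stop_words = set(stopwords.words('english'))
netflix_data_copy['clean_desc'] = netflix_data_copy['clean_desc'].apply(
    lambda x: [word for word in x if word not in stop_words]
)

# join the remaining words back into a single string (no lemmatization is applied in this snippet)
netflix_data_copy['clean_desc'] = netflix_data_copy['clean_desc'].apply(' '.join)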
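
To see what the steps above do to a single description, here is a minimal standard-library sketch. It is not the gist's pipeline. `SAMPLE_STOPWORDS` is a small hypothetical stand-in for NLTK's English stopword list, `str.split` stands in for `word_tokenize`, and `preprocess` and the sample sentence are made up for illustration.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list (assumption, not the real list).
SAMPLE_STOPWORDS = {"a", "the", "in", "of", "and", "is", "to"}


def clean_desc(s):
    """Lowercase the text and replace every non-letter character with a space."""
    s = str(s).lower()
    return re.sub(r'[^a-zA-Z]', ' ', s)


def preprocess(description, stop_words=SAMPLE_STOPWORDS):
    """Clean, split on whitespace, drop stopwords, and rejoin into one string."""
    tokens = clean_desc(description).split()  # str.split stands in for word_tokenize
    kept = [word for word in tokens if word not in stop_words]
    return ' '.join(kept)


# Made-up description, used only to illustrate the steps.
sample = "In 2021, a detective returns to the city of Mumbai!"
print(preprocess(sample))  # prints "detective returns city mumbai"
```

One thing to watch for: because `clean_desc` calls `str(s)` first, a missing value such as `float('nan')` becomes the literal token `nan` and survives the stopword filter. Filling or dropping missing descriptions before applying the function avoids that.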