@aravindpai
Last active November 11, 2020 07:05
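import re

# contraction_mapping (a dict such as {"don't": "do not"}) is assumed to be
# defined earlier; this gist does not include it.

def summary_cleaner(text):
    newString = re.sub('"', '', text)                  # remove double quotes
    # expand contractions token by token (exact, case-sensitive lookup)
    newString = ' '.join([contraction_mapping.get(t, t) for t in newString.split(" ")])
    newString = re.sub(r"'s\b", "", newString)         # drop possessive 's
    newString = re.sub("[^a-zA-Z]", " ", newString)    # keep letters only
    newString = newString.lower()
    tokens = newString.split()
    # keep only tokens longer than one character, separated by single spaces
    return ' '.join(t for t in tokens if len(t) > 1)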
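import numpy as np

# Call the above function on every summary.
# `data` (a pandas DataFrame with a 'Summary' column) and `cleaned_text`
# (the cleaned review texts) are assumed to be created earlier; neither is defined in this gist.
cleaned_summary = []
for t in data['Summary']:
    cleaned_summary.append(summary_cleaner(t))
data['cleaned_text'] = cleaned_text
data['cleaned_summary'] = cleaned_summary
# summaries left with no tokens become '': mark them NaN, then drop those rows
# (dropna also drops rows that have NaN in any other column)
data['cleaned_summary'] = data['cleaned_summary'].replace('', np.nan)
data.dropna(axis=0, inplace=True)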
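The gist relies on names it never defines (`data`, `contraction_mapping`, `cleaned_text`), so it cannot run on its own. Below is a minimal self-contained sketch under stated assumptions: a hypothetical three-entry `contraction_mapping` and a hypothetical four-row DataFrame standing in for `data`. It mirrors the gist's cleaning steps, then runs the column-level pipeline using pandas' `Series.apply`, `Series.replace` and `DataFrame.dropna`, with `apply` swapped in for the explicit loop.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical three-entry stand-in for contraction_mapping, which the gist
# uses but does not define.
contraction_mapping = {"don't": "do not", "isn't": "is not", "can't": "cannot"}


def summary_cleaner(text: str) -> str:
    """Apply the gist's cleaning steps in order; return space-joined tokens."""
    text = text.replace('"', "")  # remove double quotes
    # Expand contractions. Tokens come from split(" "), so the lookup is exact
    # and case-sensitive: "Don't" or "don't!" will not match.
    text = " ".join(contraction_mapping.get(t, t) for t in text.split(" "))
    text = re.sub(r"'s\b", "", text)  # drop possessive 's
    text = re.sub("[^a-zA-Z]", " ", text)  # replace non-letters with spaces
    tokens = text.lower().split()  # lower-case, split on whitespace
    return " ".join(t for t in tokens if len(t) > 1)  # drop 1-letter tokens


# Hypothetical toy stand-in for `data`, which the gist never loads; it only
# needs the 'Summary' column (plus the cleaned review text from earlier).
data = pd.DataFrame({
    "Summary": ["Great taste, don't miss it!", 'The dog\'s "best" treat',
                "Don't buy", "5 / 5"],
    "cleaned_text": ["great taste", "dog loved it", "stale", "five stars"],
})

# Series.apply does the same work as the explicit loop. Reassigning the
# column, rather than calling replace(..., inplace=True) on data[...], keeps
# the change on `data` even under pandas copy-on-write.
data["cleaned_summary"] = data["Summary"].apply(summary_cleaner)
data["cleaned_summary"] = data["cleaned_summary"].replace("", np.nan)
data = data.dropna(axis=0).reset_index(drop=True)

print(data[["Summary", "cleaned_summary"]])
```

The "5 / 5" row loses every token and is dropped. The "Don't buy" row shows a caveat of the step order: the contraction lookup runs before lower-casing, so a capitalized "Don't" is not expanded and ends up as "don" once the lone "t" is discarded. Lower-casing before the lookup would avoid this, assuming the mapping's keys are lower-case.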

saosophea9988 commented Nov 11, 2020

Where does data['Summary'] come from? It is used in:
for t in data['Summary']:
    cleaned_summary.append(summary_cleaner(t))
