@huangziwei
Created March 9, 2016 05:47
clean_text
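import re


def clean_txt(raw):
    """Strip English letters, numbers, and punctuation from a string."""
    raw = re.sub(r'[A-Za-z]+', '', raw)    # remove English letters
    raw = re.sub(r'\d+(\.)?\d?', '', raw)  # remove numbers, including simple decimals
    raw = re.sub(r'\W+', '', raw)          # remove punctuation (\W also strips whitespace; underscores remain)
    return raw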
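
A minimal usage sketch, assuming Python 3, where `str` patterns are Unicode-aware and Chinese characters count as word characters. The function is repeated so the example runs on its own; the sample strings are hypothetical and not part of the original gist.

```python
import re


def clean_txt(raw):
    # Copy of the gist's function so this sketch is self-contained.
    raw = re.sub(r'[A-Za-z]+', '', raw)    # remove English letters
    raw = re.sub(r'\d+(\.)?\d?', '', raw)  # remove numbers
    raw = re.sub(r'\W+', '', raw)          # remove punctuation and whitespace
    return raw


sample = "Python 3.5 发布了,很好用!"  # hypothetical mixed-language input
print(clean_txt(sample))        # 发布了很好用
print(clean_txt("snake_case"))  # _  (the underscore is a word character, so \W keeps it)
```

Because `\W+` also matches whitespace, the cleaned text comes back as one unbroken run of characters. If word or sentence boundaries matter downstream, split the text before calling `clean_txt`.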