@cxtadment
Created April 18, 2016 00:08
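import re

def seg_filter(self, word, tagging):
    # Return True if the word should be kept, False if it is filtered out.
    # Note: tagging is accepted but not used by this filter.
    # Filter stop words, including punctuation.
    if word in self.stopwords:
        return False
    # Filter any element containing a digit.
    if re.search(r'\d', word):
        return False
    # Filter words that are already in the topics.
    if word in self.topics:
        return False
    return True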
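
The method above relies on `self.stopwords` and `self.topics`, which the gist does not show. The following is a minimal usage sketch, assuming both are plain collections of strings. The class name `SegFilter`, the helper `filter_tokens`, and the sample tokens and tags are all hypothetical and exist only for illustration.

```python
import re


class SegFilter:
    """Hypothetical host class; the gist shows only the method, not its class."""

    def __init__(self, stopwords, topics):
        # Assumption: both are collections of strings. Sets make lookups cheap.
        self.stopwords = set(stopwords)
        self.topics = set(topics)

    def seg_filter(self, word, tagging):
        # tagging is accepted but unused, matching the original method.
        if word in self.stopwords:
            return False
        if re.search(r"\d", word):
            return False
        if word in self.topics:
            return False
        return True


def filter_tokens(flt, tagged_tokens):
    """Hypothetical helper: keep the (word, tag) pairs that pass seg_filter."""
    return [(w, t) for w, t in tagged_tokens if flt.seg_filter(w, t)]


flt = SegFilter(stopwords={"the", ","}, topics={"python"})
tokens = [("the", "DT"), ("python", "NN"), ("parser", "NN"),
          ("v2", "NN"), (",", "PUNCT")]
kept = filter_tokens(flt, tokens)
print(kept)
```

Storing the stop words and topics as sets is a design choice in this sketch, not something the original confirms; it keeps each membership check constant-time when many tokens are filtered.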