@jpowell
Created February 9, 2012 16:58
Simple tokenizers for Ruby
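# Depends on StringTokenizer, which must be loaded before this class is defined.
class EmailTokenizer < StringTokenizer
  # Split on runs of whitespace, commas, or semicolons. A character class is
  # used instead of a capture group, because String#split adds captured
  # delimiter text to its result.
  def initialize(text)
    super(text, /[\s,;]+/)
  end

  # Keep only the items that look like an email address, downcased.
  def self.filter(array = [])
    result = []
    array.each do |item|
      result << item.downcase if item =~ /.+@.+/
    end
    result
  end

  protected

  # Split with the parent implementation, then drop non-email tokens.
  def tokenize
    self.class.filter(super)
  end
end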
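class StringTokenizer
  def initialize(text = '', delim = /\s+/)
    # Reject nil and any other non-string input, matching the error message.
    raise ArgumentError, 'Text must be a string' unless text.is_a?(String)
    @text, @delimiter = text, delim
  end

  # Tokens are computed on first access and memoized.
  def tokens
    @tokens ||= tokenize
  end

  protected

  # Subclasses can override this to post-process the split result.
  def tokenize
    @text.split(@delimiter)
  end
end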
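
A minimal usage sketch for the two tokenizers, in case it helps to see them run. The class bodies are condensed restatements so the snippet works standalone. They assume a non-capturing delimiter and a String type check, which may differ slightly from the classes as written above. The sample addresses and input strings are made up for illustration.

```ruby
# Condensed restatements of the two classes (assumed shape, see note above).
class StringTokenizer
  def initialize(text = '', delim = /\s+/)
    raise ArgumentError, 'Text must be a string' unless text.is_a?(String)
    @text, @delimiter = text, delim
  end

  # Memoized list of tokens.
  def tokens
    @tokens ||= tokenize
  end

  protected

  def tokenize
    @text.split(@delimiter)
  end
end

class EmailTokenizer < StringTokenizer
  def initialize(text)
    super(text, /[\s,;]+/)
  end

  # Keep items containing an "@" with text on both sides, downcased.
  def self.filter(array = [])
    array.select { |item| item =~ /.+@.+/ }.map(&:downcase)
  end

  protected

  def tokenize
    self.class.filter(super)
  end
end

# Hypothetical inputs, chosen only to exercise each delimiter.
words  = StringTokenizer.new('simple tokenizers for ruby').tokens
emails = EmailTokenizer.new('Alice@Example.com, bob@example.org; not-an-email').tokens

p words   # ["simple", "tokenizers", "for", "ruby"]
p emails  # ["alice@example.com", "bob@example.org"]
```

Because `tokenize` is protected, callers go through `tokens`. `EmailTokenizer.filter` is a class method, so it can also be applied to an array that was split elsewhere.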