Created December 21, 2011 23:35
Evented Sentence Parser
$ ruby sentence_parsing.rb
Found line with 264 characters
Line = Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.
Found word = Lorem
Found word = ipsum
Found word = dolor
Found word = sit
Found word = amet,
Found word = consectetur
Found word = adipiscing
Found word = elit.
Found word = Phasellus
Found word = dapibus
Found word = elit
Found word = et
Found word = ligula
Found word = vestibulum
Found word = porttitor.
Found word = Vestibulum
Found word = tristique
Found word = suscipit
Found word = sem
Found word = eu
Found word = cursus.
Found word = Aenean
Found word = sit
Found word = amet
Found word = ligula
Found word = elit.
Found word = Morbi
Found word = venenatis
Found word = scelerisque
Found word = viverra.
Found word = Cras
Found word = at
Found word = nisl
Found word = quis
Found word = libero
Found word = rutrum
Found word = accumsan.
Found line with 0 characters
Line =
Found line with 293 characters
Line = Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.
Found word = Aenean
Found word = et
Found word = nisl
Found word = felis,
Found word = nec
Found word = convallis
Found word = erat.
Found word = Cum
Found word = sociis
Found word = natoque
Found word = penatibus
Found word = et
Found word = magnis
Found word = dis
Found word = parturient
Found word = montes,
Found word = nascetur
Found word = ridiculus
Found word = mus.
Found word = Vivamus
Found word = nec
Found word = purus
Found word = nunc,
Found word = sit
Found word = amet
Found word = ornare
Found word = purus.
Found word = Vestibulum
Found word = laoreet
Found word = mattis
Found word = sem
Found word = non
Found word = malesuada.
Found word = Nunc
Found word = vitae
Found word = lectus
Found word = neque.
Found word = Duis
Found word = sit
Found word = amet
Found word = velit
Found word = non
Found word = nulla
Found word = facilisis
Found word = sodales.
Found line with 0 characters
Line =
Found line with 330 characters
Line = Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.
Found word = Aenean
Found word = ultrices
Found word = sapien
Found word = ac
Found word = enim
Found word = lacinia
Found word = euismod
Found word = eleifend
Found word = pulvinar
Found word = urna.
Found word = Nulla
Found word = leo
Found word = metus,
Found word = viverra
Found word = non
Found word = lacinia
Found word = at,
Found word = posuere
Found word = at
Found word = leo.
Found word = Nullam
Found word = dictum
Found word = venenatis
Found word = tristique.
Found word = Fusce
Found word = pellentesque
Found word = felis
Found word = vitae
Found word = libero
Found word = gravida
Found word = at
Found word = interdum
Found word = est
Found word = lacinia.
Found word = Nam
Found word = rhoncus,
Found word = diam
Found word = at
Found word = gravida
Found word = dictum,
Found word = odio
Found word = velit
Found word = rutrum
Found word = erat,
Found word = vitae
Found word = laoreet
Found word = nisl
Found word = tortor
Found word = at
Found word = magna.
# sentence_parsing.rb
require 'eventually'

# Splits a document into lines and words, emitting an event for each one.
class SentenceParser
  include Eventually

  def initialize(document)
    @document = document
  end

  # Emits :line for every line, then :word for each whitespace-separated word in it.
  def parse!
    lines = @document.split(/\r?\n/)
    lines.each do |line|
      emit(:line, line)
      words = line.split(/\s+/)
      words.each do |word|
        emit(:word, word)
      end
    end
    self
  end
end

# Paragraphs are separated by blank lines, which show up as 0-character lines in the output.
document = %Q{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.

Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.

Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.}

parser = SentenceParser.new(document)

# Register listeners before parsing so no event is missed.
parser.on(:line) do |line|
  puts 'Found line with %d characters' % line.length
  puts 'Line = %s' % line
end
parser.on(:word) do |word|
  puts 'Found word = %s' % word
end
parser.parse!
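
For readers who want to try the pattern without installing the gem, the sketch below swaps in a small hand-written emitter. `TinyEmitter` is a hypothetical stand-in written for this example, not the Eventually gem's actual implementation; it only mimics the `on` and `emit` calls used in the parser, and the short sample document is made up for illustration.

```ruby
# Hypothetical stand-in for the on/emit behaviour the parser relies on.
# This is NOT the Eventually gem, just enough to run the example.
module TinyEmitter
  # Register a block to run whenever +event+ is emitted.
  def on(event, &block)
    listeners[event] << block
    self
  end

  # Call every block registered for +event+, in registration order.
  def emit(event, *payload)
    listeners[event].each { |callback| callback.call(*payload) }
    self
  end

  private

  # Lazily built map of event name => array of registered blocks.
  def listeners
    @listeners ||= Hash.new { |hash, key| hash[key] = [] }
  end
end

class SentenceParser
  include TinyEmitter

  def initialize(document)
    @document = document
  end

  # Same flow as the parser above: one :line event per line,
  # then one :word event per whitespace-separated word.
  def parse!
    @document.split(/\r?\n/).each do |line|
      emit(:line, line)
      line.split(/\s+/).each { |word| emit(:word, word) }
    end
    self
  end
end

# Collect events into arrays instead of printing them directly.
lines = []
words = []
parser = SentenceParser.new("Lorem ipsum\n\ndolor sit amet,")
parser.on(:line) { |line| lines << line }
parser.on(:word) { |word| words << word }
parser.parse!

puts "lines: #{lines.inspect}"
puts "words: #{words.inspect}"
```

Collecting into arrays rather than calling `puts` inside the listeners makes the blank line visible as an empty string in `lines`, while contributing no entries to `words`, which matches the "Found line with 0 characters" entries in the output above.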