@localshred
Created December 21, 2011 23:35
Evented Sentence Parser
$ ruby sentence_parsing.rb
Found line with 264 characters
Line = Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.
Found word = Lorem
Found word = ipsum
Found word = dolor
Found word = sit
Found word = amet,
Found word = consectetur
Found word = adipiscing
Found word = elit.
Found word = Phasellus
Found word = dapibus
Found word = elit
Found word = et
Found word = ligula
Found word = vestibulum
Found word = porttitor.
Found word = Vestibulum
Found word = tristique
Found word = suscipit
Found word = sem
Found word = eu
Found word = cursus.
Found word = Aenean
Found word = sit
Found word = amet
Found word = ligula
Found word = elit.
Found word = Morbi
Found word = venenatis
Found word = scelerisque
Found word = viverra.
Found word = Cras
Found word = at
Found word = nisl
Found word = quis
Found word = libero
Found word = rutrum
Found word = accumsan.
Found line with 0 characters
Line =
Found line with 293 characters
Line = Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.
Found word = Aenean
Found word = et
Found word = nisl
Found word = felis,
Found word = nec
Found word = convallis
Found word = erat.
Found word = Cum
Found word = sociis
Found word = natoque
Found word = penatibus
Found word = et
Found word = magnis
Found word = dis
Found word = parturient
Found word = montes,
Found word = nascetur
Found word = ridiculus
Found word = mus.
Found word = Vivamus
Found word = nec
Found word = purus
Found word = nunc,
Found word = sit
Found word = amet
Found word = ornare
Found word = purus.
Found word = Vestibulum
Found word = laoreet
Found word = mattis
Found word = sem
Found word = non
Found word = malesuada.
Found word = Nunc
Found word = vitae
Found word = lectus
Found word = neque.
Found word = Duis
Found word = sit
Found word = amet
Found word = velit
Found word = non
Found word = nulla
Found word = facilisis
Found word = sodales.
Found line with 0 characters
Line =
Found line with 330 characters
Line = Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.
Found word = Aenean
Found word = ultrices
Found word = sapien
Found word = ac
Found word = enim
Found word = lacinia
Found word = euismod
Found word = eleifend
Found word = pulvinar
Found word = urna.
Found word = Nulla
Found word = leo
Found word = metus,
Found word = viverra
Found word = non
Found word = lacinia
Found word = at,
Found word = posuere
Found word = at
Found word = leo.
Found word = Nullam
Found word = dictum
Found word = venenatis
Found word = tristique.
Found word = Fusce
Found word = pellentesque
Found word = felis
Found word = vitae
Found word = libero
Found word = gravida
Found word = at
Found word = interdum
Found word = est
Found word = lacinia.
Found word = Nam
Found word = rhoncus,
Found word = diam
Found word = at
Found word = gravida
Found word = dictum,
Found word = odio
Found word = velit
Found word = rutrum
Found word = erat,
Found word = vitae
Found word = laoreet
Found word = nisl
Found word = tortor
Found word = at
Found word = magna.
require 'eventually'

class SentenceParser
  include Eventually

  def initialize(document)
    @document = document
  end

  # Emits :line for every line of the document, then :word for every
  # whitespace-separated token on that line. Returns self.
  def parse!
    lines = @document.split(/\r?\n/)
    lines.each do |line|
      emit(:line, line)
      words = line.split(/\s+/)
      words.each do |word|
        emit(:word, word)
      end
    end
    self
  end
end

# Paragraphs are separated by blank lines, which is why the output
# above reports lines with 0 characters between them.
document = %Q{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.

Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.

Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.}

parser = SentenceParser.new(document)

# Listeners are registered before parse! runs so that no event is missed.
parser.on(:line) do |line|
  puts 'Found line with %d characters' % line.length
  puts 'Line = %s' % line
end

parser.on(:word) do |word|
  puts 'Found word = %s' % word
end

parser.parse!
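
The script above relies on the `on` and `emit` methods mixed in by the eventually gem. To try the same pattern without installing the gem, here is a minimal sketch of a stand-in emitter. `TinyEmitter` and `TinySentenceParser` are hypothetical names invented for this illustration. This is not the gem's implementation, only an assumption about the simplest mechanism that would give the same observable behavior: callbacks stored per event name and called in registration order.

```ruby
# Hypothetical stand-in for an event-emitter mixin.
# NOT the eventually gem's code; names and internals are assumptions.
module TinyEmitter
  # Register a block to run whenever +event+ is emitted.
  def on(event, &block)
    listeners[event] << block
    self
  end

  # Call every block registered for +event+, in registration order.
  def emit(event, *payload)
    listeners[event].each { |callback| callback.call(*payload) }
    self
  end

  private

  # Lazily built map of event name => array of callbacks.
  def listeners
    @listeners ||= Hash.new { |hash, key| hash[key] = [] }
  end
end

# Same parsing logic as SentenceParser, built on the stand-in mixin.
class TinySentenceParser
  include TinyEmitter

  def initialize(document)
    @document = document
  end

  def parse!
    @document.split(/\r?\n/).each do |line|
      emit(:line, line)
      line.split(/\s+/).each { |word| emit(:word, word) }
    end
    self
  end
end

# Collect events in an array instead of printing them.
events = []
tiny = TinySentenceParser.new("Lorem ipsum\n\ndolor sit amet,")
tiny.on(:line) { |line| events << [:line, line] }
tiny.on(:word) { |word| events << [:word, word] }
tiny.parse!
```

Because the blank line in the middle of the input splits into no words, it produces a `:line` event with an empty string and no `:word` events, which matches the "Found line with 0 characters" entries in the output above.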