Evented Sentence Parser

output.txt
$ ruby sentence_parsing.rb
Found line with 264 characters
Line = Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.
Found word = Lorem
Found word = ipsum
Found word = dolor
Found word = sit
Found word = amet,
Found word = consectetur
Found word = adipiscing
Found word = elit.
Found word = Phasellus
Found word = dapibus
Found word = elit
Found word = et
Found word = ligula
Found word = vestibulum
Found word = porttitor.
Found word = Vestibulum
Found word = tristique
Found word = suscipit
Found word = sem
Found word = eu
Found word = cursus.
Found word = Aenean
Found word = sit
Found word = amet
Found word = ligula
Found word = elit.
Found word = Morbi
Found word = venenatis
Found word = scelerisque
Found word = viverra.
Found word = Cras
Found word = at
Found word = nisl
Found word = quis
Found word = libero
Found word = rutrum
Found word = accumsan.
Found line with 0 characters
Line =
Found line with 293 characters
Line = Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.
Found word = Aenean
Found word = et
Found word = nisl
Found word = felis,
Found word = nec
Found word = convallis
Found word = erat.
Found word = Cum
Found word = sociis
Found word = natoque
Found word = penatibus
Found word = et
Found word = magnis
Found word = dis
Found word = parturient
Found word = montes,
Found word = nascetur
Found word = ridiculus
Found word = mus.
Found word = Vivamus
Found word = nec
Found word = purus
Found word = nunc,
Found word = sit
Found word = amet
Found word = ornare
Found word = purus.
Found word = Vestibulum
Found word = laoreet
Found word = mattis
Found word = sem
Found word = non
Found word = malesuada.
Found word = Nunc
Found word = vitae
Found word = lectus
Found word = neque.
Found word = Duis
Found word = sit
Found word = amet
Found word = velit
Found word = non
Found word = nulla
Found word = facilisis
Found word = sodales.
Found line with 0 characters
Line =
Found line with 330 characters
Line = Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.
Found word = Aenean
Found word = ultrices
Found word = sapien
Found word = ac
Found word = enim
Found word = lacinia
Found word = euismod
Found word = eleifend
Found word = pulvinar
Found word = urna.
Found word = Nulla
Found word = leo
Found word = metus,
Found word = viverra
Found word = non
Found word = lacinia
Found word = at,
Found word = posuere
Found word = at
Found word = leo.
Found word = Nullam
Found word = dictum
Found word = venenatis
Found word = tristique.
Found word = Fusce
Found word = pellentesque
Found word = felis
Found word = vitae
Found word = libero
Found word = gravida
Found word = at
Found word = interdum
Found word = est
Found word = lacinia.
Found word = Nam
Found word = rhoncus,
Found word = diam
Found word = at
Found word = gravida
Found word = dictum,
Found word = odio
Found word = velit
Found word = rutrum
Found word = erat,
Found word = vitae
Found word = laoreet
Found word = nisl
Found word = tortor
Found word = at
Found word = magna.
sentence_parsing.rb
require 'eventually'
 
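# Splits a document into lines and words, emitting an event for each one.
class SentenceParser
  include Eventually # supplies the #on and #emit methods used below

  def initialize(document)
    @document = document
  end

  # Emits :line for every line (blank ones included) and :word for every
  # whitespace-separated token, then returns self so calls can be chained.
  def parse!
    lines = @document.split(/\r?\n/)
    lines.each do |line|
      emit(:line, line)
      words = line.split(/\s+/)
      words.each do |word|
        emit(:word, word)
      end
    end
    self
  end
end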
 
document = %Q{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus dapibus elit et ligula vestibulum porttitor. Vestibulum tristique suscipit sem eu cursus. Aenean sit amet ligula elit. Morbi venenatis scelerisque viverra. Cras at nisl quis libero rutrum accumsan.
 
Aenean et nisl felis, nec convallis erat. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Vivamus nec purus nunc, sit amet ornare purus. Vestibulum laoreet mattis sem non malesuada. Nunc vitae lectus neque. Duis sit amet velit non nulla facilisis sodales.
 
Aenean ultrices sapien ac enim lacinia euismod eleifend pulvinar urna. Nulla leo metus, viverra non lacinia at, posuere at leo. Nullam dictum venenatis tristique. Fusce pellentesque felis vitae libero gravida at interdum est lacinia. Nam rhoncus, diam at gravida dictum, odio velit rutrum erat, vitae laoreet nisl tortor at magna.}
 
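parser = SentenceParser.new(document)

# Report each line's length and content as it is found.
parser.on(:line) do |line|
  puts 'Found line with %d characters' % line.length
  puts 'Line = %s' % line
end

# Report each word as it is found.
parser.on(:word) do |word|
  puts 'Found word = %s' % word
end

# Listeners must be registered before parsing starts.
parser.parse!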
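
The script above relies on the `eventually` gem for `on` and `emit`. For readers without the gem installed, the following is a minimal sketch of how such a mixin could work. `MiniEmitter` and `WordSplitter` are hypothetical names invented for this illustration; this is not the gem's actual implementation, only an approximation of the two calls the script uses.

```ruby
# Hypothetical stand-in for an event-emitter mixin (an assumption made for
# illustration; it does not reproduce the eventually gem's code).
module MiniEmitter
  # Register a block to run whenever `event` is emitted.
  def on(event, &block)
    listeners[event] << block
    self
  end

  # Call every block registered for `event`, in registration order.
  def emit(event, *payload)
    listeners[event].each { |listener| listener.call(*payload) }
    self
  end

  private

  # Lazily built map of event name => array of listener blocks.
  def listeners
    @listeners ||= Hash.new { |hash, key| hash[key] = [] }
  end
end

# Hypothetical, cut-down parser that only emits :word events.
class WordSplitter
  include MiniEmitter

  def initialize(text)
    @text = text
  end

  def parse!
    @text.split(/\s+/).each { |word| emit(:word, word) }
    self
  end
end

found = []
splitter = WordSplitter.new('Lorem ipsum dolor')
splitter.on(:word) { |word| found << word }
splitter.parse!
puts found.inspect
```

Because listeners are kept per instance, two parsers never share handlers, and emitting an event nobody listens for simply does nothing.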
