

@veer66
Created July 23, 2014 12:38
Partial parser for Apertium's stream format (for parsing biltrans output)
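In Apertium's stream format a lexical unit is written between `^` and `$`, and in biltrans output its analyses are separated by `/`, for example `^house<n><sg>/casa<n><f><sg>$` (an illustrative unit). A backslash escapes the character after it, so a separator only counts when it is not escaped. The sketch below shows that rule in isolation. `split_unescaped` is a hypothetical helper, not part of the gist or of Apertium, and it assumes backslash is the only escape mechanism.

```ruby
# Hypothetical helper (not part of the gist or of Apertium): split +text+ on
# every +sep+ character that is not escaped by a preceding backslash.
# Assumes a backslash escapes exactly one following character.
def split_unescaped(text, sep)
  parts = []
  current = +""
  escaped = false
  text.each_char do |ch|
    if escaped
      # Previous character was an escaping backslash: keep this one literally.
      current << ch
      escaped = false
    elsif ch == "\\"
      # Keep the backslash so the escape survives in the returned piece.
      current << ch
      escaped = true
    elsif ch == sep
      # Unescaped separator: close the current piece.
      parts << current
      current = +""
    else
      current << ch
    end
  end
  parts << current
end

p split_unescaped("house<n><sg>/casa<n><f><sg>", "/")
```

Backslashes are kept in the output so each piece can be passed on unchanged: `split_unescaped("a\\/b/c", "/")` returns `["a\\/b", "c"]`, while an escaped backslash (`"a\\\\/b"`) does not protect the `/` after it.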
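module Apertium
  # States of the character-level scanner in StreamParser#parse0.
  OUTSIDE_WORD = 0
  INSIDE_WORD = 1

  # A blank: any text between lexical units (spaces, punctuation, etc.).
  class B
    attr_reader :text

    def initialize(text)
      @text = text
    end

    # Blanks need no further parsing.
    def parse
      self
    end
  end

  # One analysis of a lexical unit: a lemma plus its tags,
  # e.g. "casa<n><f><sg>" -> lemma "casa", tags ["n", "f", "sg"].
  class Analyse
    attr_reader :lemma, :tags

    def initialize(lemma, tags)
      @lemma = lemma
      @tags = tags
    end
  end

  # A lexical unit with all of its analyses. In biltrans output the first
  # analysis is the source-language form and the rest are its translations.
  class AmbiLu
    attr_reader :analyses

    def initialize(analyses)
      @analyses = analyses
    end
  end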
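  # A raw lexical unit as cut from the stream, including its delimiters,
  # e.g. "^house<n><sg>/casa<n><f><sg>$".
  class W0
    attr_reader :text

    def initialize(text)
      @text = text
    end

    # Split "lemma<tag1><tag2>" into an Analyse. The lemma is everything
    # before the first "<"; an empty analysis yields lemma "" and no tags.
    def parse_analysis(t)
      lemma = t[/\A[^<]*/]
      tags = t.scan(/<([^>]+)>/).flatten
      Analyse.new(lemma, tags)
    end

    # Strip the surrounding "^" and "$", then split on every "/" that is not
    # escaped with a backslash; each piece is one analysis.
    def parse
      text = @text[1..-2]
      analyses = []
      s = 0
      escaped = false
      text.each_char.with_index do |ch, i|
        if escaped
          # The previous character was a backslash, so this one is literal.
          escaped = false
        elsif ch == '\\'
          escaped = true
        elsif ch == '/'
          analyses << parse_analysis(text[s...i])
          s = i + 1
        end
      end
      analyses << parse_analysis(text[s..-1]) if s < text.length
      AmbiLu.new(analyses)
    end
  end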
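  # Two-pass parser: parse0 cuts the stream into blanks (B) and raw lexical
  # units (W0); parse then turns each W0 into an AmbiLu.
  class StreamParser
    def parse(stream)
      parse0(stream).map(&:parse)
    end

    def parse0(stream)
      state = OUTSIDE_WORD
      s = 0
      escaped = false
      pass0 = []
      stream.each_char.with_index do |ch, i|
        if escaped
          # The previous character was a backslash, so this one is literal.
          escaped = false
        elsif ch == '\\'
          escaped = true
        elsif state == OUTSIDE_WORD && ch == '^'
          # Flush the blank that precedes this lexical unit, if any.
          pass0 << B.new(stream[s...i]) if i > s
          s = i
          state = INSIDE_WORD
        elsif state == INSIDE_WORD && ch == '$'
          # Keep both delimiters: W0#parse strips "^" and "$" itself.
          pass0 << W0.new(stream[s..i])
          s = i + 1
          state = OUTSIDE_WORD
        end
      end
      # Keep any blank after the last lexical unit. An unterminated
      # lexical unit at the end of the stream is ignored.
      pass0 << B.new(stream[s..-1]) if state == OUTSIDE_WORD && s < stream.length
      pass0
    end
  end
end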
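For comparison, the same blank/word segmentation can be expressed with a single regular expression instead of an explicit state machine. This is a swapped-in technique, not the gist's approach: `tokenize` and `TOKEN_RE` are hypothetical names, and the sketch assumes the same rules as the parser above, namely that `^...$` delimits a lexical unit and a backslash escapes the next character.

```ruby
# Hypothetical regex-based tokenizer (not the gist's approach).
# Group 1: the inside of a lexical unit "^...$" (escaped chars allowed).
# Group 2: a run of blank text containing no unescaped "^".
TOKEN_RE = /\^((?:\\.|[^\\$])*)\$|((?:\\.|[^\\^])+)/m

# Returns pairs like [:word, "house<n><sg>/casa<n><f><sg>"] or [:blank, " "].
def tokenize(stream)
  stream.scan(TOKEN_RE).map do |word, blank|
    # Exactly one group matches; the other is nil.
    word ? [:word, word] : [:blank, blank]
  end
end

p tokenize("^house<n><sg>/casa<n><f><sg>$ ^big<adj>/grande<adj>$.")
```

The regex is shorter, but it reports malformed input less clearly than the state machine: an unterminated `^` is silently dropped and the text after it comes back as a blank.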