@matpalm
Created August 20, 2009 12:02
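#!/usr/bin/env ruby
# Streaming mapper: reads text from STDIN and emits every word trigram with a count of 1.

# Write one n-gram as "LongValueSum:<terms>\t1". The LongValueSum: prefix matches the
# key format expected by Hadoop Streaming's aggregate reducer, which sums values per key.
def emit(tuple)
  puts "LongValueSum:#{tuple.join(' ')}\t1"
end

NGRAM_SIZE = 3

STDIN.each do |line|
  # Lowercase, drop apostrophes, and replace every other non-alphanumeric with a space.
  terms = line.downcase.gsub(/'/, '').gsub(/[^a-z0-9]/, ' ').split
  next if terms.size < NGRAM_SIZE

  # Seed the window with the first NGRAM_SIZE terms.
  tuple = terms.shift(NGRAM_SIZE)
  emit tuple

  # Slide the window forward one term at a time until the line is exhausted.
  until terms.empty?
    tuple.shift
    tuple << terms.shift
    emit tuple
  end
end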
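
The sliding window in the mapper can also be written with Ruby's built-in `Enumerable#each_cons`. The following is only a sketch of that alternative, not the author's code: `tokenize` and `ngrams` are hypothetical helper names introduced here for illustration, and the window size is passed as a parameter rather than read from `NGRAM_SIZE`.

```ruby
# Sketch only: `tokenize` and `ngrams` are hypothetical helpers,
# not part of the original mapper.

# Lowercase, drop apostrophes, turn every other non-alphanumeric
# character into a space, then split on whitespace.
def tokenize(line)
  line.downcase.delete("'").gsub(/[^a-z0-9]/, ' ').split
end

# Return every run of n consecutive terms as an array of arrays.
# each_cons yields nothing when there are fewer than n terms,
# so short lines produce an empty result.
def ngrams(terms, n = 3)
  terms.each_cons(n).to_a
end
```

Under these assumptions, the body of the `STDIN.each` loop could reduce to `ngrams(tokenize(line)).each { |tuple| emit tuple }`, which should produce the same lines as the explicit shift-and-append loop.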