@jbowles
Created February 22, 2013 04:15
A quick look at what can be done with treat
require 'treat'
include Treat::Core::DSL
doc1 = document('http://en.wikipedia.org/wiki/List_of_best-selling_fiction_authors')
doc2 = document('http://en.wikipedia.org/wiki/List_of_best-selling_books')
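# Chunk each document, split it into sentences, then split those into tokens
[doc1, doc2].each { |doc| doc.apply(:chunk, :segment, :tokenize) }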
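The three steps work roughly coarse to fine: each one splits the entities produced by the previous step into smaller ones (chunks, then sentences, then tokens). As a rough, pure-Ruby illustration of the last two steps only, here is a toy sentence splitter and tokenizer built on regular expressions. It is not how Treat does it. The splitting rule (a sentence ends at `.`, `!` or `?` followed by whitespace) and the helper names `segment` and `tokenize` are assumptions made for this sketch.

```ruby
# Toy approximation of :segment and :tokenize (hypothetical helpers, NOT Treat's code).
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
def segment(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# A token is a run of letters/digits (allowing internal apostrophes or hyphens,
# as in "It's" or "well-known"), or a single punctuation/symbol character.
def tokenize(sentence)
  sentence.scan(/[[:alnum:]]+(?:['-][[:alnum:]]+)*|[^[:alnum:][:space:]]/)
end

text = "Treat is a Ruby toolkit. It processes text in steps!"
sentences = segment(text)
tokens = sentences.map { |s| tokenize(s) }

p sentences
p tokens
```

Abbreviations such as "Dr." would fool this splitter, which is one reason to lean on a real segmenter rather than a regex.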
#Check it!
doc1.sentences
doc1.sentences.count
doc1.sentences.first
doc1.words
doc1.tokens
doc1.phrases
doc1.phrases.first
# You can define your own phrases and sentences
phrase_1 = phrase('this is a phrase')
phrase_2 = phrase('this is another phrase')
# A deeper dive: decomposing a single sentence
s = sentence('This is a sentence, with phrases in it!')
s.to_s
# Print the tree as we go through the decomposition/construction.
# Only some methods are available (or evaluated) on the object at each point,
# depending on how far the sentence has been decomposed and the Ruby object
# built up.
s.print_tree
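To picture what `print_tree` shows at each stage, here is a toy tree. The `Node` class and its `tree_string` method are hypothetical, made up for this sketch and not part of Treat: before tokenization the sentence is a single leaf, and after tokenization it gains one child per token.

```ruby
# Hypothetical toy entity tree (not Treat's Entity class).
class Node
  attr_reader :type, :value, :children

  def initialize(type, value, children = [])
    @type = type
    @value = value
    @children = children
  end

  # Returns an indented outline of the tree, one node per line.
  def tree_string(depth = 0)
    "#{'  ' * depth}#{type}: #{value}\n" +
      children.map { |child| child.tree_string(depth + 1) }.join
  end
end

text = "This is a sentence, with phrases in it!"

# Before tokenization: a single leaf holding the raw string.
untokenized = Node.new(:sentence, text)

# After tokenization: one child node per token (words and punctuation).
tokens = text.scan(/[[:alnum:]]+|[^[:alnum:][:space:]]/)
tokenized = Node.new(:sentence, text, tokens.map { |t| Node.new(:token, t) })

puts untokenized.tree_string
puts tokenized.tree_string
```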
# Tokenize before you apply :parse and :category
s.apply :tokenize
s.tokens
s.print_tree # leaves should match the tokens
s.tokens.each { |t| p t } # every token, punctuation included
s.words.each { |w| p w } # words only: here, the tokens minus ',' and '!'
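The difference between `tokens` and `words` is that tokens should include punctuation, while words should be only the word tokens. A toy version of that split in plain Ruby, assuming a word is any token made of letters (optionally with an internal apostrophe). `tokens_of` and `words_of` are hypothetical helpers, not Treat methods.

```ruby
# Hypothetical helpers illustrating tokens vs. words (not Treat's code).
# A token is a letter run (with optional internal apostrophe), a digit run,
# or a single punctuation/symbol character.
def tokens_of(text)
  text.scan(/[[:alpha:]]+(?:'[[:alpha:]]+)*|\d+|[^[:alnum:][:space:]]/)
end

# Assumption: a "word" is a token made only of letters (plus internal apostrophes).
def words_of(tokens)
  tokens.select { |t| t.match?(/\A[[:alpha:]]+(?:'[[:alpha:]]+)*\z/) }
end

tokens = tokens_of("This is a sentence, with phrases in it!")
words  = words_of(tokens)

p tokens
p words
p tokens - words # the punctuation tokens
```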
s.apply :parse # Call out to JVM
s.print_tree # should look different now: phrases (NP, VP, ...) sit above the tokens
s.phrases.each { |phrase| p phrase }
s.tokens.each { |token| p token[:tag] } # tag assigned to each token by the parser
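After `:parse`, the sentence becomes a constituency tree: phrase nodes (S, NP, VP, ...) over part-of-speech-tagged tokens. Stanford-style parsers commonly print such trees in Penn Treebank bracket notation. The sketch below reads that notation into nested `Tree` structs with a small recursive-descent reader and lists each word with its tag. The bracketed string is hand-written for illustration, not real parser output, and `Tree`, `read_tree` and `tagged_words` are hypothetical names (Ruby 2.6+ for endless ranges).

```ruby
# Minimal reader for Penn-Treebank-style bracketed trees, e.g. "(NP (DT a) (NN dog))".
# children is an Array of Trees, or a String (the word) for preterminals.
Tree = Struct.new(:tag, :children)

def read_tree(str)
  tokens = str.scan(/\(|\)|[^\s()]+/)
  tree, rest = read_node(tokens)
  raise ArgumentError, "trailing input: #{rest.inspect}" unless rest.empty?
  tree
end

# Recursive descent over: "(" TAG (WORD | subtree+) ")"
# Returns [tree, remaining_tokens].
def read_node(tokens)
  raise ArgumentError, "expected '('" unless tokens.first == "("
  tag = tokens[1]
  rest = tokens[2..]
  if rest.first != "("
    # Preterminal: (TAG word)
    raise ArgumentError, "expected ')'" unless rest[1] == ")"
    return [Tree.new(tag, rest.first), rest[2..]]
  end
  children = []
  while rest.first == "("
    child, rest = read_node(rest)
    children << child
  end
  raise ArgumentError, "expected ')'" unless rest.first == ")"
  [Tree.new(tag, children), rest[1..]]
end

# Collects [word, tag] pairs from the leaves, left to right.
def tagged_words(tree)
  return [[tree.children, tree.tag]] if tree.children.is_a?(String)
  tree.children.flat_map { |child| tagged_words(child) }
end

tree = read_tree("(S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN sentence))) (. !))")
p tagged_words(tree)
```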
s.apply :category # map each token's tag to a general category (noun, verb, ...)
s.print_tree
s.verb_count
s.noun_count
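`verb_count` and `noun_count` presumably count tokens by the category assigned in the previous step. The sketch below fakes that step with a small prefix table from Penn Treebank tags to categories and counts the categories with `Array#tally` (Ruby 2.7+). `CATEGORY_BY_TAG_PREFIX` and `category_for` are hypothetical, the table covers only a handful of tags, and the tagged sentence is hand-written.

```ruby
# Hypothetical, partial mapping from Penn Treebank tag prefixes to categories.
CATEGORY_BY_TAG_PREFIX = {
  "NN"  => :noun,       # NN, NNS, NNP, NNPS
  "VB"  => :verb,       # VB, VBD, VBG, VBN, VBP, VBZ
  "JJ"  => :adjective,  # JJ, JJR, JJS
  "RB"  => :adverb,     # RB, RBR, RBS
  "DT"  => :determiner,
  "PRP" => :pronoun     # PRP, PRP$
}.freeze

# Returns the category for a tag, or :other when no prefix matches.
def category_for(tag)
  prefix = CATEGORY_BY_TAG_PREFIX.keys.find { |p| tag.start_with?(p) }
  prefix ? CATEGORY_BY_TAG_PREFIX[prefix] : :other
end

tagged = [["This", "DT"], ["is", "VBZ"], ["a", "DT"], ["sentence", "NN"],
          ["with", "IN"], ["phrases", "NNS"], ["in", "IN"], ["it", "PRP"], ["!", "."]]

counts = tagged.map { |_word, tag| category_for(tag) }.tally
p counts
puts "verbs: #{counts.fetch(:verb, 0)}, nouns: #{counts.fetch(:noun, 0)}"
```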
# NP = Noun Phrase, gets you all noun phrases
s.each_phrase_with_tag('NP') do |np_phrase|
  puts np_phrase.to_s
end
# VP = Verb Phrase, gets you all verb phrases
s.each_phrase_with_tag('VP') do |vp_phrase|
  puts vp_phrase.to_s
end
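`each_phrase_with_tag` walks the parse tree and yields the phrases that carry a given tag. Here is a hypothetical re-implementation of that idea (not Treat's code) as a pre-order depth-first search over a hand-written tree stored as nested arrays: `[tag, child, ...]` for phrases and `[tag, word]` for leaves. The helper names are made up for this sketch; Ruby 2.6+ is assumed for endless ranges.

```ruby
# A leaf looks like [tag, word].
def leaf?(node)
  node.size == 2 && node[1].is_a?(String)
end

# All words under a node, left to right.
def words_under(node)
  leaf?(node) ? [node[1]] : node[1..].flat_map { |child| words_under(child) }
end

# Yields every subtree whose tag equals +tag+, outermost first (pre-order DFS).
# Without a block, returns an Enumerator.
def each_subtree_with_tag(node, tag, &block)
  return enum_for(:each_subtree_with_tag, node, tag) unless block
  yield node if node[0] == tag
  node[1..].each { |child| each_subtree_with_tag(child, tag, &block) } unless leaf?(node)
end

tree = ["S",
        ["NP", ["DT", "This"]],
        ["VP", ["VBZ", "is"],
               ["NP", ["NP", ["DT", "a"], ["NN", "sentence"]],
                      ["PP", ["IN", "with"], ["NP", ["NNS", "phrases"]]]]],
        [".", "!"]]

each_subtree_with_tag(tree, "NP") { |np| puts words_under(np).join(" ") }
```

This sketch also yields NPs nested inside other NPs ("a sentence" inside "a sentence with phrases"); Treat's own method may or may not behave the same way.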