@squarism
Created December 19, 2011 19:29
nlp_playtime_1
# Requires JRuby.
# Requires stanford-corenlp unpacked in a directory somewhere:
# untar it, then cd into it before starting irb.
# irb invocation with a larger JVM heap:
# jruby -J-Xmn512m -J-Xms2048m -J-Xmx2048m -J-server -S irb
require 'java'
require 'stanford-corenlp-2011-09-16.jar'
require 'stanford-corenlp-2011-09-14-models.jar'
require 'xom.jar'
require 'joda-time.jar'
java_import "edu.stanford.nlp.pipeline.StanfordCoreNLP"
nlp = StanfordCoreNLP.new
# => NativeException: java.lang.NoClassDefFoundError: Could not initialize class
# edu.stanford.nlp.time.SUTime
java_import "edu.stanford.nlp.process.DocumentPreprocessor"
dp = DocumentPreprocessor.new("input.txt")
dp.entries.to_s
# => "[Stanford, University, is, located, in, California, .][It, is, a, great, university, .]"
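# Convert each sentence's tokens to plain Ruby strings.
words = dp.entries.collect {|e| e.collect {|word| word.to_s} }
# Index of the first sentence that contains the given word.
words.index {|sentence| sentence.include? "great"}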
# => 1
words.index {|sentence| sentence.include? "in"}
# => 0
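
The two lookups above scan every sentence each time. If many words need to be looked up, one option is to build a word-to-sentence index once. This is only a sketch: the `words` literal below is a hypothetical stand-in copied from the sample output, so it runs in plain Ruby without JRuby or CoreNLP, and `sentence_index` is a name invented for illustration.

```ruby
# Hypothetical stand-in for the `words` array built from DocumentPreprocessor,
# copied from the sample output so this runs without JRuby or CoreNLP.
words = [
  %w[Stanford University is located in California .],
  %w[It is a great university .]
]

# Map each token to the list of sentence indices it appears in (ascending).
sentence_index = Hash.new { |hash, key| hash[key] = [] }
words.each_with_index do |sentence, i|
  # uniq so a word repeated within one sentence is recorded only once
  sentence.uniq.each { |word| sentence_index[word] << i }
end

# fetch with a default avoids adding empty entries for unknown words.
first_great = sentence_index.fetch("great", []).first  # same answer as words.index above
first_in    = sentence_index.fetch("in", []).first
all_is      = sentence_index.fetch("is", [])            # every sentence containing "is"
```

Matching is case-sensitive, as in the original `include?` calls, so "University" and "university" are separate keys.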