Roberto Bruno Martins bobbruno

View GitHub Profile
@bobbruno
bobbruno / CoreNLPLoadModel.scala
Created January 12, 2018 10:30
Load Stanford CoreNLP in Databricks Spark
val version = "3.7.0" // CoreNLP version the model will be used with
val model = s"stanford-corenlp-$version-models" // append "-english" to use the full English model
// Download and register the models jar only if it is not already attached to the Spark context
if (!sc.listJars.exists(jar => jar.contains(model))) {
  import scala.sys.process._
  // Maven Central only serves HTTPS, so the original http:// URL no longer downloads
  s"wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/$version/$model.jar -O /tmp/$model.jar".!!
  sc.addJar(s"/tmp/$model.jar")
}
bobbruno / NLPPipeline.scala
Created January 12, 2018 10:25
Stanford CoreNLP in Spark Scala
/**
  * Allows CoreNLP to be used in Spark by creating an instance of the pipeline on each worker,
  * so that the code can run in parallel.
  *
  * @param annotators the CoreNLP annotator pipeline
  * @param params the parameters desired for the annotators
  */
class NLPPipeline(annotators: String, params: Tuple2[String, String]*) extends Serializable {
  import edu.stanford.nlp.pipeline._
  import java.util.Properties
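The class preview above stops after its imports, so the way it builds one pipeline per worker is not visible. As a language-neutral illustration of the general idea (ship only the configuration, rebuild the heavy object lazily wherever the wrapper is deserialized), here is a minimal Python sketch. `LazyPipeline` and `_build` are hypothetical names, and the dictionary returned by `_build` is only a stand-in for a real NLP pipeline; none of this is taken from the gist.

```python
import pickle


class LazyPipeline:
    """Serializable wrapper that rebuilds its heavy member on first use."""

    def __init__(self, annotators, *params):
        self.annotators = annotators  # e.g. "tokenize,ssplit"
        self.params = params          # (key, value) pairs, as in the Scala signature
        self._pipeline = None         # built lazily, never serialized

    def __getstate__(self):
        # Drop the built pipeline so only the configuration is pickled.
        state = self.__dict__.copy()
        state["_pipeline"] = None
        return state

    @property
    def pipeline(self):
        if self._pipeline is None:
            self._pipeline = self._build()
        return self._pipeline

    def _build(self):
        # Hypothetical stand-in for constructing a real pipeline object.
        props = dict(self.params)
        props["annotators"] = self.annotators
        return props


local = LazyPipeline("tokenize,ssplit", ("tokenize.language", "en"))
local.pipeline  # force construction on the "driver" side
clone = pickle.loads(pickle.dumps(local))  # simulates shipping to a worker
```

After the round trip, `clone` carries only the annotator string and parameters; its pipeline is rebuilt the first time `clone.pipeline` is read, which is the behavior a per-worker instance needs.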
bobbruno / bufferise.py
Created November 10, 2015 00:16
python decorator for bufferising generator output
def bufferise(defbuf=20, defskip=0):
    def decorate(function):
        def wrapper(*args, **kwargs):
            # Per-call keyword arguments override the decorator defaults
            bufsize = kwargs.get('bufsize', defbuf)
            skiplines = kwargs.get('skiplines', defskip)
            print('Bufsize = {}'.format(bufsize))
            print('Skip {} lines'.format(skiplines))
            if skiplines:
                for i, record in enumerate(function(*args, **kwargs), start=1):
                    if i > skiplines:
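The preview above ends inside the skip loop, so the buffering logic itself is not shown. The following is a minimal sketch of how a decorator of this shape could work, under the assumption that "bufferising" means yielding records in lists of up to `bufsize` items after discarding the first `skiplines` records. Unlike the original, this sketch removes `bufsize` and `skiplines` from `kwargs` before calling the wrapped generator; that choice, and the `numbers` example, are illustrative and not the author's code.

```python
import functools


def bufferise(defbuf=20, defskip=0):
    """Decorator factory: make a generator function yield lists of records."""
    def decorate(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # Assumption: the control arguments are consumed here and are
            # not forwarded to the wrapped generator.
            bufsize = kwargs.pop('bufsize', defbuf)
            skiplines = kwargs.pop('skiplines', defskip)
            buffer = []
            for i, record in enumerate(function(*args, **kwargs), start=1):
                if i <= skiplines:
                    continue  # discard the leading records
                buffer.append(record)
                if len(buffer) == bufsize:
                    yield buffer
                    buffer = []
            if buffer:
                yield buffer  # flush the final, possibly shorter, chunk
        return wrapper
    return decorate


@bufferise(defbuf=3)
def numbers(n):
    """Hypothetical generator used only to exercise the decorator."""
    for k in range(n):
        yield k


chunks = list(numbers(8, skiplines=2))
```

Here `numbers(8, skiplines=2)` drops the first two records and groups the remaining six into chunks of three; passing `bufsize=` at call time overrides the decorator default for that call only.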