@CliffordAnderson
Created February 14, 2020 18:39
Spark Notes
// Load the CSV with default options (no header row), so Spark names the columns _c0, _c1, ...
val pbp = spark.read.format("csv").load("Desktop/pbp.csv")
pbp.show()
pbp.printSchema()
// Give the default column names meaningful labels
val bp = (pbp
  .withColumnRenamed("_c0", "article")
  .withColumnRenamed("_c1", "journal")
  .withColumnRenamed("_c2", "volume")
  .withColumnRenamed("_c3", "issue")
  .withColumnRenamed("_c4", "date")
  .withColumnRenamed("_c5", "pages")
  .withColumnRenamed("_c6", "url")
  .withColumnRenamed("_c7", "text"))
bp.printSchema()
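// Requires Spark NLP (John Snow Labs) on the classpath
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val explainDocumentPipeline = PretrainedPipeline("explain_document_ml")
// Annotate the "text" column and inspect the resulting tokens
val bp_annotated = explainDocumentPipeline.transform(bp)
bp_annotated.select("token").show()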
val bp_select = bp.select("text")