Created August 6, 2017 14:25
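import org.apache.spark.sql.{Dataset, SparkSession}

// One row of the training file; field names must match the DataFrame's column names for .as[Features].
case class Features(id: BigInt, qid1: BigInt, qid2: BigInt, question1: String, question2: String, isDuplicate: Boolean)

// Loads the training file and returns it as a typed Dataset[Features].
def loadTrainFile(spark: SparkSession, trainFile: String): Dataset[Features] = {
  import spark.implicits._ // brings the product encoder for Features into scope
  val df = loadTrainDataFrame(spark, trainFile) // helper not shown in this gist
  // Rename the is_duplicate column so it matches the isDuplicate field of the case class.
  df.withColumnRenamed("is_duplicate", "isDuplicate").as[Features]
}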