@pedroduartecosta
Last active January 29, 2019 15:34
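import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// Linear regression on the assembled feature vector, predicting the delay label.
val lr = new LinearRegression()
  .setLabelCol("DelayOutputVar")
  .setFeaturesCol("features")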
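
// Hyperparameter grid: 2 regParam x 2 fitIntercept x 2 elasticNetParam = 8 candidates.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.fitIntercept) // Boolean param: tries both true and false
  .addGrid(lr.elasticNetParam, Array(0.0, 1.0)) // 0.0 = pure L2 penalty, 1.0 = pure L1 penalty
  .build()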
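
// useCategorical, categoricalIndexers, categoricalEncoders and assembler are defined elsewhere.
// The categorical indexers and encoders are prepended only when categorical features are enabled.
val steps: Array[org.apache.spark.ml.PipelineStage] =
  if (useCategorical) {
    categoricalIndexers ++ categoricalEncoders ++ Array(assembler, lr)
  } else {
    Array(assembler, lr)
  }
val pipeline = new Pipeline().setStages(steps)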
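
// Tune over the grid with a single 70/30 train/validation split of whatever data is passed to fit().
val tvs = new TrainValidationSplit()
  .setEstimator(pipeline)
  .setEvaluator(new RegressionEvaluator().setLabelCol("DelayOutputVar")) // default metric: RMSE
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.7)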
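
// Hold out 30% of `data` (defined elsewhere) as a test set; the seed is fixed for a repeatable split.
val Array(training, test) = data.randomSplit(Array(0.70, 0.30), seed = 12345)
// Evaluate every candidate in the grid on the training portion and keep the best one.
val model = tvs.fit(training)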
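
// A minimal sketch of how the held-out `test` split could be used once the model
// is fitted. This step is not part of the original snippet: it assumes the default
// "prediction" output column, and RMSE is just one possible choice of metric.
val predictions = model.transform(test)
val testRmse = new RegressionEvaluator()
  .setLabelCol("DelayOutputVar")
  .setPredictionCol("prediction")
  .setMetricName("rmse")
  .evaluate(predictions)
println(s"Test RMSE: $testRmse")

// Validation metric for each candidate, in the same order as paramGrid.
paramGrid.zip(model.validationMetrics).foreach {
  case (params, metric) => println(s"$metric <- $params")
}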