

@colbyford
Last active September 23, 2022 16:41
Save trained Spark ML models to storage, then load a saved model and use it to transform a new dataset.
########################################
## Title: Spark MLlib Model Saver
## Language: PySpark
## Author: Colby T. Ford, Ph.D.
########################################
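## Write Models to Blob Storage (mounted in DBFS at /mnt/trainedmodels)
# lrcvModel, rfcvModel and dtcvModel are CrossValidatorModel objects fitted earlier (not shown).
# write().overwrite() replaces any existing copy; a plain save() fails if the path already exists.
lrcvModel.write().overwrite().save("/mnt/trainedmodels/lr")
rfcvModel.write().overwrite().save("/mnt/trainedmodels/rf")
dtcvModel.write().overwrite().save("/mnt/trainedmodels/dt")
display(dbutils.fs.ls("/mnt/trainedmodels/"))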
## Remove an Old Model Directory
# The second argument (True) deletes the directory and its contents recursively.
dbutils.fs.rm("/mnt/trainedmodels/dt", True)
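`dbutils` only exists inside Databricks. When trying the same save and cleanup flow on a local machine, a rough stand-in for `dbutils.fs.rm(path, True)` is a recursive delete with `shutil.rmtree`. The helper name `rm_model_dir` and the fake directory layout below are hypothetical, and returning `False` for a missing path is this sketch's own choice, not a claim about how `dbutils` behaves.

```python
import os
import shutil
import tempfile


def rm_model_dir(path: str) -> bool:
    """Hypothetical local stand-in for dbutils.fs.rm(path, True).

    Recursively deletes a saved-model directory. Returns True if something
    was removed and False if the path did not exist.
    """
    if not os.path.exists(path):
        return False
    shutil.rmtree(path)
    return True


# Build a fake model store with three model directories (hypothetical contents).
root = tempfile.mkdtemp()
for name in ("lr", "rf", "dt"):
    os.makedirs(os.path.join(root, name, "metadata"))

removed = rm_model_dir(os.path.join(root, "dt"))        # deletes the directory tree
removed_again = rm_model_dir(os.path.join(root, "dt"))  # nothing left to delete
print(removed, removed_again, sorted(os.listdir(root)))
# prints: True False ['lr', 'rf']
```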
## Load Trained Model and Transform Dataset
from pyspark.ml.tuning import CrossValidatorModel
# Load the saved cross-validated logistic regression model
lrcvModel = CrossValidatorModel.load("/mnt/trainedmodels/lr")
# Score the new data; `dataset` must contain the same input columns the model was trained on
output = lrcvModel.bestModel.transform(dataset)