@Ben-Epstein
Created November 5, 2020 15:30
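from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import VectorAssembler

# Assumes `df` is an existing Spark DataFrame and `cols` lists its column names,
# with the label column in the last position.
va = VectorAssembler(inputCols=cols[:-1], outputCol='features')  # Assemble every column except the last into one feature vector
dt = DecisionTreeClassifier(featuresCol='features')  # Define model; labelCol defaults to 'label'
pipeline = Pipeline(stages=[va, dt])  # Chain steps together into a full pipeline
train, test = df.randomSplit([0.8, 0.2])  # Random, approximately 80/20 train/test split
model = pipeline.fit(train)  # Fit both stages on the training split; returns a PipelineModel
print(model)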