@lakshay-arora
Created November 4, 2019 07:08
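from pyspark.sql import SparkSession
from pyspark.ml import Pipeline

# reuse the active Spark session, or start one if none exists
# (the pyspark shell and most notebooks already provide `spark`)
spark = SparkSession.builder.getOrCreate()

# create a sample dataframe with an id and two categorical columns
sample_df = spark.createDataFrame([
    (1, 'L101', 'R'),
    (2, 'L201', 'C'),
    (3, 'D111', 'R'),
    (4, 'F210', 'R'),
    (5, 'D110', 'C')
], ['id', 'category_1', 'category_2'])

# print the dataframe to check its contents
sample_df.show()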