@lakshay-arora
Created November 4, 2019 07:08
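from pyspark.ml import Pipeline  # chains feature-engineering stages; imported but not used in this snippet
from pyspark.sql import SparkSession

# `spark` is predefined in the pyspark shell and notebooks; create or reuse a session so the script also runs standalone
spark = SparkSession.builder.getOrCreate()

# create a sample dataframe with an id column and two categorical columns
sample_df = spark.createDataFrame([
    (1, 'L101', 'R'),
    (2, 'L201', 'C'),
    (3, 'D111', 'R'),
    (4, 'F210', 'R'),
    (5, 'D110', 'C')
], ['id', 'category_1', 'category_2'])
sample_df.show()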