@sujee
Last active February 1, 2021 19:16
Spark one hot encoding sample
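## Step 3: encode the indexes into a vector
from pyspark.ml.feature import OneHotEncoder

# `indexed` is assumed to be the DataFrame produced by the earlier indexing step,
# with a numeric "statusIndex" column.
# dropLast=False keeps one vector slot per category (the default drops the last one).
# The inputCols/outputCols form with fit() requires Spark 3.0 or later.
encoder = OneHotEncoder(inputCols=["statusIndex"], outputCols=["statusVector"], dropLast=False)
encoded = encoder.fit(indexed).transform(indexed)
encoded.show()  # vectors print in sparse form: (size,[indices],[values])

# View dense vectors in pandas
encoded_pd = encoded.toPandas()
print(encoded_pd)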
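
To make the effect of `dropLast` concrete, here is a minimal plain-Python sketch of the mapping from a category index to a one-hot vector. It is an illustration only, not Spark's implementation; the names `one_hot`, `num_categories`, and `drop_last` are hypothetical and chosen for this example.

```python
def one_hot(index, num_categories, drop_last=True):
    """Return a dense one-hot list for a category index (illustrative helper)."""
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} out of range for {num_categories} categories")
    # With drop_last, the vector has one slot fewer and the last
    # category is represented by an all-zeros vector.
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


if __name__ == "__main__":
    # Compare both settings for three categories (indexes 0, 1, 2).
    for idx in range(3):
        print(idx, one_hot(idx, 3, drop_last=False), one_hot(idx, 3, drop_last=True))
```

With `drop_last=False`, every category gets its own slot, which is what the `dropLast=False` setting in the encoder above asks for. With `drop_last=True`, the last category becomes the all-zeros vector, which avoids a redundant column when the vectors feed a linear model.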