@lakshay-arora
Created November 3, 2019 19:53
from pyspark.ml.feature import VectorAssembler
# specify the input and output columns of the vector assembler
assembler = VectorAssembler(
    inputCols=['Isboundary',
               'Iswicket',
               'Over',
               'Runs',
               'Batsman_Index',
               'Bowler_Index',
               'Batsman_OHE',
               'Bowler_OHE'],
    outputCol='vector')
# replace nulls in numeric columns with 0 (by default VectorAssembler raises an error on null values)
my_data = my_data.fillna(0)
# transform the data
final_data = assembler.transform(my_data)
# view the assembled vector column (show() prints the first 20 rows)
final_data.select('vector').show()
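
For intuition about what the assembler produces, here is a minimal pure-Python sketch of the concatenation step. It does not use Spark: `assemble_row` is a hypothetical helper written for illustration, the row values are made up, and Spark's actual `VectorAssembler` returns dense or sparse vector objects rather than plain lists.

```python
# Pure-Python illustration of assembling columns into one feature vector.
# assemble_row is a hypothetical helper, not part of pyspark.

def assemble_row(row, input_cols):
    """Concatenate scalar and list-valued columns into one flat list of floats."""
    vector = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            # vector-valued column (e.g. a one-hot encoding): append every element
            vector.extend(float(v) for v in value)
        else:
            # scalar column: append the single value
            vector.append(float(value))
    return vector

# made-up example row; the column names follow the snippet above
row = {
    'Over': 3,
    'Runs': 4,
    'Batsman_Index': 1.0,
    'Batsman_OHE': [0.0, 1.0, 0.0],
}
assembled = assemble_row(row, ['Over', 'Runs', 'Batsman_Index', 'Batsman_OHE'])
print(assembled)  # [3.0, 4.0, 1.0, 0.0, 1.0, 0.0]
```

The order of the names in the input list determines the order of the values in the output, which is why the same column list should be used for training and prediction data.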