@bgweber
Last active January 21, 2019 01:59
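# Assumes a notebook environment (e.g. Databricks) where `spark` and `display`
# are already defined, and `boston_pd` is a pandas DataFrame whose last column is 'target'.
from pyspark.ml.feature import VectorAssembler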
# convert to a Spark data frame
boston_sp = spark.createDataFrame(boston_pd)
display(boston_sp.take(5))
# split into training and test Spark data frames:
# the first 400 rows for training, the remaining rows for testing (no shuffling)
boston_train = spark.createDataFrame(boston_pd[:400])
boston_test = spark.createDataFrame(boston_pd[400:])
# convert to vector representation for MLlib:
# every column except the last one ('target') goes into a single 'features' vector
feature_cols = boston_train.columns[:-1]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
boston_train = assembler.transform(boston_train).select('features', 'target')
boston_test = assembler.transform(boston_test).select('features', 'target')
display(boston_train.take(5))