@sanjurm16
Created January 27, 2019 18:19
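from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import SQLTransformer, StringIndexer, VectorAssembler

# column_to_index and trainingData are defined elsewhere; the assembler inputs below suggest ["Pclass", "Sex", "Embarked"] (an assumption).
# One StringIndexer per categorical column; handleInvalid="keep" assigns unseen or null values to an extra index.
indexer = [StringIndexer(inputCol=column_name, outputCol=column_name+"Index", handleInvalid="keep") for column_name in column_to_index]
# Combine the indexed categorical columns and the numeric columns into one feature vector.
assembler = VectorAssembler(inputCols=["PclassIndex","SexIndex","Age","Fare","WithFamily","EmbarkedIndex"], outputCol="features")
# Index the target column "Survived" as "label".
labelindexer = StringIndexer(inputCol="Survived", outputCol="label")
# Keep only the two columns the classifier needs.
sql_transformer = SQLTransformer(statement="SELECT features, label FROM __THIS__")
indexer.extend([assembler, labelindexer, sql_transformer])
lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
indexer.append(lr)
pipeline = Pipeline(stages=indexer)
model = pipeline.fit(trainingData)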