@lakshay-arora
Created November 4, 2019 07:18
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

# start (or reuse) a Spark session; `spark` is only predefined inside the pyspark shell or notebooks
spark = SparkSession.builder.getOrCreate()

# create a sample dataframe with 4 features and 1 label column
sample_data_train = spark.createDataFrame([
    (2.0, 'A', 'S10', 40, 1.0),
    (1.0, 'X', 'E10', 25, 1.0),
    (4.0, 'X', 'S20', 10, 0.0),
    (3.0, 'Z', 'S10', 20, 0.0),
    (4.0, 'A', 'E10', 30, 1.0),
    (2.0, 'Z', 'S10', 40, 0.0),
    (5.0, 'X', 'D10', 10, 1.0),
], ['feature_1', 'feature_2', 'feature_3', 'feature_4', 'label'])

# view the data
sample_data_train.show()