Ajay Borra ajayborra

ajayborra / buildingModel.scala
Last active November 5, 2018 20:00
Linear Regression Model
import org.apache.spark.ml.regression.LinearRegression

// Linear regression with elastic-net regularization
val lr = new LinearRegression()
  .setLabelCol("medianHouseValue")
  .setFeaturesCol("scaledFeatures")
  .setMaxIter(100)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
// Fit on the training portion, split(0), of the train/test split
val lrModel = lr.fit(split(0))
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
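The two regularization settings above are easier to read next to the penalty they control. In Spark's documentation, `regParam` plays the role of λ and `elasticNetParam` the role of α in the penalty λ(α‖w‖₁ + (1 − α)/2 ‖w‖₂²), so 0.3 and 0.8 give a mostly L1 (lasso-like) penalty. The following is a minimal plain-Scala sketch of that formula only; the object and method names are hypothetical and it is not Spark's implementation.

```scala
object ElasticNetSketch {
  /** Elastic-net penalty: lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    * lambda corresponds to regParam, alpha to elasticNetParam (hypothetical helper).
    */
  def penalty(weights: Seq[Double], lambda: Double, alpha: Double): Double = {
    val l1 = weights.map(w => math.abs(w)).sum   // sum of absolute values
    val l2Squared = weights.map(w => w * w).sum  // sum of squares
    lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }
}
```

With `alpha = 1.0` only the L1 term remains, and with `alpha = 0.0` only the L2 (ridge) term remains.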
ajayborra / trainTestSplit.scala
Created November 5, 2018 20:00
Train & Test Split
// Split into training (80%) and test (20%) sets
val split = scaledFeatures.randomSplit(Array(0.8, 0.2))
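The split above is unseeded, so each run yields a different partition; `randomSplit` also has an overload that takes a `seed: Long` if repeatable splits are wanted. To illustrate the general idea of a seeded, weighted split, here is a rough plain-Scala sketch in which every row draws one uniform number and lands in the bucket whose cumulative-weight range contains it. The names are hypothetical and this is an illustration of the idea, not Spark's code; note that with per-row draws the bucket sizes are only approximately 80/20.

```scala
import scala.util.Random

object RandomSplitSketch {
  /** Assigns each row to one bucket by a uniform draw against cumulative weights. */
  def randomSplit[A](rows: Seq[A], weights: Seq[Double], seed: Long): List[Seq[A]] = {
    require(weights.nonEmpty && weights.forall(_ >= 0.0), "weights must be non-negative")
    val cumulative = weights.scanLeft(0.0)(_ + _)
    val total = cumulative.last
    require(total > 0.0, "weights must not all be zero")
    val bounds = cumulative.map(_ / total)      // e.g. 0.0, 0.8, 1.0
    val rng = new Random(seed)                  // same seed => same draws
    val tagged = rows.toVector.map(row => (row, rng.nextDouble()))
    bounds.sliding(2).map { pair =>
      val lo = pair(0)
      val hi = pair(1)
      tagged.collect { case (row, u) if u >= lo && u < hi => row }
    }.toList
  }
}
```

Calling it twice with the same seed returns the same buckets, which is the property that makes experiments repeatable.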
ajayborra / featureScalingOut.scala
Created November 5, 2018 19:59
Scaled Features Output
+----------------+--------------------+--------------------+
|medianHouseValue|            features|      scaledFeatures|
+----------------+--------------------+--------------------+
|        452600.0|[8.3252,41.0,880....|[2.34470895611761...|
|        358500.0|[8.3014,21.0,7099...|[2.33218146484030...|
|        352100.0|[7.2574,52.0,1467...|[1.78265621721384...|
|        341300.0|[5.6431,52.0,1274...|[0.93294490759373...|
|        342200.0|[3.8462,52.0,1627...|[-0.0128806838430...|
|        269700.0|[4.0368,52.0,919....|[0.08744451941137...|
|        299200.0|[3.6591,52.0,2535...|[-0.1113636089684...|
ajayborra / featureScaling.scala
Created November 5, 2018 19:59
Feature Scaling
import org.apache.spark.ml.feature.StandardScaler

// Standardize each feature to zero mean and unit standard deviation
val standardScaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithStd(true)
  .setWithMean(true)
// Learn per-feature mean and standard deviation, then apply them
val scaler = standardScaler.fit(featuresDf)
val scaledFeatures = scaler.transform(featuresDf)
// Print the output
scaledFeatures.show()
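With both `setWithMean(true)` and `setWithStd(true)`, each feature value x becomes (x − mean) / std. The following plain-Scala sketch shows that arithmetic for a single feature column. It assumes the corrected (n − 1) sample standard deviation and a 0.0 output for constant features, which is how the Spark documentation describes `StandardScaler`; the object and method names are hypothetical.

```scala
object StandardizeSketch {
  /** Z-scores one feature column: (x - mean) / sample standard deviation. */
  def standardize(values: Seq[Double]): Seq[Double] = {
    require(values.size > 1, "need at least two values for a sample standard deviation")
    val n = values.size
    val mean = values.sum / n
    // Corrected sample variance: divide by n - 1 rather than n
    val variance = values.map(v => (v - mean) * (v - mean)).sum / (n - 1)
    val std = math.sqrt(variance)
    // A constant column has zero spread; emit 0.0 instead of dividing by zero
    if (std == 0.0) values.map(_ => 0.0) else values.map(v => (v - mean) / std)
  }
}
```

For example, `StandardizeSketch.standardize(Seq(1.0, 2.0, 3.0))` has mean 2.0 and standard deviation 1.0, so the values map to -1.0, 0.0, and 1.0.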
ajayborra / createFeature.scala
Created November 5, 2018 19:57
Creating Features and Labels
import org.apache.spark.ml.feature.VectorAssembler

// Columns to combine into a single feature vector
val featureCols = Array("medianIncome", "housingMedianAge", "totalRooms", "totalBedrooms",
  "population", "households", "latitude", "longitude")
val vectorAssembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
// Keep only the label (medianHouseValue) and the assembled features
val featuresDf = vectorAssembler.transform(castedDF).select("medianHouseValue", "features")
// Print label and features
featuresDf.show()
ajayborra / casting.scala
Last active November 5, 2018 20:34
Casting datatypes
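import org.apache.spark.sql.types.DoubleType

// Cast every column (all read as strings) to DoubleType
val castedDF = df.columns.foldLeft(df)((current, c) =>
  current.withColumn(c, current(c).cast(DoubleType)))
// Summary statistics as a quick sanity check on the cast
castedDF.describe().show()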
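The `foldLeft` above threads one accumulator through the list of column names, replacing one column per step. The same pattern can be sketched in plain Scala with a hypothetical in-memory table (a `Map` from column name to string values) standing in for the DataFrame. Unparsable strings become `None` here, which mirrors the `null` a failed cast produces in Spark's default (non-ANSI) mode; all names in the sketch are illustrative assumptions.

```scala
import scala.util.Try

object CastSketch {
  type Raw = Map[String, Vector[String]]
  type Casted = Map[String, Vector[Option[Double]]]

  /** Folds over the column names, adding one casted column to the accumulator per step. */
  def castAll(raw: Raw): Casted =
    raw.keys.foldLeft(Map.empty[String, Vector[Option[Double]]]) { (current, c) =>
      // Try(...).toOption turns a parse failure into None instead of throwing
      current + (c -> raw(c).map(s => Try(s.trim.toDouble).toOption))
    }
}
```

Strings in scientific notation parse as expected, so a value such as `"4.526e+05"` (a made-up example) becomes `Some(452600.0)`.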
ajayborra / readDataset.scala
Created November 5, 2018 19:56
readDataset
// Read the CSV into a DataFrame; without schema inference every column is loaded as a string
val df = spark.read.format("csv").option("header", "true")
  .load("/path/to/file.csv")
ajayborra / sparkContextSetup.scala
Created November 5, 2018 19:55
sparkContextSetup
import org.apache.spark.sql.SparkSession
// Set up the SparkSession in local mode
implicit val spark: SparkSession = SparkSession.builder.master("local")
  .appName("California Housing Dataset Prediction").getOrCreate()
ajayborra / dataframeOutput.scala
Created November 5, 2018 19:53
DataFrame Output
// Displays the content of the DataFrame to stdout
df.show()
Output:
+-------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|houseId|    medianHouseValue|        medianIncome|    housingMedianAge|          totalRooms|       totalBedrooms|          population|          households|            latitude|           longitude|
+-------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|      1|4.526000000000000...|8.325200000000000...|4.100000000000000...|8.800000000000000...|1.290000000000000...|3.220000000000000...|1.260000000000000...|3.788000000000000...|-1.22230000000000...|
|      2|3.585000000000000...|8.301399999999999...|2.100000000000000...|7.099000000000000...|1.106000000000000...|2.401000000000000...|1.138000000000000...|3.7