mc$defaultLibrary <- "sparklyr"
library(sparklyr)
library(tidyverse)
speeches <- magpie::sql(mc, "SELECT * FROM presidential_speeches WHERE president")
partitions <- speeches %>%
  ft_tokenizer(input_col = 'speech_text', output_col = 'words') %>%
  ft_stop_words_remover(input_col = 'words', output_col = 'clean_words') %>%
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
import matplotlib.pyplot as plt
import numpy as np
# Pull in the data
df = mc.sql("SELECT * FROM kings_county_housing")
from pyspark.ml.feature import VectorAssembler
feature_list = []
for col in df.columns:
    if col == 'label':
        continue
    else:
        feature_list.append(col)
assembler = VectorAssembler(inputCols=feature_list, outputCol="features")
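The loop above keeps every column except the label. A minimal sketch of the same selection as a plain-Python helper, which can be checked without a Spark session; `feature_columns` and the sample column names are hypothetical and not part of the original snippet.

```python
def feature_columns(columns, label="label"):
    """Return every column name except the label, preserving order."""
    return [col for col in columns if col != label]


# Hypothetical column names, for illustration only.
columns = ["label", "sqft_living", "bedrooms", "bathrooms"]
feature_list = feature_columns(columns)
print(feature_list)
```

With a real DataFrame, the call would presumably be `feature_columns(df.columns)`, and the result passed to `VectorAssembler(inputCols=...)` as in the snippet above.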
cvModel = crossval.fit(trainingData)
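The snippets never show how `crossval` is built. One possible construction, as a sketch only: it assumes `pipeline` is the `Pipeline(stages=[assembler, rf])` object and `rf` is a `RandomForestRegressor`. The helper name `build_cross_validator`, the grid values, and the fold count are placeholders, not the author's settings.

```python
def build_cross_validator(pipeline, rf, num_folds=3):
    """Sketch of a CrossValidator over a small random-forest grid.

    `pipeline` is assumed to be Pipeline(stages=[assembler, rf]) and `rf`
    a RandomForestRegressor; grid values and fold count are placeholders.
    """
    # Imported here so the helper can be defined without a Spark session.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Candidate hyperparameters (assumed values, tune for your data).
    param_grid = (
        ParamGridBuilder()
        .addGrid(rf.numTrees, [20, 50])
        .addGrid(rf.maxDepth, [5, 10])
        .build()
    )
    # Score each candidate by RMSE on the held-out fold.
    evaluator = RegressionEvaluator(
        labelCol="label", predictionCol="prediction", metricName="rmse"
    )
    return CrossValidator(
        estimator=pipeline,
        estimatorParamMaps=param_grid,
        evaluator=evaluator,
        numFolds=num_folds,
    )
```

Under those assumptions, `crossval = build_cross_validator(pipeline, rf)` would run before the `fit` call above.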
predictions = cvModel.transform(testData)
(trainingData, testData) = df.randomSplit([0.8, 0.2])
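A weighted random split assigns each row independently, so the 80/20 proportions are approximate rather than exact. The plain-Python sketch below illustrates that behavior; it is not Spark's implementation, and `random_split` is a hypothetical helper.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split with probability proportional to its weight."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds of each split's share of [0, 1).
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, upper in enumerate(bounds):
            # The last split also absorbs any floating-point shortfall.
            if draw < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


train, test = random_split(range(1000), [0.8, 0.2], seed=42)
print(len(train), len(test))
```

In PySpark, passing a seed, as in `df.randomSplit([0.8, 0.2], seed=42)`, fixes the random draws so that runs can be compared.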
bestPipeline = cvModel.bestModel
# stages[1] is the fitted random forest (stage 0 is the assembler)
bestModel = bestPipeline.stages[1]
# Convert the Spark vector to a NumPy array for plotting
importances = bestModel.featureImportances.toArray()
x_values = list(range(len(importances)))
plt.bar(x_values, importances, orientation='vertical')
plt.xticks(x_values, feature_list, rotation=40)
plt.ylabel('Importance')
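The bar chart shows importances in column order. To read off the most influential features directly, the names and scores can be paired and sorted. A minimal sketch in plain Python; `rank_features` is a hypothetical helper and the sample values are made up for illustration.

```python
def rank_features(names, importances):
    """Pair feature names with importances, most important first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    return sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)


# Made-up values, for illustration only.
names = ["sqft_living", "bedrooms", "grade"]
scores = [0.5, 0.1, 0.4]
ranked = rank_features(names, scores)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With the objects from the snippet above, the call would presumably be `rank_features(feature_list, list(bestModel.featureImportances.toArray()))`.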
df = mc.sql("SELECT * FROM kings_county_housing")
import matplotlib.pyplot as plt
evaluator = RegressionEvaluator(labelCol="label", predictionCol="prediction", metricName="rmse")
rmse = evaluator.evaluate(predictions)
rfPred = model.transform(df)
rfResult = rfPred.toPandas()
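The `rmse` metric requested from the evaluator is the root mean squared error: the square root of the average squared difference between label and prediction. A plain-Python sketch of that formula, useful for sanity-checking a small sample of the pandas output; `root_mean_squared_error` is a hypothetical helper and the numbers are made up.

```python
import math


def root_mean_squared_error(labels, predictions):
    """Square root of the mean squared difference between labels and predictions."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: errors of 3 and 4 give sqrt((9 + 16) / 2).
print(root_mean_squared_error([0.0, 0.0], [3.0, 4.0]))
```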
from pyspark.ml import Pipeline
pipeline = Pipeline(stages=[assembler, rf])