Skip to content

Instantly share code, notes, and snippets.

View carlleston's full-sized avatar

Anthony Carlleston de Lima carlleston

View GitHub Profile
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@carlleston
carlleston / 3-ln_model.ipynb
Last active August 23, 2020 12:09
pre-processing and linear model in pyspark
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@carlleston
carlleston / 2-ETL_news_Studio.ipynb
Created July 28, 2020 12:57
Extract, transform, classify the news and load.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
from sklearn.externals.six import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
import os
os.environ['PATH'] = os.environ['PATH']+';'+os.environ['CONDA_PREFIX']+r"\Library\bin\graphviz"
dot_data = StringIO()
export_graphviz(pipe.named_steps['regressor'].estimators_[0], out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
conda install pydotplus
conda install graphviz
regr = RandomForestRegressor(random_state = 100,bootstrap = True, max_depth=2,max_features=2,min_samples_leaf=3,min_samples_split=5,n_estimators=3)
pipe = Pipeline([
('scaler', StandardScaler()),
('reduce_dim', PCA()),
('regressor', regr)
])
pipe.fit(X_train,y_train)
ypipe=pipe.predict(X_test)
X = df[['GDP','UnemploymentRate']]
y = df['Revenue']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)