This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# This file manages pre-processing of the raw training and testing data sets for the
# Kaggle competition at https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# It is intended to run from the command line in batch mode, using the Rscript command below:
# Rscript --vanilla code/pre-processing.R data/wine.csv data/wine_test.csv data/train_imputed.csv data/test_imputed.csv
# 4 arguments are required:
# - input file name for the raw training data csv,
# - input file name for the raw testing data csv,
# - output file name for the imputed training data csv,
# - output file name for the imputed testing data csv
# Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# This file performs
# - Linear Regression (LR) model training
# - prediction on the imputed testing set, using the fitted LR model
# - preparation of a Kaggle submission file
# It is intended to run from the command line in batch mode, using the Rscript command below:
# Rscript --vanilla code/LF.R data/train_imputed.csv data/test_imputed.csv 0.7 826 data/submission.csv config.R
# 6 arguments are required:
# - input file name for the imputed training data csv,
# - input file name for the imputed testing data csv
# Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# This file performs
# - GBM model training
# - prediction on the imputed testing set, using the fitted GBM model (this is a regression problem,
# so the gaussian distribution is used in GBM)
# - preparation of a Kaggle submission file
# It is intended to run from the command line in batch mode, using an Rscript command like the one below:
# Rscript --vanilla code/GBM.R data/train_imputed.csv data/test_imputed.csv 5000 5 4 25 output/submission.csv code/config.R
#
# 8 arguments are required
# Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# This file performs
# - xgboost model training (linear booster used)
# - prediction on the imputed testing set, using the fitted xgboost model
# - preparation of a Kaggle submission file
# It is intended to run from the command line in batch mode, using the Rscript command below:
# Rscript --vanilla code/xgboost.R data/train_imputed.csv data/test_imputed.csv 10 2 0.0001 1 data/xgboost_submission.csv code/config.R
#
# 8 arguments are required:
# - input file name for the imputed training data csv,
# Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# This file performs
# - ensemble prediction based on the 3 fitted models (LR, GBM, and xgboost)
# - preparation of a Kaggle submission file for the ensemble prediction
# It is intended to run from the command line in batch mode, using the Rscript command below:
# Rscript --vanilla code/ensemble.R data/ensemble_submission.csv code/config.R
#
# 2 arguments are required:
# - output file name for the resulting submission csv file (in a ready-for-Kaggle-upload format)
# - configuration file in R (setup of the ensemble implemented as an R code module), which has to have
# This is a DVC-based script to manage the machine-learning pipeline for the project at
# https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
mkdir R_DVC_GITHUB_CODE
cd R_DVC_GITHUB_CODE
# clone the GitHub repo with the code
git clone https://github.com/gvyshnya/DVC_R_Ensemble
# initialize DVC
# Project/Competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/
# Simple benchmark prediction with the median (median by page, weekdays, and holidays)
#
# - You should install Workalendar directly from its GitHub repo
# >>> pip install git+https://github.com/novafloss/workalendar.git
import pandas as pd
import pandas.tseries.holiday as hol
import re
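The benchmark named in the header comment (a median per page and day type) can be sketched without pandas as follows. This is a minimal illustration under stated assumptions, not the gist's actual code: the `visits` records, the page name, and the helper `median_by_page_and_daytype` are hypothetical, and holidays are ignored, so only weekdays versus weekends are distinguished.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_by_page_and_daytype(visits):
    """Return {(page, is_weekend): median count} from (page, date, count) records.

    Hypothetical helper for illustration; holidays are not handled.
    """
    groups = defaultdict(list)
    for page, day, count in visits:
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
        groups[(page, is_weekend)].append(count)
    return {key: median(counts) for key, counts in groups.items()}


# Hypothetical sample data: one page observed on three weekdays and one weekend.
visits = [
    ("Main_Page", date(2017, 1, 2), 10),  # Monday
    ("Main_Page", date(2017, 1, 3), 12),  # Tuesday
    ("Main_Page", date(2017, 1, 4), 30),  # Wednesday
    ("Main_Page", date(2017, 1, 7), 40),  # Saturday
    ("Main_Page", date(2017, 1, 8), 50),  # Sunday
]
benchmark = median_by_page_and_daytype(visits)
```

The median is used rather than the mean because a single traffic spike (the 30 above) does not move the weekday estimate.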
# Improve ensemble configuration
$ vi code/config.R
# Commit all the changes.
$ git commit -am "Updated weights of the models in the ensemble"
# Reproduce the ensemble prediction
$ dvc repro data/submission_ensemble.csv
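Changing the model weights changes how the individual predictions are blended. The sketch below shows one common way to do that, a weighted mean of per-model predictions. It assumes, without confirmation from the snippets on this page, that the ensemble is a weighted average of the LR, GBM, and xgboost outputs; the function name, the weights, and the prediction values are all hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Blend per-model prediction lists into one list using a weighted mean.

    Hypothetical helper: `predictions` maps model name -> list of predictions,
    `weights` maps the same model names -> non-negative weight.
    """
    if predictions.keys() != weights.keys():
        raise ValueError("predictions and weights must name the same models")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(next(iter(predictions.values())))
    if any(len(p) != n_rows for p in predictions.values()):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(weights[m] * predictions[m][i] for m in predictions) / total
        for i in range(n_rows)
    ]


# Hypothetical predictions for two test rows and illustrative weights.
predictions = {"LR": [3.0, 4.0], "GBM": [5.0, 4.0], "xgboost": [4.0, 4.0]}
weights = {"LR": 1.0, "GBM": 2.0, "xgboost": 1.0}
blended = weighted_ensemble(predictions, weights)
```

Dividing by the sum of the weights means the weights do not need to add up to 1, so a single weight can be edited without rebalancing the others.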
# Competition: https://inclass.kaggle.com/c/pred-411-2016-04-u3-wine/
# This is a configuration file for the entire solution
# LR.R specific settings
cfg_run_LR <- 1 # if set to 0, the LR model will not be fitted, and its prediction will not be calculated in batch mode
# GBM.R specific settings
cfg_run_GBM <- 1 # if set to 0, the GBM model will not be fitted, and its prediction will not be calculated in batch mode
# xgboost.R specific settings
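import xgboost as xgb
import numpy as np
from sklearn.datasets import load_digits
# train_test_split lives in sklearn.model_selection; sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
rng = np.random.RandomState(1994)  # fixed seed for reproducibility
digits = load_digits(n_class=2)  # keep only digits 0 and 1 (binary problem)
X = digits['data']
y = digits['target']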