João Paulo Nogueira joaopcnogueira

  • Fortaleza, Ceará
@joaopcnogueira
joaopcnogueira / how_to_build_regressive_features.R
Last active November 3, 2023 02:35
A piece of code demonstrating how to build regressive (lag-based) features for machine learning models, such as the sum of sales over the last 3 months, the mean of sales over the last 6 months, and so on.
# Creating a spine table with three columns ----
# customer_id: identifier of the customer for whom we are going to predict next month's sales
# year_month: reference date
# sales: the metric we want to predict
library(tibble)     # tibble()
library(lubridate)  # ymd()

months <- seq(ymd("2021-11-01"), ymd("2023-10-01"), by = "1 month")  # 24 monthly dates
spine_tbl <- tibble(
  customer_id = c(rep("João", 24), rep("Denise", 24)),
  year_month  = c(months, months),
  sales       = sample(100:1000, 48, replace = TRUE)
)
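The kind of feature described above (a per-customer aggregate over the preceding months) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gist's R implementation: the helper name `trailing_sum`, the sample rows, and the choice to exclude the current month from the window are all hypothetical.

```python
from collections import defaultdict


def trailing_sum(rows, window):
    """For each (customer_id, year_month, sales) row, sum the sales of the
    `window` months that precede it for the same customer.

    The feature is None when fewer than `window` earlier months exist.
    Assumes one row per customer per consecutive month.
    """
    history = defaultdict(list)  # customer_id -> sales seen so far, in date order
    out = []
    # ISO "YYYY-MM" strings sort chronologically, so a plain sort is enough.
    for customer_id, year_month, sales in sorted(rows, key=lambda r: (r[0], r[1])):
        past = history[customer_id]
        feature = sum(past[-window:]) if len(past) >= window else None
        out.append((customer_id, year_month, sales, feature))
        past.append(sales)  # the current month only becomes "past" afterwards
    return out


# Hypothetical sample data, not taken from the gist.
rows = [
    ("customer_a", "2023-01", 100),
    ("customer_a", "2023-02", 200),
    ("customer_a", "2023-03", 300),
    ("customer_a", "2023-04", 400),
    ("customer_b", "2023-01", 10),
    ("customer_b", "2023-02", 20),
]

features = trailing_sum(rows, window=3)
for row in features:
    print(row)
```

Excluding the current month from the window keeps the target value out of its own feature, which matters when the feature is used to predict that month's sales.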
joaopcnogueira / lpsolve.R
Last active November 15, 2021 15:44
Example of optimization and linear programming with R
# Import lpSolve package
library(lpSolve)
#
# Set up the problem: maximize
# z = 2*x1 + 11*x2 subject to
# 2*x1 + 2*x2 <= 20
# x1 + 2*x2 <= 12
# 3*x1 + 4*x2 <= 36
# x1 <= 5
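For a two-variable problem like the one stated in the comments above, the optimum can be cross-checked without a solver by enumerating the corner points of the feasible region. The sketch below is only an illustration and is not how lpSolve works internally. It assumes that the four constraints visible in the preview are the complete set and that both variables are non-negative (lpSolve's default); the preview is truncated, so the full gist may differ.

```python
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
constraints = [
    (2, 2, 20),
    (1, 2, 12),
    (3, 4, 36),
    (1, 0, 5),
    (-1, 0, 0),  # x1 >= 0 (assumed)
    (0, -1, 0),  # x2 >= 0 (assumed)
]
objective = (2, 11)  # maximize z = 2*x1 + 11*x2


def feasible(x1, x2, tol=1e-9):
    """True when the point satisfies every constraint (within a tolerance)."""
    return all(a1 * x1 + a2 * x2 <= b + tol for a1, a2, b in constraints)


def solve():
    """Return (z, x1, x2) for the best feasible corner point."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines.
        x1 = (b * c2 - a2 * d) / det
        x2 = (a1 * d - b * c1) / det
        if feasible(x1, x2):
            z = objective[0] * x1 + objective[1] * x2
            if best is None or z > best[0]:
                best = (z, x1, x2)
    return best


z, x1, x2 = solve()
print(z, x1, x2)
```

Under these assumptions the best corner is x1 = 0, x2 = 6 with z = 66. Corner enumeration is valid here because the feasible region is a bounded, non-empty polygon, so a linear objective attains its maximum at a vertex.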
joaopcnogueira / python_project.py
Created November 29, 2019 03:54
How to make Python code run independently of the OS path
# Work Directory (/home/user/python_project/):
# - data
# - data/employee.csv
# - src
#
import os
# WORK_DIR="/home/user/python_project/"
WORK_DIR = os.getcwd()
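One plausible way to use `WORK_DIR` from here is to build every file path with `os.path.join`, so that no path separator is hard-coded. The preview is truncated, so this is a sketch rather than the gist's actual continuation; the name `DATA_PATH` is an assumption, and the file is not required to exist for the snippet to run.

```python
import os

# Resolve the working directory at run time instead of hard-coding it.
WORK_DIR = os.getcwd()

# os.path.join inserts the separator appropriate for the current OS.
DATA_PATH = os.path.join(WORK_DIR, "data", "employee.csv")

print(DATA_PATH)
```

Note that `os.getcwd()` returns the directory the script is launched from, so this layout assumes the script is started from the project root.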
joaopcnogueira / condaenv.txt
Created August 20, 2019 13:21 — forked from pratos/condaenv.txt
To package a conda environment (requirements.txt and virtual environment)
# For Windows users
# Note: <> denotes changes to be made
# Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>
# To create a requirements.txt file:
conda list  # Gives you the list of packages used in the environment
conda list -e > requirements.txt  # Saves all the package info to a file in your folder
joaopcnogueira / target_mean_encoder.R
Last active September 22, 2019 23:53
Target mean encoder implementation in R
library(dplyr)
# creating a toy dataset
data = tibble(vehicle = c("car", "bus", "bike", "bus", "car", "bike"),
              target  = c(23, 34, 56, 78, 33, 65))
# print dataframe
data
# OUTPUT
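Target mean encoding replaces each category with the mean of the target over the rows that share that category. The sketch below applies this to the toy dataset above in plain Python. It illustrates the technique only and is not the gist's R implementation; the function name `target_mean_encode` is hypothetical.

```python
from collections import defaultdict

# Same toy data as the R snippet above.
vehicles = ["car", "bus", "bike", "bus", "car", "bike"]
targets = [23, 34, 56, 78, 33, 65]


def target_mean_encode(categories, values):
    """Return (encoded, means): each category replaced by its target mean,
    plus the category -> mean lookup table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, value in zip(categories, values):
        sums[category] += value
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    encoded = [means[category] for category in categories]
    return encoded, means


encoded, means = target_mean_encode(vehicles, targets)
print(means)
print(encoded)
```

For this data the means are 28.0 for car, 56.0 for bus, and 60.5 for bike. In a real modeling workflow the means should be computed on the training data only and then applied to validation or test rows, otherwise the target leaks into the feature.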
joaopcnogueira / titanic-pipeline4.py
Last active July 11, 2019 21:02
Using Pipelines and ColumnTransformer to compose different data pre-processing steps
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
from sklearn.compose import ColumnTransformer
joaopcnogueira / titanic-pipeline3.py
Last active July 11, 2019 17:41
GridSearchCV with pipelines
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
joaopcnogueira / titanic-pipeline2.py
Last active July 11, 2019 14:55
K-fold cross-validation with pipeline
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
# reading the dataset
joaopcnogueira / titanic-pipeline.py
Last active July 11, 2019 14:08
Refactored titanic code with pipelines
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
# reading the dataset
df = pd.read_csv("train.csv")
joaopcnogueira / groupkfold_example.py
Last active July 11, 2019 13:53
Simple Example using GroupKFold with Cross-Validate
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupKFold
# Loading the data
iris = datasets.load_iris()
design_matrix = np.concatenate((iris['data'], iris['target'].reshape(-1, 1)), axis=1)