
João Paulo Nogueira joaopcnogueira

  • Fortaleza, Ceará
joaopcnogueira / backward_elimination.py
Last active December 18, 2023 14:58
Feature selection by Backward Elimination using p-value
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, sl):
    """
    X: the data matrix with the independent variables (predictors)
    y: the vector with the dependent variable (target)
    sl: significance level; 0.05 (5%) is the usual choice
    """
    # prepend a column of ones so the fitted model has an intercept term
    X = np.append(arr=np.ones((len(X), 1)).astype(int), values=X, axis=1)
    while True:
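The preview cuts off inside the elimination loop. As a minimal sketch of how the rest of the routine typically goes, assuming the usual rule of repeatedly dropping the predictor with the highest p-value until every p-value falls below sl:

import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, sl=0.05):
    """Drop predictors one at a time until every p-value is below sl."""
    # prepend a column of ones so the fitted model has an intercept term
    X = np.append(arr=np.ones((len(X), 1)).astype(int), values=X, axis=1)
    while True:
        model = sm.OLS(y, X).fit()
        if model.pvalues.max() > sl:
            # drop the predictor with the highest p-value and refit
            X = np.delete(X, np.argmax(model.pvalues), axis=1)
        else:
            break
    return X

Typical usage would be X_selected = backward_elimination(X, y, sl=0.05); note that the intercept column is treated like any other predictor here and can itself be dropped.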
joaopcnogueira / how_to_build_regressive_features.R
Last active November 3, 2023 02:35
Piece of code demonstrating how to build regressive (rolling-window) features for machine learning modeling, such as the sum of sales over the last 3 months, the mean of sales over the last 6 months, and so on.
library(tidyverse)  # tibble()
library(lubridate)  # ymd()

# Create a spine table with three columns ----
# customer_id: identifier of the customer for whom we will predict next month's sales
# year_month:  reference date
# sales:       the metric we want to predict
spine_tbl <- tibble(
  customer_id = c(rep("João", 24), rep("Denise", 24)),
  year_month  = c(seq(ymd("2021-11-01"), ymd("2023-10-01"), by = "1 month"),
                  seq(ymd("2021-11-01"), ymd("2023-10-01"), by = "1 month")),
  sales       = sample(100:1000, 48, replace = TRUE)
)
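The preview stops right after the spine table is built; the gist continues in R with dplyr. As a rough sketch of the same idea in Python with pandas (the window sizes and column names below are assumptions chosen to match the description):

import numpy as np
import pandas as pd

# toy spine table: one row per customer per month
months = pd.date_range("2021-11-01", "2023-10-01", freq="MS")
spine = pd.DataFrame({
    "customer_id": ["João"] * len(months) + ["Denise"] * len(months),
    "year_month": list(months) * 2,
    "sales": np.random.randint(100, 1001, size=2 * len(months)),
})

spine = spine.sort_values(["customer_id", "year_month"])
grouped = spine.groupby("customer_id")["sales"]

# shift(1) so each feature only looks at months strictly before the reference date
spine["sales_sum_last_3m"] = grouped.transform(lambda s: s.shift(1).rolling(3).sum())
spine["sales_mean_last_6m"] = grouped.transform(lambda s: s.shift(1).rolling(6).mean())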
joaopcnogueira / backward_elimination2.py
Last active October 9, 2023 12:26
Feature selection by Backward Elimination using both the p-value and the adjusted r-squared
import numpy as np
import statsmodels.api as sm

def backward_elimination2(X, y, sl):
    """
    X: the data matrix with the independent variables (predictors)
    y: the vector with the dependent variable (target)
    sl: significance level; 0.05 (5%) is the usual choice
    """
    # prepend a column of ones so the fitted model has an intercept term
    X = np.append(arr=np.ones((len(X), 1)).astype(int), values=X, axis=1)
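The preview again stops at the intercept step. A sketch of one way the two criteria could be combined, assuming the common variant in which a removal is undone when it lowers the adjusted R-squared:

import numpy as np
import statsmodels.api as sm

def backward_elimination2(X, y, sl=0.05):
    """Drop high p-value predictors, but stop if a removal hurts adjusted R-squared."""
    X = np.append(arr=np.ones((len(X), 1)).astype(int), values=X, axis=1)
    while True:
        model = sm.OLS(y, X).fit()
        worst = int(np.argmax(model.pvalues))
        if model.pvalues[worst] <= sl:
            break  # every remaining predictor is significant
        X_reduced = np.delete(X, worst, axis=1)
        if sm.OLS(y, X_reduced).fit().rsquared_adj < model.rsquared_adj:
            break  # removing the worst predictor lowered adjusted R-squared
        X = X_reduced
    return X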
joaopcnogueira / lpsolve.R
Last active November 15, 2021 15:44
Example of optimization and linear programming with R (lpSolve)
# Import the lpSolve package
library(lpSolve)

# Set up the problem: maximize
#   z = 2*x1 + 11*x2
# subject to
#   2*x1 + 2*x2 <= 20
#     x1 + 2*x2 <= 12
#   3*x1 + 4*x2 <= 36
#     x1        <= 5
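The preview ends before the call to lpSolve's lp(). As a sketch of the same problem in Python with scipy.optimize.linprog (a different solver than the gist uses; linprog minimizes, so the objective is negated):

from scipy.optimize import linprog

# maximize z = 2*x1 + 11*x2  ->  minimize -2*x1 - 11*x2
c = [-2, -11]

# coefficients and right-hand sides of the <= constraints
A_ub = [[2, 2],
        [1, 2],
        [3, 4],
        [1, 0]]
b_ub = [20, 12, 36, 5]

# x1, x2 >= 0 via the bounds
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)  # optimum: x1 = 0, x2 = 6, z = 66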
joaopcnogueira / python_project.py
Created November 29, 2019 03:54
How to make Python Code Run Independent of OS Path
# Work directory layout (/home/user/python_project/):
# - data
# - data/employee.csv
# - src

import os

# hard-coded version, tied to one machine:
# WORK_DIR = "/home/user/python_project/"
# portable version, resolved at runtime:
WORK_DIR = os.getcwd()
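Since os.getcwd() returns whatever directory the script is launched from, paths are then built relative to it. A minimal sketch of how the data file from the layout above might be read (loading it with pandas is just an illustration, not something shown in the preview):

import os
import pandas as pd

WORK_DIR = os.getcwd()

# build the path relative to the work directory instead of hard-coding "/home/user/..."
employee_path = os.path.join(WORK_DIR, "data", "employee.csv")
employees = pd.read_csv(employee_path)

If the script should also work when launched from another directory, anchoring on the file itself with os.path.dirname(os.path.abspath(__file__)) is a common alternative.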
joaopcnogueira / target_mean_encoder.R
Last active September 22, 2019 23:53
Target mean encoder implementation in R
library(dplyr)

# create a toy dataset
data <- tibble(
  vehicle = c("car", "bus", "bike", "bus", "car", "bike"),
  target  = c(23, 34, 56, 78, 33, 65)
)

# print the dataframe
data

# OUTPUT
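The preview cuts off before the encoder itself, and the printed output is not shown. The idea is to replace each category by the mean of the target within that category; a minimal sketch of the same thing in Python with pandas (the gist's actual implementation is in R with dplyr):

import pandas as pd

data = pd.DataFrame({
    "vehicle": ["car", "bus", "bike", "bus", "car", "bike"],
    "target": [23, 34, 56, 78, 33, 65],
})

# each vehicle value is replaced by the mean target observed for that vehicle
data["vehicle_encoded"] = data.groupby("vehicle")["target"].transform("mean")
print(data)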
joaopcnogueira / condaenv.txt
Created August 20, 2019 13:21 — forked from pratos/condaenv.txt
To package a conda environment (requirements.txt and virtual environment)
# For Windows users
# Note: <> denotes changes to be made

# Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>

# To create a requirements.txt file:
conda list                         # lists the packages installed in the environment
conda list -e > requirements.txt   # saves the package list to requirements.txt
joaopcnogueira / custom_groupby_functions.R
Last active July 19, 2019 17:01
Custom group_by function in R
library(tidyverse)

# toy dataset
df <- tibble(
  clientes = c('joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia'),
  produtos = c('celular', 'notebook', 'livro', 'bola', 'carro', 'chapéu', 'moto', 'moto', 'caneta')
)

# custom function
get_produtos <- function(produtos){
joaopcnogueira / custom_groupby_functions.py
Last active July 13, 2019 14:42
Custom groupby function
"""
Defining a custom function to be applied in pandas groupby
"""
import numpy as np
import pandas as pd
clients = ['joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia']
products = ['smartphone', 'notebook', 'book', 'ball', 'car', 'hat', 'bike', 'mouse', 'pen']
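The preview stops before the groupby itself. A minimal sketch of how a custom function might be applied per group; the particular aggregation shown, joining each client's products into one string, is an assumption made for illustration:

import pandas as pd

clients = ['joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia']
products = ['smartphone', 'notebook', 'book', 'ball', 'car', 'hat', 'bike', 'mouse', 'pen']

df = pd.DataFrame({'client': clients, 'product': products})

def get_products(items):
    """Custom aggregation: join a group's unique products into a single string."""
    return ', '.join(sorted(set(items)))

# one row per client, with all of that client's products collected together
summary = df.groupby('client')['product'].apply(get_products).reset_index()
print(summary)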
joaopcnogueira / titanic-pipeline4.py
Last active July 11, 2019 21:02
Using Pipeline and ColumnTransformer to compose different data pre-processing steps
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split, KFold, cross_validate, GridSearchCV
from category_encoders import OneHotEncoder
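Only the imports survive in the preview. A minimal sketch of how these pieces might be composed; the file name titanic.csv and the column choices (Age, Fare, Sex, Embarked, Survived) are assumptions based on the standard Kaggle Titanic dataset, not taken from the gist:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import KFold, cross_validate
from category_encoders import OneHotEncoder

df = pd.read_csv("titanic.csv")  # assumed file path
X = df[["Age", "Fare", "Sex", "Embarked"]]
y = df["Survived"]

# impute the numeric columns, one-hot encode the categorical ones
preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
    ("cat", OneHotEncoder(), ["Sex", "Embarked"]),
])

model = Pipeline([
    ("preprocessing", preprocessor),
    ("classifier", DecisionTreeClassifier(max_depth=3, random_state=42)),
])

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(model, X, y, cv=cv, scoring="accuracy")
print(scores["test_score"].mean())

The GridSearchCV and train_test_split imports from the preview would slot in the same way, for example to tune the tree's max_depth over the whole pipeline.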