João Paulo Nogueira joaopcnogueira

@joaopcnogueira
joaopcnogueira / python_project.py
Created Nov 29, 2019
How to make Python Code Run Independent of OS Path
# Work Directory (/home/user/python_project/):
# - data
# - data/employee.csv
# - src
#
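import os
# Hard-coded absolute path: breaks on any other machine or OS
# WORK_DIR="/home/user/python_project/"
# Portable alternative: the directory the script is launched from
WORK_DIR = os.getcwd()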
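The following is a minimal sketch of how `WORK_DIR` can then be used to reach `data/employee.csv`. It assumes the script is launched from the project root, so that `os.getcwd()` returns `/home/user/python_project/`. The name `DATA_FILE` is illustrative. `os.path.join` inserts the separator of whichever OS runs the code.

```python
import os

# Assumption: the script runs from the project root, so the current
# working directory is /home/user/python_project/ (or its Windows equivalent)
WORK_DIR = os.getcwd()

# os.path.join uses the separator of the running OS ("/" or "\\"),
# so the same line works on Linux, macOS and Windows
DATA_FILE = os.path.join(WORK_DIR, "data", "employee.csv")  # hypothetical name
print(DATA_FILE)
```

The script may be started from a different folder. In that case, `os.path.dirname(os.path.abspath(__file__))` gives the script's own directory rather than the current working directory.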
joaopcnogueira / condaenv.txt
Created Aug 20, 2019 — forked from pratos/condaenv.txt
To package a conda environment (requirements.txt and virtual environment)
# For Windows users
# Note: <> denotes changes to be made
# Create a conda environment
conda create --name <environment-name> python=<version:2.7/3.5>
# To create a requirements.txt file:
conda list  # Lists the packages installed in the environment
conda list -e > requirements.txt  # Exports the package list to requirements.txt in the current folder
joaopcnogueira / target_mean_encoder.R
Last active Sep 22, 2019
Target mean encoder implementation in R
library(dplyr)
# creating a toy dataset
data <- tibble(vehicle = c("car", "bus", "bike", "bus", "car", "bike"),
               target = c(23, 34, 56, 78, 33, 65))
# print dataframe
data
# OUTPUT
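Target mean encoding replaces each category of `vehicle` with the mean of `target` over the rows that share that category. The following is a Python/pandas translation on the same toy data, not the author's R implementation. It is the naive variant: the means are computed on the whole dataset. In a real model they would be learned on the training data only, to avoid target leakage.

```python
import pandas as pd

# Same toy dataset as the R example
data = pd.DataFrame({
    "vehicle": ["car", "bus", "bike", "bus", "car", "bike"],
    "target": [23, 34, 56, 78, 33, 65],
})

# Mean of the target for each category: car -> 28.0, bus -> 56.0, bike -> 60.5
category_means = data.groupby("vehicle")["target"].mean()

# Replace every category with its target mean
data["vehicle_encoded"] = data["vehicle"].map(category_means)
print(data)
```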
joaopcnogueira / titanic-pipeline4.py
Last active Jul 11, 2019
Using Pipelines and ColumnTransformer to compose different data pre-processing steps
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
from sklearn.compose import ColumnTransformer
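The following is a minimal sketch of how these pieces can be composed. It assumes Titanic-like columns (`Pclass`, `Sex`, `Age`, `Fare`, `Survived`) in a small made-up DataFrame instead of the real `train.csv`. `ColumnTransformer` sends the numeric columns to a median `SimpleImputer` and the `Sex` column to an imputer plus one-hot sub-pipeline. One substitution: it uses scikit-learn's own `OneHotEncoder` in place of the `category_encoders` one imported above, so it runs without that extra package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for train.csv with Titanic-like columns
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3, 2, 3],
    "Sex": ["female", "male", "female", "male", "male", "female", np.nan, "male"],
    "Age": [29.0, np.nan, 35.0, 22.0, 54.0, np.nan, 27.0, 19.0],
    "Fare": [211.3, 7.9, 26.0, 7.25, 51.9, 8.05, 13.0, 7.75],
    "Survived": [1, 0, 1, 0, 0, 1, 1, 0],
})
X = df.drop(columns="Survived")
y = df["Survived"]

numeric_features = ["Pclass", "Age", "Fare"]
categorical_features = ["Sex"]

# Each (name, transformer, columns) triple is applied to its own columns only
preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), numeric_features),
    ("cat", Pipeline(steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_features),
])

# Pre-processing and model are fitted together in one call
model = Pipeline(steps=[
    ("preprocess", preprocessor),
    ("tree", DecisionTreeClassifier(random_state=42)),
])
model.fit(X, y)
preds = model.predict(X)
print(preds)
```

The transformed matrix has three numeric columns followed by one column per `Sex` category.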
joaopcnogueira / titanic-pipeline3.py
Last active Jul 11, 2019
GridSearchCV with pipelines
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.model_selection import GridSearchCV
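The following is a minimal sketch of grid search over a pipeline. It uses scikit-learn's bundled iris data instead of the Titanic `train.csv` so that it runs as-is, and the grid values are arbitrary. The key detail is the `<step name>__<parameter>` naming in `param_grid`, which lets `GridSearchCV` tune any step of the pipeline. The whole pipeline is refit inside every fold, so the imputer never sees the validation rows.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Bundled iris data stands in for the Titanic train.csv
X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),  # no-op here: iris has no NaNs
    ("tree", DecisionTreeClassifier(random_state=42)),
])

# Pipeline parameters are addressed as "<step name>__<parameter name>"
param_grid = {
    "imputer__strategy": ["mean", "median"],
    "tree__max_depth": [2, 3, 5, None],
}

search = GridSearchCV(
    pipe,
    param_grid=param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```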
joaopcnogueira / titanic-pipeline2.py
Last active Jul 11, 2019
K-fold cross-validation with pipeline
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
# reading the dataset
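The following is a minimal sketch of K-fold cross-validation on a pipeline. It again uses the bundled iris data in place of `train.csv`. Shuffling is turned on because the iris rows are sorted by class. `cross_validate` clones and refits the entire pipeline on each training split, then scores it on the held-out fold.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Bundled iris data stands in for the Titanic train.csv
X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
])

# 5 shuffled folds; the full pipeline is refit on each training split
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(pipe, X, y, cv=cv, scoring="accuracy", return_train_score=True)

print("test accuracy per fold:", scores["test_score"])
print("mean test accuracy:", scores["test_score"].mean())
```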
joaopcnogueira / titanic-pipeline.py
Last active Jul 11, 2019
Refactored titanic code with pipelines
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from category_encoders import OneHotEncoder
# reading the dataset
df = pd.read_csv("train.csv")
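The following is a minimal sketch of the refactoring idea. It assumes a purely numeric, randomly generated stand-in for `train.csv`: the column names mimic Titanic, but the target is a toy rule, not real survival data. Wrapping the imputer and the tree in one `Pipeline` means `fit` learns the imputation medians from the training split only, and `score` applies that same transformation to the test split.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical numeric stand-in for train.csv: 200 rows, 30 missing ages
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.uniform(1, 80, size=n),
    "Fare": rng.uniform(5, 250, size=n),
})
df.loc[rng.choice(n, size=30, replace=False), "Age"] = np.nan
df["Survived"] = (df["Pclass"] == 1).astype(int)  # toy target rule

X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
])
pipe.fit(X_train, y_train)           # medians learned from X_train only
accuracy = pipe.score(X_test, y_test)  # accuracy of the final classifier
print(accuracy)
```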
joaopcnogueira / groupkfold_example.py
Last active Jul 11, 2019
Simple Example using GroupKFold with Cross-Validate
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupKFold
# Loading the data
iris = datasets.load_iris()
design_matrix = np.concatenate((iris['data'], iris['target'].reshape(150,1)), axis=1)
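The following is a minimal sketch of `GroupKFold` combined with `cross_validate` on the same iris data. Iris has no natural groups, so the group labels here are hypothetical: 10 groups of 15 rows, standing in for something like rows from the same patient or session. The loop checks the property that makes `GroupKFold` useful: no group ever appears on both sides of a split.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GroupKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris["data"], iris["target"]

# Hypothetical group labels: row i belongs to group i % 10
groups = np.arange(len(X)) % 10

gkf = GroupKFold(n_splits=5)

# Sanity check: no group is shared between the training and the test split
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

scores = cross_validate(
    DecisionTreeClassifier(random_state=42), X, y,
    groups=groups, cv=gkf, scoring="accuracy",
)
print(scores["test_score"])
```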
joaopcnogueira / custom_groupby_functions.R
Last active Jul 19, 2019
Custom group_by function in R
library(tidyverse)
# toy dataset
df <- tibble(
clientes = c('joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia'),
produtos = c('celular', 'notebook', 'livro', 'bola', 'carro', 'chapéu', 'moto', 'moto', 'caneta')
)
# custom function
get_produtos <- function(produtos){
custom_groupby_functions.py
"""
Defining a custom function to be applied in pandas groupby
"""
import numpy as np
import pandas as pd
clients = ['joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia']
products = ['smartphone', 'notebook', 'book', 'ball', 'car', 'hat', 'bike', 'mouse', 'pen']
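The following is a minimal sketch of applying a custom function to each group in pandas, built from the `clients` and `products` lists above. The gist's own function body is not shown, so `join_products` is a hypothetical example. It concatenates each client's products into one string. `SeriesGroupBy.agg` calls the function once per group, passing that group's `products` values.

```python
import pandas as pd

clients = ['joao', 'joao', 'joao', 'lucas', 'lucas', 'julia', 'julia', 'julia', 'julia']
products = ['smartphone', 'notebook', 'book', 'ball', 'car', 'hat', 'bike', 'mouse', 'pen']
df = pd.DataFrame({"clients": clients, "products": products})

# Hypothetical custom aggregation: join all products of a client into one string
def join_products(series):
    return ", ".join(series)

# One output row per client (groups are sorted by key: joao, julia, lucas)
result = df.groupby("clients")["products"].agg(join_products)
print(result)
```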