Francisco Valentini ftvalentini

@ftvalentini
ftvalentini / outlier_masking.R
Created June 28, 2022 16:47
Example of _masking_ of outliers when using z-score
set.seed(33)
x = runif(100, min=0, max=1)
outlier_values = c(1e2, 1e3, 1e4, 1e5, 1e6)
for (i in outlier_values) {
  x[100] = i
  y = (x - mean(x)) / (sd(x))
  print(y[100])
}
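The masking effect can also be checked against a known bound: with the sample standard deviation, no z-score in a sample of size n can exceed (n - 1) / sqrt(n), which is 9.9 for n = 100. The following is a minimal Python sketch of the same experiment, not a port of the R code above; the random draws differ from R's, and the names `z_of_last` and `z_bound` are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(33)  # arbitrary seed; Python's draws will not match R's runif
n = 100
base = [random.random() for _ in range(n - 1)]  # 99 values in [0, 1)


def z_of_last(values):
    """z-score of the last element, using the sample standard deviation (n - 1)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return (values[-1] - mu) / sd


# Replace the 100th value with ever larger outliers, as in the loop above
z_scores = [z_of_last(base + [outlier]) for outlier in (1e2, 1e3, 1e4, 1e5, 1e6)]

# Upper bound on any z-score in a sample of size n
z_bound = (n - 1) / math.sqrt(n)

for z in z_scores:
    print(z, "<=", z_bound)
```

However extreme the outlier becomes, its z-score stays just below the bound, because the outlier inflates the standard deviation it is measured against.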
@ftvalentini
ftvalentini / embeddings_onehot.py
Created March 7, 2021 22:58
Embeddings in simple NN as matrix multiplication
import numpy as np
n = 1 # number of observations
k = 5 # number of categories
c = 1 # 2nd category
dim = 3 # dimension of embeddings
E = np.random.rand(k, dim) # embedding matrix
x_onehot = np.full((k, n), 0)
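The preview above is cut off, so the following is only a sketch of the idea named in the title, not the rest of the original gist: multiplying a one-hot vector by the embedding matrix selects the same row as a direct index lookup. The seed, the use of `np.random.default_rng`, and the names `emb_matmul` and `emb_lookup` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
k, dim = 5, 3                   # number of categories, embedding dimension
c = 1                           # index of the active category (the 2nd one)
E = rng.random((k, dim))        # embedding matrix, one row per category

# One-hot column vector for a single observation of category c
x_onehot = np.zeros((k, 1))
x_onehot[c, 0] = 1.0

emb_matmul = (x_onehot.T @ E).ravel()  # embedding via matrix multiplication
emb_lookup = E[c]                      # embedding via direct row lookup

print(np.allclose(emb_matmul, emb_lookup))
```

In practice an embedding layer is usually implemented as the lookup, since it avoids multiplying by a vector that is almost entirely zeros.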
@ftvalentini
ftvalentini / sigmoid_vs_softmax_nn.py
Created January 17, 2021 23:53
Sigmoid vs. Softmax activation
import numpy as np
import torch
import torch.nn.functional as F
torch.manual_seed(99)
# 5 features, 3 classes, 1 example
X = torch.randn(5)
W1 = torch.randn(3,5)
W2 = W1.detach().clone() # one weight matrix per loss function
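The preview stops before the activations are applied. As a hedged illustration of the difference the title refers to, the sketch below uses plain Python rather than torch, and the `logits` values are hypothetical: sigmoid scores each class independently, while softmax normalizes the scores so they sum to 1.

```python
import math

logits = [2.0, 1.0, -1.0]  # hypothetical scores for 3 classes


def sigmoid(z):
    """Element-wise logistic function: each class is scored independently."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Normalized exponentials: the outputs compete and sum to 1."""
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


sigmoid_probs = [sigmoid(z) for z in logits]
softmax_probs = softmax(logits)

print(sigmoid_probs, sum(sigmoid_probs))  # the sum is not constrained to 1
print(softmax_probs, sum(softmax_probs))  # the sum is 1 up to rounding
```

Sigmoid outputs suit multi-label problems where several classes can be active at once; softmax suits problems with exactly one correct class.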
@ftvalentini
ftvalentini / scale_robust.R
Created September 27, 2020 23:18
Normalizing variables with robust and non-robust methods
z_scale = function(x) (x - mean(x)) / (sd(x))
rob_scale = function(x) (x - median(x)) / (IQR(x))
set.seed(12)
x = runif(10)
y = x
y[10] = 10 # data with an outlier
# without outlier
x_zscale = z_scale(x)
@ftvalentini
ftvalentini / cumsum_array_axes.py
Created September 22, 2020 19:45
Cumulative sum of array along 2 axes
import numpy as np
A = np.arange(12).reshape(4,3)
# option 1
res1 = np.apply_over_axes(np.cumsum, A, axes=[0,1])
# option 2
res2 = A.cumsum(0).cumsum(1)
# option 3
res3 = np.cumsum( np.cumsum(A, axis=0), axis=1)
@ftvalentini
ftvalentini / topn_array.py
Last active September 13, 2020 23:53
Top n values and indices of np.array
import numpy as np
A = np.array([1,2,35,8])
n = 2
# idx of top n values (NOT SORTED)
idx_top = np.argpartition(A, -n)[-n:]
# sort idx from largest value to lowest
idx_top = idx_top[np.argsort(-A[idx_top])]
values_top = A[idx_top]
@ftvalentini
ftvalentini / lasso_vs_ridge.R
Created August 26, 2020 15:15
Coefficients of highly correlated covariates -- Lasso vs Ridge
# simulate highly correlated x1-x2
set.seed(8)
X = MASS::mvrnorm(
  n = 100
  , mu = c(0, 0)
  , Sigma = matrix(c(1, 0.99, 0.99, 1), nrow=2, byrow=TRUE)
  , empirical = FALSE
)
# simulate y
df = data.frame(
@ftvalentini
ftvalentini / rank_array.py
Last active August 25, 2020 04:03
Find "rank" of array elements
import numpy as np
import scipy.stats as stats
arr = np.array([0,2,3,0,236,8])
arr_ranks1 = np.searchsorted(np.sort(arr), arr)
arr_ranks2 = stats.rankdata(arr, "average")
arr_ranks3 = stats.rankdata(arr, "average") / len(arr)
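The two approaches above disagree on ties and on the starting index: `np.searchsorted` with the default left side gives 0-based ranks where tied values share the lowest position, while `rankdata` with `"average"` gives 1-based ranks averaged over ties. The sketch below, which uses NumPy only and introduces the names `left`, `right` and `avg_rank` for illustration, shows how the average rank can be rebuilt from the two sides of `searchsorted`.

```python
import numpy as np

arr = np.array([0, 2, 3, 0, 236, 8])
sorted_arr = np.sort(arr)

# Number of elements strictly smaller than each value (0-based rank, ties share the minimum)
left = np.searchsorted(sorted_arr, arr, side="left")
# Number of elements smaller than or equal to each value
right = np.searchsorted(sorted_arr, arr, side="right")

# Tied values occupy 1-based positions left + 1 .. right; their mean is the average rank
avg_rank = (left + right + 1) / 2

print(left)
print(avg_rank)
```

Dividing `avg_rank` by `len(arr)` gives a normalized rank, as in the third variant above.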
# a function that contains yield returns a generator object when called
def cuadrado():
    for i in range(10):
        yield i**2
for num in cuadrado():
    print(num)