Max Halford MaxHalford

@MaxHalford
MaxHalford / random_permutation.go
Last active October 26, 2016 16:36
Sampling k numbers in range [a, b) without replacement - Two different ways
package main
import "fmt"
// Sample k unique integers in range [min, max) by generating a
// random permutation of the numbers from min to max and then
// taking the first k numbers.
// The downside to this algorithm is that a temporary list has to
// be kept in memory to store all the numbers from min to max.
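
The preview above is cut off before any code, so here is a minimal Python sketch of the idea rather than the gist's own Go implementation. `sample_by_shuffle` follows the comment: shuffle every integer in the range and keep the first k. `sample_by_rejection` is only a guess at what a second way could look like (the title mentions two); it avoids the temporary list by redrawing duplicates. Both function names and the `rng` parameter are hypothetical.

```python
import random


def sample_by_shuffle(k, lo, hi, rng=random):
    """Shuffle every integer in [lo, hi) and keep the first k."""
    if not 0 <= k <= hi - lo:
        raise ValueError("k must be between 0 and hi - lo")
    pool = list(range(lo, hi))  # temporary list of size hi - lo
    rng.shuffle(pool)
    return pool[:k]


def sample_by_rejection(k, lo, hi, rng=random):
    """Draw integers one at a time, skipping any that were already drawn."""
    if not 0 <= k <= hi - lo:
        raise ValueError("k must be between 0 and hi - lo")
    seen = set()
    result = []
    while len(result) < k:
        candidate = rng.randrange(lo, hi)  # hi is excluded
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```

The shuffle version costs memory proportional to the size of the range, whereas the rejection version only stores k values but slows down as k approaches the size of the range.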
'''
The principle of this algorithm is to traverse the users by "chaining"
their attributes. Each user is defined by a tuple of values; for
example, a user can be defined by a tuple (email, phone).
The algorithm traverses the users recursively. We start with the
first user. We then look at its first attribute (the email, for
example; it does not matter which). We then look up all the users
who have the same attribute and do exactly the same for each of those
users. Each time we examine a user, we can remove it from the
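
The docstring above is cut off, so the following is only a sketch of the chaining idea as described: users that share an attribute value end up in the same group. It assumes users are tuples such as (email, phone), that `None` marks a missing value, and it uses an explicit stack instead of recursion to avoid hitting Python's recursion limit. The function name `group_users` is hypothetical.

```python
from collections import defaultdict


def group_users(users):
    """Group user indices that are linked by at least one shared attribute value."""
    # Index: (attribute position, value) -> indices of users holding that value.
    index = defaultdict(list)
    for i, user in enumerate(users):
        for pos, value in enumerate(user):
            if value is not None:
                index[(pos, value)].append(i)

    unvisited = set(range(len(users)))
    groups = []
    for start in range(len(users)):
        if start not in unvisited:
            continue
        # A user is removed from the pool as soon as it is reached.
        unvisited.discard(start)
        stack, group = [start], []
        while stack:
            i = stack.pop()
            group.append(i)
            for pos, value in enumerate(users[i]):
                if value is None:
                    continue
                for j in index[(pos, value)]:
                    if j in unvisited:
                        unvisited.discard(j)
                        stack.append(j)
        groups.append(sorted(group))
    return groups


users = [
    ("a@x.com", "111"),
    ("b@x.com", "111"),  # shares a phone with user 0
    ("b@x.com", "222"),  # shares an email with user 1
    ("c@x.com", "333"),
]
print(group_users(users))  # [[0, 1, 2], [3]]
```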
import random
STATES = (0, 1, 2, 3, 4, 5)
ACTION_SET = {
    0: (0, 4),
    1: (1, 3, 5),
    2: (2, 3),
    3: (1, 2, 3, 4),
@MaxHalford
MaxHalford / minibatch.py
Created November 14, 2016 16:21
Reading files in minibatches
import itertools


def get_minibatch(stream, size):
    """A minibatch is a stream slice."""
    return list(itertools.islice(stream, size))


def iter_minibatches(stream, minibatch_size):
    """Generator of minibatches."""
    minibatch = get_minibatch(stream, minibatch_size)
    while len(minibatch):
        yield minibatch
        # Fetch the next slice; without this the loop never ends.
        minibatch = get_minibatch(stream, minibatch_size)
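
As a usage illustration for the gist above, here is a self-contained sketch of reading a file in minibatches. It is a compact variant, not the gist's exact code: it calls `iter()` on the stream first, because `itertools.islice` on a plain list would restart from the beginning on every call. The `io.StringIO` object stands in for a real file opened with `open(path)`.

```python
import io
import itertools


def iter_minibatches(stream, minibatch_size):
    """Yield successive lists of at most minibatch_size items from stream."""
    iterator = iter(stream)  # makes lists and other re-iterables safe to slice
    while True:
        minibatch = list(itertools.islice(iterator, minibatch_size))
        if not minibatch:
            return
        yield minibatch


# A file object iterates over its lines, so it can be passed in directly.
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
batches = [
    [line.strip() for line in batch]
    for batch in iter_minibatches(fake_file, 2)
]
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```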
@MaxHalford
MaxHalford / top_terms.py
Last active November 21, 2016 20:19
Top terms classifier with the sklearn API
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y
class TopTermsClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, n_terms=10):
        self.n_terms = n_terms
@MaxHalford
MaxHalford / gbudget.py
Created December 1, 2016 22:18
Group budget balancing
import pandas as pd
# Choose the file to balance
csvFile = 'example'
# Clear the log
open('log', 'w').close()
# Open the csv file
df = pd.read_csv('data/{0}.csv'.format(csvFile))
# Obtain sum of amount paid
@MaxHalford
MaxHalford / add_user.sh
Last active July 26, 2017 15:23
Add user to a server
sudo -i
groupadd mhalford
useradd -g mhalford -G sudo -s /bin/bash -m -d /home/mhalford mhalford
cd /home/mhalford
mkdir .ssh
echo COPY_PASTE_PUBLIC_KEY_HERE >> .ssh/authorized_keys
cd ..
chown -R mhalford:mhalford mhalford
@MaxHalford
MaxHalford / ratio_combos.py
Created October 31, 2017 17:24
Ratio combinations
import numpy as np
combos = []
def generate(remainder, n, current_combo=[], i=0, step=0.1):
    if n == 1:
        combos.append(np.round(current_combo + [remainder], 1))
        return
    for p in np.arange(step, remainder - (n-2) * step - step / 10, step):
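
The loop body is cut off in the preview above. The sketch below is not the gist's code; it assumes the goal is to list every way of splitting 1 into n positive ratios that are multiples of `step`. It counts in whole steps to sidestep floating-point drift, and the names `ratio_combos` and `split` are hypothetical.

```python
def ratio_combos(n, step=0.1):
    """Return every tuple of n positive multiples of step that sums to 1."""
    units = round(1 / step)  # number of steps that make up the whole

    def split(remaining, parts):
        # Yield each way to write `remaining` as an ordered sum of
        # `parts` positive integers.
        if parts == 1:
            yield (remaining,)
            return
        # Leave at least one unit for each of the other parts.
        for first in range(1, remaining - parts + 2):
            for rest in split(remaining - first, parts - 1):
                yield (first,) + rest

    return [
        tuple(round(u * step, 10) for u in combo)
        for combo in split(units, n)
    ]


print(len(ratio_combos(2)))  # 9
print(len(ratio_combos(3)))  # 36
```

With a step of 0.1 and two ratios, the results run from (0.1, 0.9) to (0.9, 0.1).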
@MaxHalford
MaxHalford / haar_wavelet.py
Created October 31, 2017 18:35
Haar wavelet
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
# Generate some data
a = 1.99
@MaxHalford
MaxHalford / setup.sh
Created April 7, 2018 12:33
AWS Kaggle setup
curl -O https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh # Say "yes" and "yes"
anaconda3/bin/pip install kaggle
anaconda3/bin/pip install lightgbm
mkdir -p ~/.kaggle
# Single quotes keep the inner double quotes, so the file is valid JSON
echo '{"username":"maxhalford", "key":"secret"}' > ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
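
Shell quoting is easy to get wrong when writing JSON by hand, so the same credentials step can be sketched in Python, where `json.dump` handles the quoting. The function name `write_kaggle_credentials` and its `directory` parameter are hypothetical; the file name and the 600 permissions come from the commands above.

```python
import json
import os
import stat


def write_kaggle_credentials(username, key, directory):
    """Write kaggle.json into directory, readable and writable by the owner only."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # same as chmod 600
    return path
```

To mirror the shell commands, it would be called as `write_kaggle_credentials("maxhalford", "secret", os.path.expanduser("~/.kaggle"))`.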