Max Halford (MaxHalford)

'''
The idea behind this algorithm is to walk through the users by "chaining"
their attributes. Each user is defined by a tuple of values; for
example, a user can be defined by an (email, phone) tuple.
The algorithm walks through the users recursively. We start with the
first user. We then look at their first attribute (the email, for
example; which one does not matter). We then look up every user
who has the same attribute and do exactly the same thing for each of those
users. Each time we study a user, we can remove them from the
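A minimal Python sketch of this chaining follows. It assumes users are tuples compared position by position and that `None` marks a missing attribute; both assumptions, like the name `chain_users`, are illustrative and not taken from the gist. It indexes each (position, value) pair to the users holding it, then follows the links with an explicit stack instead of recursion to stay clear of Python's recursion limit. Each user leaves the `unvisited` set as soon as it is reached, which is the "remove it once studied" step. The groups it returns are the connected components of the "shares an attribute" graph.

```python
from collections import defaultdict


def chain_users(users):
    """Group users linked, directly or transitively, by a shared attribute.

    `users` is a list of tuples such as (email, phone). Two values only match
    when they sit at the same position. `None` is treated as missing
    (an assumption of this sketch). Returns sorted lists of user indices.
    """
    # (attribute position, value) -> indices of the users having that value
    index = defaultdict(list)
    for i, user in enumerate(users):
        for pos, value in enumerate(user):
            if value is not None:
                index[(pos, value)].append(i)

    unvisited = set(range(len(users)))
    groups = []
    while unvisited:
        start = min(unvisited)  # start from the first remaining user
        unvisited.remove(start)
        stack, group = [start], []
        # Explicit stack instead of recursion, to avoid the recursion limit
        while stack:
            i = stack.pop()
            group.append(i)
            for pos, value in enumerate(users[i]):
                for j in index.get((pos, value), ()):
                    if j in unvisited:  # a user is removed once reached
                        unvisited.remove(j)
                        stack.append(j)
        groups.append(sorted(group))
    return groups


users = [
    ("a@x.com", "111"),
    ("b@x.com", "111"),  # shares a phone number with user 0
    ("b@x.com", "222"),  # shares an email with user 1
    ("c@x.com", "333"),  # linked to nobody
]
print(chain_users(users))  # [[0, 1, 2], [3]]
```

User 2 ends up with user 0 even though they share nothing directly; the chain goes through user 1.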
import random
STATES = (0, 1, 2, 3, 4, 5)
ACTION_SET = {
0: (0, 4),
1: (1, 3, 5),
2: (2, 3),
3: (1, 2, 3, 4),
@MaxHalford
MaxHalford / random_permutation.go
Last active October 26, 2016 16:36
Sampling k numbers in range [a, b) without replacement: two different ways
package main
import "fmt"
// Sample k unique integers in range [min, max) by generating a
// random permutation of the integers between min and max and then
// taking the first k numbers.
// The downside to this algorithm is that a temporary list has to
// be kept in memory to store all the numbers from min to max.
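The Go preview stops before the code. The following is a minimal Python sketch of the approach the comment describes: materialize the range, shuffle it, and keep the first k. It sits next to a set-based rejection variant that only keeps k numbers in memory. That second variant is an assumption, because the gist's other way is not visible here. The function names are hypothetical.

```python
import random


def sample_with_permutation(k, lo, hi, rng=random):
    """Sample k unique integers in [lo, hi) by shuffling the whole range.

    Uses O(hi - lo) memory because every candidate is stored.
    """
    if k > hi - lo:
        raise ValueError("k is larger than the range")
    numbers = list(range(lo, hi))
    rng.shuffle(numbers)
    return numbers[:k]


def sample_with_set(k, lo, hi, rng=random):
    """Sample k unique integers in [lo, hi) by drawing until k are distinct.

    Uses O(k) memory, but slows down when k gets close to hi - lo.
    """
    if k > hi - lo:
        raise ValueError("k is larger than the range")
    seen = set()
    result = []
    while len(result) < k:
        n = rng.randrange(lo, hi)
        if n not in seen:  # reject numbers that were already drawn
            seen.add(n)
            result.append(n)
    return result


rng = random.Random(42)  # seeded for reproducibility
print(sample_with_permutation(3, 10, 20, rng))
print(sample_with_set(3, 10, 20, rng))
```

In everyday Python, `random.sample(range(lo, hi), k)` from the standard library covers the same need.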
MaxHalford / lemmatization.py
Created November 14, 2016 16:17
Lemmatization
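import string

import nltk
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer

# Requires the NLTK tokenizer models ('punkt') and 'wordnet' data (see nltk.download)
lemmatizer = WordNetLemmatizer()  # build once, reuse for every document

def tokenize(text):
    """Remove punctuation, split the text into words and lemmatize each word."""
    text = ''.join([ch for ch in text if ch not in string.punctuation])
    tokens = nltk.word_tokenize(text)
    return [lemmatizer.lemmatize(token) for token in tokens]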
MaxHalford / minibatch.py
Created November 14, 2016 16:21
Reading files in minibatches
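import itertools

def get_minibatch(stream, size):
    """A minibatch is a stream slice."""
    return list(itertools.islice(stream, size))

def iter_minibatches(stream, minibatch_size):
    """Generator of minibatches; `stream` must be an iterator such as an open file."""
    minibatch = get_minibatch(stream, minibatch_size)
    while len(minibatch):
        yield minibatch
        # Fetch the next slice; an empty list ends the loop
        minibatch = get_minibatch(stream, minibatch_size)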
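For comparison, here is a compact variant of the same generator. It is built on the two-argument form of the built-in `iter(callable, sentinel)`, which keeps calling a function until it returns the sentinel (here an empty list). Calling `iter(stream)` first means it also accepts plain lists, which `islice` would otherwise re-read from the start on every call. This is an alternative sketch, not the gist's code, demonstrated on an in-memory `io.StringIO` file.

```python
import io
import itertools


def iter_minibatches(stream, minibatch_size):
    """Yield successive lists of at most `minibatch_size` items from `stream`."""
    iterator = iter(stream)  # works for files, generators and lists alike
    # iter(callable, sentinel) calls the lambda until it returns []
    return iter(lambda: list(itertools.islice(iterator, minibatch_size)), [])


# Read a small "file" three lines at a time
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
batches = [[line.strip() for line in batch] for batch in iter_minibatches(fake_file, 3)]
print(batches)  # [['a', 'b', 'c'], ['d', 'e']]
```

Only one minibatch is held in memory at a time, so the file is never loaded whole.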
MaxHalford / top_terms.py
Last active November 21, 2016 20:19
Top terms classifier with the sklearn API
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.multiclass import unique_labels
from sklearn.utils.validation import check_X_y
class TopTermsClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, n_terms=10):
        self.n_terms = n_terms
MaxHalford / gbudget.py
Created December 1, 2016 22:18
Group budget balancing
import pandas as pd
# Choose the file to balance
csvFile = 'example'
# Clear the log
open('log', 'w').close()
# Open the csv file
df = pd.read_csv('data/{0}.csv'.format(csvFile))
# Obtain sum of amount paid
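The preview stops before the balancing logic. Below is a hedged, standard-library-only sketch of what balancing a group budget usually involves. It assumes everyone owes an equal share and that amounts are integer cents. The gist's CSV columns are not shown, so `settle` and its input format are hypothetical. A greedy pass pairs debtors with creditors until every balance reaches zero. This is not necessarily the method the gist uses.

```python
from collections import defaultdict


def settle(payments):
    """Return (debtor, creditor, amount) transfers that even out a group budget.

    `payments` is a list of (person, amount_paid) pairs in integer cents.
    Assumption: every person owes an equal share of the total.
    """
    paid = defaultdict(int)
    for person, amount in payments:
        paid[person] += amount
    if not paid:
        return []

    people = sorted(paid)
    base, remainder = divmod(sum(paid.values()), len(people))
    balances = {}
    for i, person in enumerate(people):
        # Leftover cents go to the first `remainder` people
        share = base + (1 if i < remainder else 0)
        balances[person] = paid[person] - share  # > 0: is owed, < 0: owes

    # Most indebted first, largest creditor first
    debtors = sorted([b, p] for p, b in balances.items() if b < 0)
    creditors = sorted(([b, p] for p, b in balances.items() if b > 0), reverse=True)

    transfers = []
    i = j = 0
    # Balances sum to zero, so both lists run out together
    while i < len(debtors) and j < len(creditors):
        amount = min(-debtors[i][0], creditors[j][0])
        transfers.append((debtors[i][1], creditors[j][1], amount))
        debtors[i][0] += amount
        creditors[j][0] -= amount
        if debtors[i][0] == 0:
            i += 1
        if creditors[j][0] == 0:
            j += 1
    return transfers


print(settle([("alice", 3000), ("bob", 0), ("carol", 0)]))
# [('bob', 'alice', 1000), ('carol', 'alice', 1000)]
```

Working in integer cents keeps every balance exact, so the loop never stalls on a rounding residue. The greedy pairing needs at most one transfer fewer than the number of people, but it does not always find the smallest possible set of transfers.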
MaxHalford / fit.py
Created May 18, 2017 15:35
Keras fit/predict scikit-learn pipeline
import os
from keras import backend as K
from keras import callbacks
from keras import layers
from keras import models
from keras.wrappers.scikit_learn import KerasClassifier
import pandas as pd
import tensorflow as tf
from sklearn import metrics
MaxHalford / save_dataframe_to_dropbox.py
Created July 26, 2017 10:18
Save a pandas.DataFrame to Dropbox
import dropbox
def to_dropbox(dataframe, path, token):
    dbx = dropbox.Dropbox(token)
    df_string = dataframe.to_csv(index=False)
    db_bytes = bytes(df_string, 'utf8')
    dbx.files_upload(
MaxHalford / add_user.sh
Last active July 26, 2017 15:23
Add user to a server
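sudo -i  # open a root shell; run the remaining commands inside it
groupadd mhalford
# Create the user with a home directory and bash, and add it to the sudo group
useradd -g mhalford -G sudo -s /bin/bash -m -d /home/mhalford mhalford
cd /home/mhalford
mkdir .ssh
# Authorize the user's public SSH key
echo COPY_PASTE_PUBLIC_KEY_HERE >> .ssh/authorized_keys
# Keep the key files private; sshd's StrictModes check refuses group- or world-writable ones
chmod 700 .ssh
chmod 600 .ssh/authorized_keys
cd ..
# Give the new user ownership of its home directory
chown -R mhalford:mhalford mhalford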