Olivier Grisel ogrisel

View conda_forge_compilers_macos_buildlog.txt
(conda-forge-compilers) 0 [~/code/scikit-learn (master)]$ pip install -e . -v
Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-ephem-wheel-cache-cn0u3xn5
Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31
Created requirements tracker '/private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31'
Created temporary directory: /private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-install-q8mggn78
Obtaining file:///Users/ogrisel/code/scikit-learn
Added file:///Users/ogrisel/code/scikit-learn to build tracker '/private/var/folders/69/7jxl92h50w10b4v998qt4tj00000gn/T/pip-req-tracker-7xtixh31'
Running setup.py (path:/Users/ogrisel/code/scikit-learn/setup.py) egg_info for package from file:///Users/ogrisel/code/scikit-learn
Running command python setup.py egg_info
running egg_info
View halving_adult_census.py
from time import time
from pprint import pprint
import numpy as np
import pandas as pd
from scipy.stats import expon, randint, uniform
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
@ogrisel
ogrisel / debug_hist_gbdt_missing_values.ipynb
Last active Jul 18, 2019
debug missing values for hist GBDT
View debug_hist_gbdt_missing_values.ipynb
@ogrisel
ogrisel / ms-python-server.log
Created Jan 9, 2019
Microsoft Python Language Server version 0.1.75.0 on scikit-learn
View ms-python-server.log
Starting Microsoft Python language server.
##########Linting Output - flake8##########
Microsoft Python Language Server version 0.1.75.0
Initializing for /opt/venvs/py37/bin/python
Loading files from /home/ogrisel/code/scikit-learn
Parsing document file:///home/ogrisel/code/scikit-learn/setup.py
Parse complete for file:///home/ogrisel/code/scikit-learn/setup.py at version -1
Analysis queued for file:///home/ogrisel/code/scikit-learn/setup.py
Parsing document file:///home/ogrisel/code/scikit-learn/conftest.py
Parse complete for file:///home/ogrisel/code/scikit-learn/conftest.py at version -1
@ogrisel
ogrisel / non_degenerate_mlp_gram.py
Last active Nov 23, 2018
Spectrum of the extended feature Gram matrix of a single hidden layer ReLU MLP
View non_degenerate_mlp_gram.py
"""Empirical evaluation of the extended feature Gram matrix of a ReLU MLP
Here we try to estimate the spectrum of the H^\infty matrix as defined in:
Gradient Descent Provably Optimizes Over-parameterized Neural Networks (2018)
Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh
https://arxiv.org/abs/1810.02054
Theorem 4.1 relies on the assumption that H^\infty has a strictly positive
minimum eigenvalue. The following computes an estimate of this eigenvalue
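The preview above is cut off before any code appears. As a rough illustration of the quantity the docstring describes, here is a minimal sketch that assumes unit-norm inputs and uses the closed form of H^\infty given in the cited paper, H_ij = x_i.x_j (pi - arccos(x_i.x_j)) / (2 pi). The function name `h_infinity` and the random data are hypothetical and not taken from the gist.

```python
import numpy as np


def h_infinity(X):
    """Closed-form H^infty Gram matrix for rows of X (hypothetical helper)."""
    # Assumption: inputs are rescaled to unit norm, as in the paper's setting.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Clip to guard arccos against rounding slightly outside [-1, 1].
    G = np.clip(X @ X.T, -1.0, 1.0)
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))  # illustrative data: 50 samples, 10 features
H = h_infinity(X)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigvals = np.linalg.eigvalsh(H)
min_eig = eigvals[0]
print(f"estimated minimum eigenvalue: {min_eig:.3e}")
```

The gist itself may estimate the matrix differently, for instance by sampling random hidden units; this sketch only shows the closed-form route.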
View kmeans_benchmark.py
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.externals import joblib
m = joblib.Memory(cachedir='/tmp/joblib')
make_blobs = m.cache(make_blobs)
data, labels = make_blobs(n_samples=10**5, n_features=50, cluster_std=100,
                          centers=10, random_state=777)
@ogrisel
ogrisel / numpy_pickle_protocol_5.py
Last active Oct 13, 2019
Draft use of pickle protocol 5 (PEP 574) for zero-copy numpy array pickling
View numpy_pickle_protocol_5.py
from pickle import Pickler, load
try:
    from pickle import PickleBuffer
except ImportError:
    PickleBuffer = None
import copyreg
import os
import numpy as np
import time
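The preview stops at the imports. To illustrate the idea named in the description, the following is a minimal sketch of out-of-band pickling with protocol 5, assuming Python 3.8+ and a NumPy version that supports protocol 5. The helper names `dumps_out_of_band` and `loads_out_of_band` are hypothetical and are not the gist's own code, which subclasses `Pickler` and may work differently.

```python
import pickle

import numpy as np


def dumps_out_of_band(obj):
    """Pickle obj, collecting large buffers separately (hypothetical helper)."""
    buffers = []
    # With buffer_callback set, protocol 5 hands PickleBuffer objects to the
    # callback instead of copying their bytes into the pickle stream.
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


def loads_out_of_band(payload, buffers):
    """Rebuild the object from the pickle stream and its out-of-band buffers."""
    return pickle.loads(payload, buffers=buffers)


array = np.arange(1_000_000, dtype=np.float64)
payload, buffers = dumps_out_of_band(array)
restored = loads_out_of_band(payload, buffers)

print(f"pickle stream: {len(payload)} bytes, array data: {array.nbytes} bytes")
```

The pickle stream then carries only metadata, and the array data can be sent or stored through whatever channel suits the buffers.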
@ogrisel
ogrisel / large_pickle_dump.py
Last active Apr 20, 2018
Memory profiling for Python pickling of large buffers
View large_pickle_dump.py
from pickle import Pickler, _Pickler, Unpickler, _Unpickler, HIGHEST_PROTOCOL
import os
import time
import sys
import gc
from multiprocessing import get_context
PROTOCOL = HIGHEST_PROTOCOL
ctx = get_context('spawn')
View worker_log.txt
distributed.worker - WARNING - Worker at 72 percent memory usage. Trigger GC. Process memory: 723.75 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 66 percent memory usage. After GC. Process memory: 660.93 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. Trigger GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 73 percent memory usage. After GC. Process memory: 732.79 MB -- Worker memory limit: 1000.00 MB
distributed.core - WARNING - Event loop was unresponsive for 1.01s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.worker - WARNING - Worker at 70 percent memory usage. Trigger GC. Process memory: 705.26 MB -- Worker memory limit: 1000.00 MB
distributed.worker - WARNING - Worker at 67 percent memory usage. After GC. Process memory: 670.00 MB -- Worker memory limit: 1000.0
@ogrisel
ogrisel / mean_target_encoding.py
Last active Jul 7, 2018
Mean target value encoding for categorical variables using dask (take 2)
View mean_target_encoding.py
import os
import os.path as op
from time import time
import dask.dataframe as ddf
import dask.array as da
from distributed import Client
def make_categorical_data(n_samples=int(1e7), n_features=10, n_partitions=100):
    """Generate some random categorical data
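The preview ends before the encoding step itself. As a small illustration of mean target encoding, here is a sketch that swaps in pandas for dask to keep it self-contained. The function name `mean_target_encode`, the column names, and the toy data are hypothetical and not taken from the gist.

```python
import pandas as pd


def mean_target_encode(df, column, target):
    """Replace each category by the mean target of its rows (hypothetical helper)."""
    # Per-category mean of the target variable.
    means = df.groupby(column)[target].mean()
    # Map every row's category to that mean.
    return df[column].map(means)


df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue", "green"],
    "y": [1.0, 0.0, 0.0, 0.0, 1.0],
})
df["color_encoded"] = mean_target_encode(df, "color", "y")
print(df)
```

In practice the means would normally be computed on training data only, so that the encoding does not leak target values into validation rows.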