Introductory quote:
"Machine learning people use hugely complex algorithms on trivially simple datasets. Biology does trivially simple algorithms on hugely complex datasets."
- Replicability
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, lars_path

np.random.seed(42)


def gen_data(n, m, k):
    # n samples, m features, k non-zero ground-truth coefficients
    X = np.random.randn(n, m)
    w = np.zeros((m, 1))
    i = np.arange(0, m)
    np.random.shuffle(i)
    w[i[:k]] = np.random.randn(k, 1)  # sparse ground-truth weights
    y = np.dot(X, w) + 0.1 * np.random.randn(n, 1)  # completion: add noise
    return X, y, w
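As a sketch of where such sparse data typically goes next, `lars_path` recovers the order in which features enter the lasso model. The generator below is illustrative, not the gist's own:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.RandomState(42)
n, m, k = 50, 20, 3  # samples, features, non-zero coefficients
X = rng.randn(n, m)
w = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
w[support] = rng.randn(k)
y = X.dot(w) + 0.01 * rng.randn(n)

# Compute the LARS/lasso regularization path: one column of `coefs`
# per knot of the path, from the empty model down to alpha -> 0.
alphas, active, coefs = lars_path(X, y, method='lasso')
```

The true support shows up among the active variables along the path, which is the property the plotted path usually illustrates.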
""" | |
Uses C++ map containers for fast dict-like behavior with keys being | |
integers, and values float. | |
""" | |
# Author: Gael Varoquaux | |
# License: BSD | |
# XXX: this needs Cython 17.1 or later. Elsewhere you will get a C++ compilation error. | |
import numpy as np |
import numpy as np
import time

from sklearn import cluster
from sklearn import datasets

lfw = datasets.fetch_lfw_people()
X_lfw = lfw.data[:, :5]

eps = 8.  # This choice of eps gives 44 clusters
# Completion sketch: DBSCAN is assumed here from the eps parameter above.
db = cluster.DBSCAN(eps=eps).fit(X_lfw)
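LFW is a heavy download, so as a self-contained illustration of how `eps` drives DBSCAN cluster counts, here is a sketch on synthetic blobs (the data and the eps value are stand-ins, not the gist's):

```python
import numpy as np
from sklearn import cluster, datasets

# Three well-separated blobs as a stand-in for the LFW features.
X, _ = datasets.make_blobs(n_samples=300,
                           centers=[[0, 0], [6, 6], [12, 0]],
                           cluster_std=0.5, random_state=0)

db = cluster.DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = db.labels_
# -1 marks noise points, so exclude it when counting clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)  # → 3
```

Shrinking `eps` fragments the blobs into more clusters plus noise; growing it merges them, which is the knob the `eps = 8.` comment above is tuning.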
This gist is only meant for discussion.
### Keybase proof

I hereby claim:

* I am GaelVaroquaux on github.
* I am gaelvaroquaux (https://keybase.io/gaelvaroquaux) on keybase.
* I have a public key whose fingerprint is 44B8 B843 6321 47EB 59A9 8992 6C52 6A43 ABE0 36FC

To claim this, I am signing this object:
'''
Non-parametric computation of entropy and mutual information.

Adapted by G Varoquaux from code created by R Brette, itself
drawn from several papers (see references in the code).

This code is maintained at https://github.com/mutualinfo/mutual_info
Please download the latest code there, to get improvements and
bug fixes.
'''
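As a sketch of the kind of estimator such a module implements, here is the classic Kozachenko-Leonenko k-nearest-neighbour entropy estimator; this is a generic textbook formulation (Kraskov et al., 2004), not necessarily the repository's exact code:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def entropy_knn(x, k=3):
    """k-nearest-neighbour (Kozachenko-Leonenko) entropy estimate, in nats."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # Chebyshev distance to the k-th neighbour (query k+1 points:
    # the first hit is the point itself).
    r = tree.query(x, k=k + 1, p=np.inf)[0][:, -1]
    # Max-norm ball of radius r has volume (2r)^d.
    return digamma(n) - digamma(k) + d * np.mean(np.log(2 * r))

rng = np.random.RandomState(0)
est = entropy_knn(rng.randn(5000))
# True entropy of a standard normal is 0.5 * log(2 * pi * e) ≈ 1.419 nats.
print(round(est, 2))
```

The same neighbour-counting idea, applied jointly and marginally, yields the non-parametric mutual-information estimators the docstring refers to.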
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import ShuffleSplit, GridSearchCV
from sklearn.utils import check_random_state
from sklearn import datasets
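These imports suggest a cross-validated comparison of Ridge and Lasso; a minimal sketch under that assumption (the dataset and alpha grid below are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.datasets import make_regression

# Sparse ground truth: only 5 of 30 features are informative.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=1.0, random_state=0)

cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
results = {}
for Model in (Ridge, Lasso):
    # Grid-search the regularization strength alpha over 6 decades.
    search = GridSearchCV(Model(), {'alpha': np.logspace(-3, 2, 20)}, cv=cv)
    search.fit(X, y)
    results[Model.__name__] = search.best_score_  # mean R^2 at best alpha
```

On a sparse problem like this one, Lasso's feature selection tends to give it the edge at its best alpha, which is the kind of contrast such a script would plot.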
"""Persistence strategies comparison script. | |
This script compute the speed, memory used and disk space used when dumping and | |
loading arbitrary data. The data are taken among: | |
- scikit-learn Labeled Faces in the Wild dataset (LFW) | |
- a fully random numpy array with 10000x10000 shape | |
- a dictionary with 1M random keys/values | |
- a list containing 10M random value | |
The compared persistence strategies are: |
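To make the measurement concrete, here is a minimal timing sketch for one strategy, plain pickle on a random array; joblib or the other datasets would slot in the same way (the names and sizes below are illustrative, not the script's):

```python
import os
import pickle
import tempfile
import time

import numpy as np

data = np.random.rand(1000, 1000)  # small stand-in for the 10000x10000 array

path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
t0 = time.time()
with open(path, 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
dump_time = time.time() - t0

t0 = time.time()
with open(path, 'rb') as f:
    loaded = pickle.load(f)
load_time = time.time() - t0

size_mb = os.path.getsize(path) / 1e6  # disk space used
print('dump %.3fs, load %.3fs, %.1f MB' % (dump_time, load_time, size_mb))
```

The benchmark then reduces to running this triple (dump time, load time, file size) for each strategy-dataset pair and tabulating the results.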