Andreas Mueller amueller

@amueller
amueller / shuffle_once.ipynb
Created December 11, 2014 18:40
Shuffle once benchmarks
@amueller
amueller / scipy_interpolation_weirdness.ipnb
Created February 27, 2015 15:42
scipy interpolation weirdness
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
@amueller
amueller / scipy_interpolation_weirdness.ipnb
Created February 27, 2015 15:47
scipy interpolation weirdness
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
@amueller
amueller / sklearn_tutorial_draft.rst
Last active August 29, 2015 14:16
scipy scikit-learn tutorial draft

Tutorial Topic

This tutorial aims to provide an introduction to machine learning and scikit-learn "from the ground up". We will start with the basic concepts of machine learning and implement them using scikit-learn. Working through the characteristics of several methods in detail, we will discuss how to pick an algorithm for your application, how to set its parameters, and how to evaluate its performance.

Please provide a more detailed abstract of your tutorial (again, see last year's tutorials).

Machine learning is the task of extracting knowledge from data, often with the goal of generalizing to new, unseen data. Applications of machine learning now touch nearly every aspect of everyday life, from the face detection in our

@amueller
amueller / elkan_bench.py
Last active August 29, 2015 14:18
benching elkan k-means implementation
from time import time

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits, fetch_mldata, load_iris, fetch_20newsgroups_vectorized


def bench_kmeans(data, n_clusters=5, init='random', n_init=1):
    # Time Lloyd's algorithm on the data passed in.
    start = time()
    km1 = KMeans(algorithm='lloyd', n_clusters=n_clusters, random_state=0, init=init, n_init=n_init).fit(data)
    print("lloyd time: %f inertia: %f" % (time() - start, km1.inertia_))
    # Time Elkan's algorithm with identical settings.
    start = time()
    km2 = KMeans(algorithm='elkan', n_clusters=n_clusters, random_state=0, init=init, n_init=n_init).fit(data)
@amueller
amueller / magic_constructor_estimator.ipynb
Created April 14, 2015 00:33
No more double underscores in sklearn.
@amueller
amueller / knn_imputation_speed.ipynb
Created August 25, 2015 15:52
np.multiply test for knn imputation
@amueller
amueller / kneighbors_weired.py
Created January 23, 2012 21:30
Weird kneighbors behaviour
from sklearn import datasets, manifold
from sklearn.neighbors import NearestNeighbors
import numpy as np
n_points = 1000
n_neighbors = 10
out_dim = 2
n_trials = 100
@amueller
amueller / test_c.py
Created April 1, 2012 14:16
Testing influence of dataset size on C
import numpy as np
from sklearn import datasets
from sklearn.cross_validation import ShuffleSplit
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import Scaler
#data = datasets.load_digits()
data = datasets.fetch_mldata("usps")