Andreas Mueller amueller

@amueller
amueller / commits.py
Created October 26, 2018 19:26
list recent commits by author
import datetime

from github import Github

gh = Github("SECRETKEY")  # placeholder: substitute a personal access token
rep = gh.get_repo("scikit-learn/scikit-learn")
org = gh.get_organization("scikit-learn")
org_members = list(org.get_members())
n_commits = {}
limit = datetime.datetime(2017, 1, 1)
@amueller
amueller / parsing_in_preparation.ipynb
Created September 28, 2018 16:29
parsing in preparation datasets on openml
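import cvxpy as cvx
import numpy as np

n_students = 130
n_projects = 30
# cvx.Int is the pre-1.0 cvxpy API; cvxpy >= 1.0 uses
# cvx.Variable((n_students, n_projects), integer=True) instead.
assignment = cvx.Int(rows=n_students, cols=n_projects)
rng = np.random.RandomState(0)
project_preferences = rng.rand(n_students, n_projects)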
@amueller
amueller / tree_plotting.py
Created February 15, 2018 21:29
Stand-alone matplotlib based tree plotting from https://github.com/scikit-learn/scikit-learn/pull/9251
import numpy as np
from numbers import Integral
from sklearn.externals import six
from sklearn.tree.export import _color_brew, _criterion, _tree
def plot_tree(decision_tree, max_depth=None, feature_names=None,
              class_names=None, label='all', filled=False,
              leaves_parallel=False, impurity=True, node_ids=False,
@amueller
amueller / bench_feat_agg.py
Created October 27, 2017 17:37
bench feature agglomeration
"""
Benchmarks np.bincount method vs np.mean for feature agglomeration in
../sklearn/cluster/_feature_agglomeration. Use of np.bincount provides
a significant speed up if the pooling function is np.mean.
np.bincount performs better especially as the size of X and n_clusters
increase.
"""
import matplotlib.pyplot as plt
import numpy as np
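
The docstring above compares two ways of mean-pooling features per cluster. The following is a minimal sketch of that comparison, not the gist's actual benchmark: the helper names `pool_mean_loop` and `pool_mean_bincount`, the array sizes, and the label layout are all assumptions chosen for illustration. It only checks that the two approaches agree; timing them is left to the reader.

```python
import numpy as np


def pool_mean_loop(X, labels):
    """Mean-pool the columns of X per cluster by calling .mean on each group."""
    unique = np.unique(labels)
    return np.array([X[:, labels == l].mean(axis=1) for l in unique]).T


def pool_mean_bincount(X, labels):
    """Same pooling via np.bincount: weighted counts give per-cluster sums."""
    sizes = np.bincount(labels)  # number of features in each cluster
    sums = np.array([np.bincount(labels, weights=row) for row in X])
    return sums / sizes


rng = np.random.RandomState(0)
X = rng.rand(50, 200)  # 50 samples, 200 features (illustrative sizes)
# Assumed labelling: 10 clusters, each guaranteed to be non-empty.
labels = rng.permutation(np.arange(200) % 10)

pooled_loop = pool_mean_loop(X, labels)
pooled_bincount = pool_mean_bincount(X, labels)
```

Both functions return an array of shape `(n_samples, n_clusters)`. The `np.bincount` version assumes labels are non-negative integers and that every cluster has at least one feature; an empty cluster would lead to a division by zero.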
class Formatter(object):
    def __init__(self, indent_est='step'):
        self.indent_est = indent_est
        self.types = {}
        self.htchar = ' '
        self.lfchar = '\n'
        self.indent = 0
        self.step = 4
        self.width = 79
        self.set_formater(object, self.__class__.format_object)
import numpy as np
import matplotlib.pyplot as plt
class Curve(object):
    def __init__(self, scores, to="B+", std_adjust=0):
        self.to = to
        self.scores = scores
        self.letters = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]
        idx = self.letters.index(to)
        # +3 is because we do D and F manually