Joel Nothman jnothman

  • Canva
  • Sydney
@jnothman
jnothman / averaging.py
Created July 22, 2014 02:28
Illustration of P/R/F1 averaging methods
from __future__ import print_function
import numpy as np
from sklearn.metrics import precision_recall_fscore_support as prfs, confusion_matrix
from sklearn.preprocessing import label_binarize
true = [0, 0, 0, 1, 1, 2]
preds = [('under-generate 1', [0, 0, 0, 0, 1, 2]),
         ('under-generate 2', [0, 0, 0, 1, 1, 0]),
         ('over-generate 1', [0, 1, 1, 1, 1, 2]),
         ('confuse 1 and 2', [0, 0, 0, 1, 2, 1])]
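For reference, the difference between micro and macro averaging can be worked through without scikit-learn. The following is a minimal Python 3 sketch, not the gist's own code: the helper names `prf` and `prf_averages` are hypothetical, and the inputs are the `true` labels and the 'under-generate 1' predictions listed above.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 where undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def prf_averages(y_true, y_pred):
    """Return (micro, macro) triples of precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts.append((tp, fp, fn))
    # Macro: score each label separately, then take the unweighted mean.
    per_label = [prf(*c) for c in counts]
    macro = tuple(sum(col) / len(labels) for col in zip(*per_label))
    # Micro: pool the counts over all labels, then score once.
    micro = prf(*(sum(col) for col in zip(*counts)))
    return micro, macro


true = [0, 0, 0, 1, 1, 2]
under_generate_1 = [0, 0, 0, 0, 1, 2]
micro, macro = prf_averages(true, under_generate_1)
print('micro P/R/F1:', micro)
print('macro P/R/F1:', macro)
```

In single-label multiclass data like this, every error is one false positive and one false negative, so micro-averaged precision, recall and F1 all equal accuracy (5/6 here), while the macro average is pulled down by the weaker recall on label 1.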
jnothman / list-json-paths.py
Created September 5, 2014 05:16
Extract and list json paths
#!/usr/bin/env python
"""
Faced with a collection of JSON blobs, this script lists what
paths (i.e. sequences of nested keys) exist in the data from
root to leaf.
For example:
$ echo '[{"a": {"a1": 124}, "b": 111}, {"a": {"a2": 111}, "c": null}]' \
| list-json-paths.py
will output:
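The preview ends before the script's output, so the original format is not shown here. As a rough sketch of the idea only (not the gist's implementation), root-to-leaf key paths can be collected with a small recursive generator. The name `iter_paths`, the `/` separator, and the choice to let array elements share their parent's path are all assumptions made for illustration.

```python
import json


def iter_paths(node, prefix=()):
    """Yield each root-to-leaf key path found in a decoded JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            for path in iter_paths(value, prefix + (key,)):
                yield path
    elif isinstance(node, list):
        # Assumption: array elements do not add a component to the path.
        for item in node:
            for path in iter_paths(item, prefix):
                yield path
    else:
        # A scalar (number, string, boolean or null) is a leaf.
        yield prefix


blobs = json.loads(
    '[{"a": {"a1": 124}, "b": 111}, {"a": {"a2": 111}, "c": null}]'
)
paths = sorted(set(iter_paths(blobs)))
for path in paths:
    print('/'.join(path))
```

Collecting the paths into a set before sorting means each distinct path is reported once, however many blobs contain it.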
jnothman / sklearn_param_trans.py
Created November 17, 2014 09:36
Allow nested scikit-learn params to be renamed, or multiple parameters tied to hold the same value
from abc import ABCMeta, abstractmethod
from .base import BaseEstimator
from .externals.six import iteritems, with_metaclass
class BaseParameterTranslator(with_metaclass(ABCMeta, BaseEstimator)):
    @property
    def fit(self):
jnothman / resamplers.py
Created November 27, 2014 13:31
examples of resamplers for scikit-learn
from __future__ import print_function, division
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.cluster import MiniBatchKMeans, SpectralClustering
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.random import sample_without_replacement
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
jnothman / count
Created February 5, 2015 07:07
count or sum unix command
#!/usr/bin/env python
"""Count or sum, while uniquing rows, without full sort of data
By using --key-fields, can also show example row that has some particular fields.
(This was much simpler when it just counted!)
"""
import sys
import argparse
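Counting unique rows without a full sort is typically done with a hash table keyed on the row. The sketch below shows that general idea under stated assumptions; it is not the script's implementation, and the function name `count_rows` and the sample rows are illustrative.

```python
from collections import Counter


def count_rows(lines):
    """Count identical rows in a single pass over the input."""
    counts = Counter()
    for line in lines:
        counts[line.rstrip('\n')] += 1
    return counts


rows = ['b\n', 'a\n', 'b\n', 'c\n', 'b\n']
counts = count_rows(rows)
# most_common() orders by descending count; ties keep first-seen order.
for row, n in counts.most_common():
    print(n, row)
```

Compared with `sort | uniq -c`, this needs memory proportional to the number of distinct rows rather than a sort of the whole input.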
jnothman / quizzes-only.js
Created March 27, 2012 04:15
Links to show Coursera lecture quizzes only
/*
On the Coursera lecture index, execute this JavaScript (via a Greasemonkey script, bookmarklet, etc.) to show a 'quizzes' link next to each lecture. Clicking the link opens the lecture but, instead of showing the video, will:
1. Pause it
2. Show the first quiz
3. Upon clicking "skip" or "continue" on the quiz, proceed to the next
4. Repeat from 3.
5. Continue showing the video
Having watched a downloaded video, you can now easily do the in-lecture quizzes separately.
jnothman / gitvimrc.vim
Created March 28, 2012 12:34
source a git repository-specific .vimrc
function! SourceGitVimrc(dir)
  let gitroot = system("cd " . fnameescape(a:dir) . "; git rev-parse --show-toplevel 2>/dev/null")
  " Strip trailing newline and escape
  let gitroot = substitute(gitroot, "\\n*$","","")
  if strlen(gitroot) && filereadable(gitroot . '/.vimrc')
    execute "source " . fnameescape(gitroot) . '/.vimrc'
  endif
endfunction
jnothman / renumber-opera-session.py
Created December 16, 2012 23:53
This script renumbers an edited Opera Browser session file. If you remove some tabs/windows from an existing autosave.win (or other .win) file, the numbering becomes non-contiguous. Pipe the edited session file through this script and the numbering will now count from 1 to the required number of windows. However, you must also modify the 'window…
import re, sys
num_re = re.compile(r'(?<=^\[)[0-9]+')
in_n = ''
out_n = 0
def sub_cb(match):
    global in_n, out_n
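The preview stops inside the callback. A self-contained sketch of how such a renumbering pass could work is given below. It assumes that section headers start with `[` followed by a window number, as the regular expression above suggests, and that all headers for one window appear consecutively. The function `renumber` and the sample lines are illustrative and not taken from the gist or from a real session file.

```python
import re

NUM_RE = re.compile(r'(?<=^\[)[0-9]+')


def renumber(lines):
    """Rewrite leading section numbers so that they count up from 1."""
    state = {'seen': None, 'out': 0}

    def replace(match):
        # A number different from the previous one starts a new window.
        if match.group() != state['seen']:
            state['seen'] = match.group()
            state['out'] += 1
        return str(state['out'])

    return [NUM_RE.sub(replace, line) for line in lines]


sample = ['[1]', 'x=1', '[1history url]', '[4]', '[4history url]', '[7]']
renumbered = renumber(sample)
print('\n'.join(renumbered))
```

Keeping the counters in a local dictionary avoids module-level globals, so the function can be called on several files in one run.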
jnothman / extended_scorer.py
Created April 16, 2013 13:55
More functionality in scikit-learn `Scorer`
from __future__ import print_function
from abc import ABCMeta, abstractmethod
from functools import partial
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from sklearn.base import BaseEstimator
"""
Tool to examine the output of model selection search results from scikit-learn (assuming #1787).
Pandas might be more appropriate, but I haven't worked out how to do group_best there...
For example:
>>> my_search = GridSearchCV(est, param_grid={'a': [...], 'b': [...], 'c': [...]})
>>> my_search.fit(X, y)
>>> rw = ResultsWrangler(my_search.grid_results_, my_search.fold_results_)
>>> grouped = rw.group_best(['a', 'b'])
>>> print(zip(grouped.parameters, grouped.scores))
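One way to picture a `group_best`-style operation is as a plain-Python sketch that does not depend on `ResultsWrangler` or on the proposed `grid_results_` attribute. The standalone function below and its made-up scores are hypothetical; they show only the grouping idea, which is to keep the best-scoring setting of the remaining parameters for each combination of the grouped ones.

```python
def group_best(results, keys):
    """For each combination of `keys`, keep the highest-scoring result.

    `results` is an iterable of (params_dict, score) pairs.
    """
    best = {}
    for params, score in results:
        group = tuple(params[k] for k in keys)
        if group not in best or score > best[group][1]:
            best[group] = (params, score)
    return best


# Made-up search results over parameters a, b and c.
results = [
    ({'a': 1, 'b': 'x', 'c': 0.1}, 0.70),
    ({'a': 1, 'b': 'x', 'c': 1.0}, 0.82),
    ({'a': 2, 'b': 'x', 'c': 0.1}, 0.75),
    ({'a': 2, 'b': 'x', 'c': 1.0}, 0.64),
]
best = group_best(results, ['a', 'b'])
for group, (params, score) in sorted(best.items()):
    print(group, params, score)
```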