RJ Nowling rnowling

@rnowling
rnowling / binomial_test_window_analysis.py
Last active January 28, 2018 22:51
Binomial Test for Identifying Regions of Enriched Differentiation
"""
Copyright 2017 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
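As a rough illustration of the idea in this gist's title, a one-sided binomial test can ask whether a genomic window contains more significant SNPs than a background rate would predict. This is a minimal sketch and not the gist's own code: the function name `binomial_enrichment_pvalue`, the window counts, and the background rate are all hypothetical, and it uses only the standard library.

```python
from math import comb


def binomial_enrichment_pvalue(n_snps, n_significant, background_rate):
    """One-sided exact binomial test.

    Returns P(X >= n_significant) for X ~ Binomial(n_snps, background_rate).
    """
    return sum(
        comb(n_snps, k)
        * background_rate ** k
        * (1.0 - background_rate) ** (n_snps - k)
        for k in range(n_significant, n_snps + 1)
    )


# Hypothetical windows: (SNPs in window, significant SNPs in window).
windows = {"window_a": (50, 3), "window_b": (50, 12)}
BACKGROUND_RATE = 0.05  # assumed genome-wide fraction of significant SNPs

for name, (n, k) in windows.items():
    p = binomial_enrichment_pvalue(n, k, BACKGROUND_RATE)
    print(f"{name}: {k}/{n} significant, p = {p:.3g}")
```

A small p-value suggests the window holds more significant SNPs than the assumed background rate explains. When many windows are tested, a multiple-testing correction would normally follow.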
@rnowling
rnowling / likelihood_ratio_test.py
Last active April 1, 2023 16:28
Likelihood-Ratio Test with scikit-learn and scipy
"""
Copyright 2017 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
@rnowling
rnowling / test_import.bats
Last active February 5, 2017 03:52
Test example for Bats
#!/usr/bin/env bats
count_snps() {
local counts=$(python -c "import cPickle; data=cPickle.load(open('${1}/snp_feature_indices')); print len(data)")
echo "$counts"
}
setup() {
N_INDIVIDUALS=20
N_SNPS=10000
"""
Script for comparing spam classification with a bag-of-words model constructed with and without hashing. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
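The contrast this script's description draws, a bag-of-words model built with and without hashing, can be sketched in a few lines. The example below is an assumption-laden illustration, not the gist's code: the helper names, the toy documents, and the bucket count are hypothetical, and it uses the standard library in place of any machine-learning package.

```python
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct token to a column index (needs a pass over the data)."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize_with_vocabulary(text, vocab):
    """Count tokens into a vector with one column per known token."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokenize(text)).items():
        if token in vocab:  # tokens outside the vocabulary are dropped
            vector[vocab[token]] += count
    return vector


def vectorize_with_hashing(text, n_buckets):
    """Count tokens into a fixed-width vector; no vocabulary is stored."""
    vector = [0] * n_buckets
    for token in tokenize(text):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        vector[zlib.crc32(token.encode("utf-8")) % n_buckets] += 1
    return vector


# Toy documents, made up for illustration.
docs = ["win money now", "meeting notes for monday", "win a free prize now"]
vocab = build_vocabulary(docs)
print(vectorize_with_vocabulary(docs[0], vocab))
print(vectorize_with_hashing(docs[0], n_buckets=8))
```

The vocabulary version grows with the number of distinct tokens and must be stored alongside the model. The hashed version has a fixed width chosen up front, at the cost of collisions when different tokens land in the same bucket.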
"""
Script for comparing Logistic Regression with L1, L2, and elastic net regularization and the liblinear, sag, and sgd optimizers. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
@rnowling
rnowling / imbalanced_dataset_lr_comparison.py
Created August 27, 2016 06:25
Imbalanced Dataset Logistic Regression Model Comparison
"""
Script for comparing Logistic Regression and associated evaluation metrics on the imbalanced Media 6 Degrees dataset from the Doing Data Science book. You'll need to download a copy of the dataset from the GitHub repo: https://github.com/oreillymedia/doing_data_science .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
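Why the choice of evaluation metric matters on an imbalanced dataset can be shown with a small, self-contained sketch. Nothing below comes from the gist or from the Media 6 Degrees data: the labels and both sets of predictions are synthetic and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0  # no positive predictions


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0  # no positive labels


# Synthetic data: 10 positives among 1000 samples (1% positive class).
y_true = [1] * 10 + [0] * 990
always_negative = [0] * 1000                          # predicts the majority class
some_model = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978  # finds 8 of 10 positives

for name, y_pred in [("always negative", always_negative), ("some model", some_model)]:
    print(
        f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
        f"precision={precision(y_true, y_pred):.3f} "
        f"recall={recall(y_true, y_pred):.3f}"
    )
```

In this synthetic case the classifier that never predicts the positive class scores higher on accuracy than the one that recovers most positives, which is why precision and recall are usually reported alongside accuracy for skewed classes.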
@rnowling
rnowling / rf_missing_data.py
Created December 16, 2015 01:45
Imputing Missing Data and Random Forest Variable Importance Scores
from collections import defaultdict
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import mstats
N_SAMPLES = 100
@rnowling
rnowling / optics.py
Created October 13, 2015 14:36
Customer Segmentation Pipeline Prototype
"""
Copyright 2015 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
@rnowling
rnowling / rf_correlation_bias.py
Created August 12, 2015 02:17
RF Feature Correlation Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 10
N_TREES = 100
MAX_CATEGORIES = 32
@rnowling
rnowling / rf_bias.py
Last active June 10, 2019 10:33
Simulate RF Categorical Variable Encoding Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 1000
N_TREES = 100
MAX_CATEGORIES = 32