RJ Nowling rnowling

WARN [2015-03-19 16:31:26,277] ({main} ZeppelinConfiguration.java[create]:76) - Failed to load configuration, proceeding with a default
INFO [2015-03-19 16:31:26,493] ({main} NotebookServer.java[creatingwebSocketServerLog]:41) - Create zeppelin websocket on port 8081
INFO [2015-03-19 16:31:26,907] ({main} ZeppelinServer.java[main]:84) - Start zeppelin server
INFO [2015-03-19 16:31:26,910] ({main} Server.java[doStart]:272) - jetty-8.1.14.v20131031
INFO [2015-03-19 16:31:27,610] ({main} InterpreterFactory.java[init]:86) - Reading /home/vagrant/zeppelin/interpreter/spark
INFO [2015-03-19 16:31:28,326] ({main} InterpreterFactory.java[init]:103) - Interpreter spark found. class=com.nflabs.zeppelin.spark.SparkInterpreter
INFO [2015-03-19 16:31:28,333] ({main} InterpreterFactory.java[init]:103) - Interpreter pyspark found. class=com.nflabs.zeppelin.spark.PySparkInterpreter
INFO [2015-03-19 16:31:28,335] ({main} InterpreterFactory.java[init]:103) - Interpreter sql found. class=com.nflabs.zeppelin.spark.SparkSq
@rnowling
rnowling / gluster-tasks.yml
Last active August 29, 2015 14:19
Ansible Playbooks for Gluster
- hosts: storage_nodes
  name: Gluster configuration
  sudo: true
  vars:
    - gluster_brick_dirs:
        - /srv/gluster/brick1
        - /srv/gluster/brick2
        - /srv/gluster/brick3
        - /srv/gluster/brick4
        - /srv/gluster/brick5
rnowling / rf_correlation_bias.py
Created August 12, 2015 02:17
RF Feature Correlation Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 10
N_TREES = 100
MAX_CATEGORIES = 32
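The rf_correlation_bias.py preview stops after its constants, so the experiment itself is not visible. As a hedged illustration of the kind of input such an experiment needs, here is a stdlib-only sketch that builds several noisy copies of one binary signal, which makes the copies correlated with the label and with each other. The function name, its parameters, and the bit-flip noise model are all hypothetical and not taken from the gist.

```python
import random


def make_correlated_features(n_samples, n_copies, flip_prob, seed=None):
    """Hypothetical helper: return (features, labels) where each of the
    n_copies binary features equals the label except that each bit is
    flipped independently with probability flip_prob.

    Because every copy is a noisy version of the same signal, the copies
    are correlated with the label and with each other.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    features = []
    for label in labels:
        row = []
        for _ in range(n_copies):
            # Flip the bit with probability flip_prob to inject noise.
            row.append(1 - label if rng.random() < flip_prob else label)
        features.append(row)
    return features, labels


X, y = make_correlated_features(n_samples=100, n_copies=3, flip_prob=0.1, seed=42)
```

Fitting scikit-learn's `RandomForestClassifier(n_estimators=100)` on `X` and `y` and reading its `feature_importances_` attribute is one way to see how importance gets shared among correlated copies.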
"""
Script for comparing Logistic Regression with L1, L2, and elastic net regularization and the liblinear, sag, and sgd optimizers. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
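The docstring above compares L1, L2, and elastic net penalties across the liblinear, sag, and sgd optimizers. As a hedged illustration of what the elastic net option optimizes, here is a stdlib-only sketch of plain stochastic gradient descent on the logistic loss plus an elastic net penalty. The `alpha`/`l1_ratio` parametrization follows the one scikit-learn documents for `SGDClassifier`; the function names and toy data are hypothetical, and the L1 term uses a simple subgradient rather than the more careful schemes production solvers use.

```python
import math
import random


def elastic_net_penalty(weights, alpha, l1_ratio):
    """alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2).

    l1_ratio = 1.0 is pure L1 (lasso); l1_ratio = 0.0 is pure L2 (ridge).
    """
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic_regression(X, y, alpha=1e-4, l1_ratio=0.15,
                            learning_rate=0.1, epochs=50, seed=0):
    """Plain SGD on log loss plus elastic net penalty (bias is not penalized)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            err = sigmoid(z) - y[i]  # derivative of log loss w.r.t. z
            for j in range(n_features):
                sign = (w[j] > 0) - (w[j] < 0)  # subgradient of |w_j|
                grad = err * X[i][j] + alpha * (l1_ratio * sign + (1.0 - l1_ratio) * w[j])
                w[j] -= learning_rate * grad
            b -= learning_rate * err
    return w, b


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = sgd_logistic_regression(X, y)
preds = [int(sigmoid(b + w[0] * x[0]) > 0.5) for x in X]
```

Setting `l1_ratio` to 1.0 or 0.0 recovers the pure L1 and pure L2 cases, which is how a single SGD routine can cover all three penalties being compared.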
rnowling / optics.py
Created October 13, 2015 14:36
Customer Segmentation Pipeline Prototype
"""
Copyright 2015 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
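The optics.py preview shows only its license header, so the gist's own implementation is not visible; the file name suggests the OPTICS density-based ordering algorithm. Assuming that, here is a minimal O(n²) stdlib sketch of how OPTICS computes a cluster ordering and reachability distances. The function name and the convention that a point counts as its own neighbor when computing the core distance are assumptions of this sketch.

```python
import heapq
import math


def optics_ordering(points, min_pts, eps=math.inf):
    """Minimal OPTICS sketch returning (order, reachability).

    order is the cluster ordering of point indices; reachability[i] is the
    reachability distance of point i (math.inf for the first point reached
    in each new component). Assumed convention: a point is its own
    neighbor, so the core distance is the distance to the min_pts-th
    closest point (including itself) within eps.
    """
    n = len(points)
    processed = [False] * n
    reachability = [math.inf] * n
    order = []

    def neighbors(i):
        result = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                result.append((d, j))
        return result

    def core_distance(neigh):
        if len(neigh) < min_pts:
            return None  # not a core point
        return sorted(d for d, _ in neigh)[min_pts - 1]

    def update(neigh, core_dist, seeds):
        for d, j in neigh:
            if processed[j]:
                continue
            new_reach = max(core_dist, d)
            if new_reach < reachability[j]:
                reachability[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        neigh = neighbors(start)
        core_dist = core_distance(neigh)
        if core_dist is None:
            continue
        seeds = []
        update(neigh, core_dist, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # stale heap entry superseded by a smaller one
            processed[q] = True
            order.append(q)
            neigh_q = neighbors(q)
            core_q = core_distance(neigh_q)
            if core_q is not None:
                update(neigh_q, core_q, seeds)
    return order, reachability


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, reach = optics_ordering(points, min_pts=2)
```

In the resulting ordering, a large jump in reachability marks the boundary between two dense groups, so cutting the ordering wherever reachability exceeds a chosen threshold is one simple way to turn it into segments.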
rnowling / test_import.bats
Last active February 5, 2017 03:52
Test example for Bats
#!/usr/bin/env bats

count_snps() {
    local counts=$(python -c "import cPickle; data=cPickle.load(open('${1}/snp_feature_indices')); print len(data)")
    echo "$counts"
}

setup() {
    N_INDIVIDUALS=20
    N_SNPS=10000
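The `count_snps` Bats helper above shells out to Python 2 (`cPickle` and the `print` statement) to load `snp_feature_indices` and print its length. A Python 3 sketch of the same check might look like the following. The structure of the pickled object is not shown in the gist, so the demo writes a hypothetical dictionary to a temporary directory purely for illustration.

```python
import os
import pickle
import tempfile


def count_snps(output_dir):
    """Load the pickled object stored at <output_dir>/snp_feature_indices
    and return its length (Python 3 version of the Bats helper)."""
    path = os.path.join(output_dir, "snp_feature_indices")
    # Pickle files must be opened in binary mode under Python 3.
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return len(data)


# Demo with a hypothetical index mapping written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "snp_feature_indices"), "wb") as fh:
        pickle.dump({("chr1", pos): i for i, pos in enumerate(range(100, 600, 100))}, fh)
    n_snps = count_snps(tmp)

print(n_snps)
```

If the file was written by Python 2, `pickle.load(fh, encoding="latin1")` is often needed to decode Python 2 byte strings.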
rnowling / binomial_test_window_analysis.py
Last active January 28, 2018 22:51
Binomial Test for Identifying Regions of Enriched Differentiation
"""
Copyright 2017 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
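The binomial_test_window_analysis.py preview ends inside its license header, so the exact test is not shown. Assuming the idea is to ask whether a genomic window holds more "hits" (for example, SNPs passing some differentiation cutoff) than the genome-wide rate predicts, a one-sided binomial test can be sketched with the standard library. The window format, the Bonferroni correction, and all names here are assumptions.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_windows(windows, background_rate, alpha=0.05):
    """Flag windows with more hits than expected under background_rate.

    windows: list of (n_snps, n_hits) pairs, one per genomic window.
    A Bonferroni correction over the number of windows is assumed here.
    Returns a list of (window_index, p_value) for flagged windows.
    """
    threshold = alpha / len(windows)
    flagged = []
    for idx, (n_snps, n_hits) in enumerate(windows):
        p_value = binomial_upper_tail(n_hits, n_snps, background_rate)
        if p_value < threshold:
            flagged.append((idx, p_value))
    return flagged


flagged = enriched_windows([(20, 18), (20, 2), (15, 1)], background_rate=0.1)
```

Here only the first window, with 18 hits out of 20 SNPs against a 10% background rate, falls below the corrected threshold.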
rnowling / rf_missing_data.py
Created December 16, 2015 01:45
Imputing Missing Data and Random Forest Variable Importance Scores
from collections import defaultdict
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import mstats
N_SAMPLES = 100
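The rf_missing_data.py preview shows only imports and one constant, so its imputation strategy is not visible. As one common baseline, here is a stdlib-only sketch that blanks entries at random and then fills each column with the median of its observed values. Both helper names and the 0.0 fallback for fully missing columns are hypothetical choices for this sketch.

```python
import random
import statistics


def mask_randomly(rows, missing_frac, seed=None):
    """Return a copy of rows with each entry replaced by None with probability missing_frac."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_frac else value for value in row] for row in rows]


def impute_column_medians(rows):
    """Replace None in each column with the median of that column's observed values.

    Columns with no observed values fall back to 0.0 (an arbitrary choice).
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(statistics.median(observed) if observed else 0.0)
    return [[medians[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


data = [[1.0, None], [None, 4.0], [3.0, 6.0]]
imputed = impute_column_medians(data)
```

Comparing random forest variable importances on the original data against masked-and-imputed versions is one way to probe how imputation shifts those scores.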
"""
Script for comparing spam classification with a bag-of-words model constructed with and without hashing. You'll need to download a copy of the dataset from http://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html .
Copyright 2016 Ronald J. Nowling
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
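The docstring above compares spam classification with a bag-of-words model built with and without hashing. To show the difference between the two representations, here is a stdlib-only sketch: the explicit version stores a vocabulary with one column per distinct token, while the hashed version maps each token to one of `n_features` columns and stores no vocabulary, at the cost of possible collisions. The tokenizer, the use of MD5, and the function names are assumptions. scikit-learn's `HashingVectorizer` instead uses MurmurHash3 and alternates signs by default.

```python
import hashlib
import re


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer.
    return re.findall(r"\w+", text.lower())


def vocabulary_vectors(texts):
    """Bag of words with an explicit vocabulary: one column per distinct token."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    vectors = []
    for text in texts:
        vec = [0] * len(vocab)
        for token in tokenize(text):
            vec[vocab[token]] += 1
        vectors.append(vec)
    return vectors, vocab


def hashed_vector(text, n_features=2 ** 10):
    """Bag of words with the hashing trick: no vocabulary is stored, and
    distinct tokens may collide in the same column."""
    vec = [0] * n_features
    for token in tokenize(text):
        # MD5 is stable across runs, unlike the per-process salted built-in hash().
        index = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % n_features
        vec[index] += 1
    return vec


emails = ["Free money now", "Meeting moved to noon"]
vectors, vocab = vocabulary_vectors(emails)
hashed = [hashed_vector(e, n_features=16) for e in emails]
```

The hashed vectors have a fixed width regardless of corpus size, which is the main practical reason to prefer hashing on a large corpus such as TREC 2007.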
rnowling / rf_bias.py
Last active June 10, 2019 10:33
Simulate RF Categorical Variable Encoding Bias
import random
import sys
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
N_SAMPLES = 1000
N_TREES = 100
MAX_CATEGORIES = 32
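The rf_bias.py preview stops at its constants (1000 samples, 100 trees, up to 32 categories). Assuming the simulation draws categorical features with varying cardinality and labels unrelated to them, here is a stdlib-only sketch of that setup, plus a one-hot encoder for comparison with integer encoding. A commonly cited reason for the bias is that an integer-encoded feature with k categories offers up to k - 1 split thresholds, so high-cardinality features have more chances to fit noise. The helper names and simulation details are hypothetical.

```python
import random


def simulate_categorical_data(n_samples, n_features, max_categories, seed=None):
    """Hypothetical simulation: each feature gets a random number of categories
    in [2, max_categories]; labels are drawn independently of the features,
    so no feature is truly informative."""
    rng = random.Random(seed)
    n_categories = [rng.randint(2, max_categories) for _ in range(n_features)]
    rows = [[rng.randrange(k) for k in n_categories] for _ in range(n_samples)]
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    return rows, labels, n_categories


def one_hot_encode(rows, n_categories):
    """Expand each integer-coded feature with k categories into k 0/1 columns."""
    encoded = []
    for row in rows:
        out = []
        for value, k in zip(row, n_categories):
            out.extend(1 if c == value else 0 for c in range(k))
        encoded.append(out)
    return encoded


rows, labels, n_categories = simulate_categorical_data(1000, 5, 32, seed=0)
one_hot = one_hot_encode(rows, n_categories)
```

Fitting `RandomForestClassifier(n_estimators=100)` on `rows` and on `one_hot`, then comparing `feature_importances_` against each feature's category count, is one way to look for the encoding bias. Because the labels are random, any trend with cardinality reflects the encoding rather than real signal.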