Jason Rudy jcrudy

jcrudy / nonparametric_pymc.py
Last active May 29, 2023 09:55
An example of using a kernel density estimate as a prior in a pymc model that can be updated based on the posterior sample.
from scipy.stats import gaussian_kde
import pymc
from math import log
from matplotlib import pyplot
def KernelSmoothing(name, dataset, bw_method=None, lower=float('-inf'), upper=float('inf'), observed=False, value=None):
    '''Create a pymc node whose distribution comes from a kernel smoothing density estimate.'''
    density = gaussian_kde(dataset, bw_method)
    lower_tail = 0
    upper_tail = 0
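The preview above stops before the tail masses are used. As a minimal sketch of one way a bounded kernel density estimate can yield a log-density suitable for a prior, the block below renormalizes the estimate over `[lower, upper]`. The helper name `truncated_kde_logp` and the renormalization step are assumptions made for illustration, not the gist's actual code, and no pymc node is built here.

```python
import numpy as np
from scipy.stats import gaussian_kde

def truncated_kde_logp(dataset, lower=float('-inf'), upper=float('inf'), bw_method=None):
    """Return a log-density function for a KDE truncated to [lower, upper] (hypothetical helper)."""
    density = gaussian_kde(dataset, bw_method)
    # Mass that the untruncated estimate places outside the bounds.
    lower_tail = density.integrate_box_1d(float('-inf'), lower) if lower > float('-inf') else 0.0
    upper_tail = density.integrate_box_1d(upper, float('inf')) if upper < float('inf') else 0.0
    log_norm = np.log(1.0 - lower_tail - upper_tail)

    def logp(value):
        # Zero density outside the bounds.
        if value < lower or value > upper:
            return float('-inf')
        # Renormalize so the truncated density integrates to one.
        return float(np.log(density(value)[0])) - log_norm

    return logp

# Usage: a prior built from 500 draws and restricted to non-negative values.
rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=1.0, size=500)
logp = truncated_kde_logp(sample, lower=0.0)
```

To update such a prior, one could presumably call the helper again with the posterior sample as `dataset`, which is what the gist description suggests.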
jcrudy / earth_vs_earth.py
Last active December 26, 2015 22:09
This is a comparison between the R package "earth" and a Python implementation of multivariate adaptive regression splines. It currently requires the pull request at https://github.com/scikit-learn/scikit-learn/pull/2285.
'''
=============================
Comparison with the R package
=============================
This script randomly generates earth-style models, then randomly generates data from those models and
fits earth models to those data using both the python (:class:`Earth`) and R implementations. It records the sample size,
m, the number of input dimensions, n, the number of forward pass iterations, the runtime, and the r^2
statistic for each fit and writes the result to a CSV file. This script requires pandas, rpy2, and a
jcrudy / pyearth_with_sklearn.py
Created November 15, 2013 00:07
An example showing how to combine py-earth with two of scikit-learn's meta-regressors, AdaBoostRegressor and GridSearchCV.
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from pyearth import Earth
import numpy as np
import pandas as pd
# Generate a data set
np.random.seed(1)
jcrudy / pipeline.py
Last active December 28, 2015 11:29
This example shows how to use Earth objects in a Pipeline in an AdaBoostClassifier. It is necessary to modify scikit-learn a little to make this work. The modified file is included.
"""
The :mod:`sklearn.pipeline` module implements utilities to build a composite
estimator, as a chain of transforms and estimators.
"""
# Author: Edouard Duchesnay
#         Gael Varoquaux
#         Virgile Fritsch
#         Alexandre Gramfort
# Licence: BSD
jcrudy / simple_boosted_pipeline_example.py
Created December 2, 2013 19:45
A very simple example combining a Pipeline with an AdaBoostClassifier. This example does not work in the current version of scikit-learn.
from pyearth import Earth
from sklearn.pipeline import Pipeline
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn import datasets
import numpy as np
np.random.seed(1)
# Get data
X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
import scala.util.continuations._
class Generator[A] extends Iterator[A] with (A => Unit @ suspendable) {
  private var a: Option[A] = None
  private var k: Option[Unit => Unit] = None
  def next = {
    val a0 = a.get
    val k0 = k.get
    a = None
jcrudy / sampler_example.py
Last active June 9, 2021 08:49
Sampling survival times from arbitrary hazard functions
# Produce some simulated survival data from a weird hazard function
import numpy
from samplers import HazardSampler
# Set a random seed and sample size
numpy.random.seed(1)
m = 1000
# Use this totally crazy hazard function
hazard = lambda t: numpy.exp(numpy.sin(t) - 2.0)
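The preview ends before any sampling happens, and the `samplers` module is not shown. As a rough sketch of one standard way to draw survival times from an arbitrary hazard, the block below inverts the cumulative hazard numerically: if T has cumulative hazard H, then H(T) follows a unit exponential distribution. The function `sample_survival_times` and its grid parameters `t_max` and `n_grid` are hypothetical and are not claimed to match `HazardSampler`.

```python
import numpy as np

def sample_survival_times(hazard, size, t_max=200.0, n_grid=20001, rng=None):
    """Draw survival times by inverting the cumulative hazard on a grid (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, t_max, n_grid)
    h = hazard(t)
    # Cumulative hazard H(t) by the trapezoid rule, starting from H(0) = 0.
    H = np.concatenate(([0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * np.diff(t))))
    # H(T) ~ Exponential(1), so draw unit exponentials and solve H(T) = e.
    e = rng.exponential(1.0, size)
    # Linear interpolation inverts H; draws beyond H(t_max) are clipped to t_max.
    return np.interp(e, H, t)

# Usage with the hazard function from the preview above.
rng = np.random.default_rng(1)
m = 1000
hazard = lambda t: np.exp(np.sin(t) - 2.0)
times = sample_survival_times(hazard, m, rng=rng)
```

The grid has to extend far enough that the cumulative hazard at `t_max` exceeds essentially every exponential draw; otherwise the clipped values pile up at `t_max`.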
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine, MetaData, Column, String, Integer
import pickle
from sqlalchemy.sql.schema import ForeignKey
# Create some tables in the database
engine = create_engine('sqlite://')
engine.execute('CREATE TABLE user (id INTEGER, name TEXT, favorite_color TEXT)')
engine.execute('CREATE TABLE profile (id INTEGER, userid INTEGER, summary TEXT)')
from sympy import Symbol, Add, Mul, Max, RealNumber
from sympy.utilities.codegen import codegen
vars = {'x0': Symbol('x0'),
        'x1': Symbol('x1')}
coefficients = [.5, 1.4, 9.8]
terms = [RealNumber(1), Max(0, Add(vars['x0'], RealNumber(-3.5))),
         Mul(Max(RealNumber(0), Add(vars['x0'], RealNumber(-3.5))), Max(RealNumber(0), Add(vars['x1'], RealNumber(-6))))]
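The preview is cut off before the coefficients and terms are combined. Assuming the intent is a linear combination of hinge-function terms, as in an earth-style model, a minimal sketch of assembling and evaluating the expression might look like this. The names `hinge0`, `hinge1`, and `model` are illustrative, `Float` stands in for the constant term, and the code generation step is left out.

```python
from sympy import Symbol, Add, Max, Float

x0, x1 = Symbol('x0'), Symbol('x1')
coefficients = [.5, 1.4, 9.8]

# Hinge functions max(0, x - knot), with the same knots as the preview above.
hinge0 = Max(0, x0 - 3.5)
hinge1 = Max(0, x1 - 6)
terms = [Float(1), hinge0, hinge0 * hinge1]

# Weighted sum of the basis terms.
model = Add(*[c * term for c, term in zip(coefficients, terms)])

# Evaluate at a point where both hinges are active, and at one where neither is.
active = float(model.subs({x0: 4.5, x1: 7.0}))
inactive = float(model.subs({x0: 0.0, x1: 0.0}))
```

With both hinges equal to one the value is the sum of the three coefficients, and with both hinges at zero only the intercept remains.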