John Myles White johnmyleswhite

library('ggplot2')

counts <- read.csv('counts.csv', header = TRUE, sep = '\t')
counts <- transform(counts, logCount = log(Count))
ggplot(counts, aes(x = Length, y = logCount)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  xlab('Length of Entire Word') +
  ylab('Log Number of Occurrences of Spelling') +
  # opts(title = ...) was removed from ggplot2; ggtitle() is the replacement
  ggtitle('How Many O\'s Does It Take to Make a GOL?')
@johnmyleswhite
johnmyleswhite / gist:3734935
Created September 17, 2012 00:11
Toy Regression
df <- data.frame(Fahrenheit = c(212, 32),
                 Celsius = c(100, 0))
lm.fit <- lm(Fahrenheit ~ Celsius, data = df)
summary(lm.fit)
predict(lm.fit, data.frame(Celsius = 40))
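As a cross-check on what `lm` should report here, the closed-form least-squares formulas can be evaluated by hand. The following is a minimal Python sketch (the helper name `fit_line` is hypothetical and not part of the gist). With only two points the fitted line passes through both, so it should recover the usual conversion F = 32 + 1.8 C, and 40 degrees Celsius should map to about 104 degrees Fahrenheit.

```python
def fit_line(xs, ys):
    """Simple linear regression via the closed-form least-squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Same two observations as the data frame above.
celsius = [100, 0]
fahrenheit = [212, 32]

intercept, slope = fit_line(celsius, fahrenheit)
prediction_at_40 = intercept + slope * 40
print(intercept, slope, prediction_at_40)
```

Because the fit is exact, the residuals are zero, which is why `summary(lm.fit)` has no meaningful standard errors to report for this data.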
johnmyleswhite / gist:3734955
Created September 17, 2012 00:21
Toy Classification
df <- data.frame(IsSpam = c(1, 1, 0, 1),
                 MentionsViagra = c(0, 1, 0, 1),
                 MentionsNigeria = c(1, 1, 0, 0))
logistic.fit <- glm(IsSpam ~ MentionsViagra + MentionsNigeria,
                    data = df,
                    family = binomial(link = "logit"))
summary(logistic.fit)
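One thing worth noticing about this toy data: every message that mentions either word is spam, and the only message that mentions neither is not, so the two classes are perfectly separable. In that situation the logistic likelihood has no finite maximizer, and `glm` will typically warn that fitted probabilities numerically 0 or 1 occurred. The Python sketch below uses plain gradient ascent, which is an assumption of this example and not what `glm` does internally (it uses iteratively reweighted least squares), to show the feature weights still growing the longer the fit runs. The names `fit_logistic`, `short_run`, and `long_run` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, n_iter=2000):
    """Gradient ascent on the logistic log-likelihood.

    Returns [intercept, weight_1, weight_2, ...].
    """
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for features, label in zip(rows, labels):
            xs = [1.0] + list(features)  # prepend the intercept term
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for k, xi in enumerate(xs):
                grad[k] += (label - p) * xi
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


# Columns: MentionsViagra, MentionsNigeria (same rows as the data frame above).
rows = [(0, 1), (1, 1), (0, 0), (1, 0)]
labels = [1, 1, 0, 1]

short_run = fit_logistic(rows, labels, n_iter=2000)
long_run = fit_logistic(rows, labels, n_iter=4000)
print(short_run)
print(long_run)  # the feature weights are larger: there is no finite optimum
```

The practical upshot is that the coefficients and standard errors printed by `summary(logistic.fit)` for this data should be read as an artifact of where the optimizer stopped, not as estimates.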
johnmyleswhite / gist:3735101
Created September 17, 2012 01:28
Toy Regression w/ Errors
df <- data.frame(Fahrenheit = c(212, 102, 32),
                 Celsius = c(100, 50, 0))
lm.fit <- lm(Fahrenheit ~ Celsius, data = df)
summary(lm.fit)
predict(lm.fit, data.frame(Celsius = 40))
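With the middle point recorded as 102 rather than the exact 122, the three points no longer lie on one line, so least squares has to leave nonzero residuals. A minimal Python sketch using the closed-form formulas makes this visible (`fit_with_residuals` is a hypothetical helper, not part of the gist). For this data the slope works out to 1.8 still, but the intercept drops to about 25.33, so the prediction at 40 degrees Celsius is about 97.33 instead of 104.

```python
def fit_with_residuals(xs, ys):
    """Closed-form simple linear regression, returning the residuals too."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


# Same three observations as the data frame above; 102 is the noisy reading.
celsius = [100, 50, 0]
fahrenheit = [212, 102, 32]

intercept, slope, residuals = fit_with_residuals(celsius, fahrenheit)
prediction_at_40 = intercept + slope * 40
print(intercept, slope, prediction_at_40)
print(residuals)  # nonzero, and they sum to (numerically) zero
```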
johnmyleswhite / gist:4195980
Created December 3, 2012 16:10
Imitating plyr and reshape in Julia
# A top priority for making DataFrames useful in Julia is the development of
# good documentation and a nice API for doing plyr+reshape style operations
# in Julia. This Gist is a draft of such documentation.
# `load("...")` has been removed from Julia; `using` loads a package.
using DataFrames
using RDatasets
baseball = dataset("plyr", "baseball")
johnmyleswhite / differentiate.jl
Last active November 19, 2021 15:23
Symbolic Differentiation in Julia
differentiate(x::Number, target::Symbol) = 0
function differentiate(s::Symbol, target::Symbol)
    if s == target
        return 1
    else
        return 0
    end
end
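The two methods above are the base cases of a recursive differentiator: a constant differentiates to 0, and a symbol differentiates to 1 only when it is the target variable. The Python sketch below mirrors those base cases and, as an assumption that goes beyond what this preview shows, adds the sum and product rules for expressions written as `(operator, left, right)` tuples. The tuple encoding is hypothetical and chosen only for this example.

```python
def differentiate(expr, target):
    """Differentiate a tiny expression tree with respect to `target`.

    Numbers are constants, strings are symbols, and compound expressions
    are tuples of the form (operator, left, right).
    """
    if isinstance(expr, (int, float)):
        return 0  # derivative of a constant
    if isinstance(expr, str):
        return 1 if expr == target else 0  # d(x)/dx = 1, d(y)/dx = 0
    op, left, right = expr
    if op == "+":
        # Sum rule: (f + g)' = f' + g'
        return ("+", differentiate(left, target), differentiate(right, target))
    if op == "*":
        # Product rule: (f * g)' = f' * g + f * g'
        return ("+",
                ("*", differentiate(left, target), right),
                ("*", left, differentiate(right, target)))
    raise ValueError(f"unsupported operator: {op!r}")


print(differentiate(3, "x"))
print(differentiate("x", "x"))
print(differentiate(("*", "x", "x"), "x"))
```

The result for `x * x` is left unsimplified as `1 * x + x * 1`; a simplification pass would be a separate step.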
johnmyleswhite / gist:4596783
Created January 22, 2013 18:11
Entering more variables into a linear regression and then checking the p-values for each is a bad thing to do.
library("ggplot2")
n.sims <- 100
max.n.vars <- 100
n.obs <- 100
res <- data.frame()
for (sim in 1:n.sims)
{
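The claim in the title can be illustrated without a full regression. Under the null hypothesis a p-value is uniformly distributed on (0, 1), so if `k` pure-noise predictors are each tested at level 0.05, and the tests are treated as independent (a simplifying assumption; regression t-tests are only approximately so), the chance that at least one looks "significant" is 1 - 0.95^k. The Python sketch below computes that probability and checks it by simulation. Both function names are hypothetical, and this is not the simulation the gist itself runs.

```python
import random


def prob_any_false_positive(n_vars, alpha=0.05):
    """P(at least one of n_vars independent null tests has p < alpha)."""
    return 1.0 - (1.0 - alpha) ** n_vars


def simulate_any_false_positive(n_vars, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo estimate of the same probability using uniform p-values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Each null p-value is an independent Uniform(0, 1) draw.
        if any(rng.random() < alpha for _ in range(n_vars)):
            hits += 1
    return hits / n_sims


for k in (1, 5, 20, 100):
    print(k, prob_any_false_positive(k), simulate_any_false_positive(k))
```

With 100 noise variables, which matches `max.n.vars` above, a spurious "significant" coefficient is close to guaranteed under these assumptions.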
johnmyleswhite / gist:5222915
Created March 22, 2013 16:54
Comparing two ways of computing expectations in Julia
using Distributions
using Calculus
using Benchmark
function expectation(distr::Distribution,
                     g::Function,
                     epsilon::Real)
    f = x -> pdf(distr, x)
    endpoints = map(e -> quantile(distr, e), (epsilon, 1 - epsilon))
    integrate(x -> f(x) * g(x), endpoints[1], endpoints[2])
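The idea in the function above is to approximate E[g(X)] by integrating `pdf(x) * g(x)` between the `epsilon` and `1 - epsilon` quantiles, so that the integration range is finite even when the distribution's support is not. A minimal Python sketch of the same idea follows, using the standard library's `statistics.NormalDist` for the density and quantiles and a composite Simpson's rule as a stand-in integrator. The choice of Simpson's rule and the `n_intervals` parameter are assumptions of this example, not a statement about how `integrate` works in Calculus.jl.

```python
from statistics import NormalDist


def expectation(dist, g, epsilon=1e-6, n_intervals=2000):
    """Approximate E[g(X)] by integrating pdf(x) * g(x) between quantiles.

    `n_intervals` must be even for the composite Simpson's rule.
    """
    lo = dist.inv_cdf(epsilon)
    hi = dist.inv_cdf(1 - epsilon)
    h = (hi - lo) / n_intervals

    def integrand(x):
        return dist.pdf(x) * g(x)

    total = integrand(lo) + integrand(hi)
    for i in range(1, n_intervals):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights: 4 on odd nodes, 2 on even
        total += weight * integrand(lo + i * h)
    return total * h / 3


standard_normal = NormalDist(0.0, 1.0)
second_moment = expectation(standard_normal, lambda x: x * x)
shifted_mean = expectation(NormalDist(3.0, 2.0), lambda x: x)
print(second_moment, shifted_mean)
```

The truncation at the quantiles discards a small amount of tail mass, so the second moment comes out slightly below the exact value of 1; shrinking `epsilon` reduces that gap.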
johnmyleswhite / gist:5225361
Created March 22, 2013 22:51
Delegation macro
##############################################################################
#
# A macro for doing delegation
#
# This macro call
#
# @delegate MyContainer.elems [:size, :length, :ndims, :endof]
#
# produces this block of expressions
#
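The preview stops before showing the generated expressions, so the following is only a loose analogue of the delegation pattern and not a reconstruction of the macro's output. In this Python sketch, a hypothetical helper `delegate` attaches forwarding methods to a wrapper class so that calls on the wrapper are passed through to one of its fields. The class and the delegated method names are chosen for illustration.

```python
def delegate(cls, field, method_names):
    """Add methods to `cls` that forward to the attribute named `field`."""
    for name in method_names:
        # Bind `name` as a default argument so each forwarder keeps its own name.
        def forwarder(self, *args, _name=name, **kwargs):
            return getattr(getattr(self, field), _name)(*args, **kwargs)

        setattr(cls, name, forwarder)
    return cls


class MyContainer:
    def __init__(self, elems):
        self.elems = elems


# Forward these list methods from MyContainer to its `elems` field.
delegate(MyContainer, "elems", ["count", "index", "__len__"])

container = MyContainer([3, 1, 3])
print(container.count(3), container.index(1), len(container))
```

A macro does this work at expansion time and emits ordinary method definitions, whereas the sketch above patches the class at run time; the intent, avoiding hand-written forwarding boilerplate, is the same.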
johnmyleswhite / gist:5248212
Created March 26, 2013 19:06
The Joys of Sparsity: Forward Stagewise Regression
# Generate (x, y) data with a sparse set of active predictors
# prob controls the frequency of predictors having zero effect
function simulate_date(n::Integer, p::Integer, prob::Real)
    x = randn(n, p)
    beta = randn(p)
    for j in 1:p
        if rand() < prob
            beta[j] = 0.0
        end
    end
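The simulation above sets up a regression in which many coefficients are exactly zero. As a rough illustration of how forward stagewise regression can exploit that, here is a minimal Python sketch of the basic update: at each step, find the predictor with the largest absolute inner product with the current residual and nudge its coefficient by a small fixed amount `eps`. The function name, the step size, and the tiny orthogonal design are all assumptions made for this example and are not taken from the gist.

```python
def forward_stagewise(x, y, eps=0.01, n_steps=1000):
    """Forward stagewise regression with a fixed step size `eps`.

    `x` is a list of rows, `y` a list of responses. Returns the coefficients.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_steps):
        # Inner product of each predictor column with the current residual.
        corr = [sum(x[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if corr[j] == 0.0:
            break  # residual is orthogonal to every predictor
        delta = eps if corr[j] > 0 else -eps
        beta[j] += delta
        residual = [residual[i] - delta * x[i][j] for i in range(n)]
    return beta


# Three mutually orthogonal predictors; the true coefficients are (2, 0, -1),
# so the middle predictor has zero effect.
x = [[1, 1, 1],
     [-1, 1, -1],
     [1, -1, -1],
     [-1, -1, 1]]
y = [1, -1, 3, -3]

beta = forward_stagewise(x, y)
print(beta)
```

Because each step moves only one coefficient by `eps`, the estimates end up within a step or two of the true values, and the inactive predictor stays at or near zero. Stopping early, with fewer steps, leaves more coefficients at exactly zero, which is the source of the sparsity.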