dat = data.frame(
userID=c("one", "two", "three", "four", "five"),
start_date=as.Date(c("2015-09-01", "2015-09-02", "2015-09-02", "2015-09-03", "2015-09-03")),
end_date=as.Date(c("2015-09-02", NA, "2015-09-03", NA, "2015-09-03")))
n_start = 100 # number of active users on day zero
days = as.Date("2015-08-25")+1:10
n_new = sapply(days, function(x)length(which(dat$start_date == x)))
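The per-day count of new users above extends naturally to a running total of active users. A minimal Python sketch of that idea, under assumptions the preview does not confirm: a user stops counting as active on their `end_date`, users with no `end_date` stay active, and the names `n_end`, `net`, and `n_active` are illustrative only.

```python
from datetime import date, timedelta
from itertools import accumulate

# Same toy data as the R data frame above; None stands in for NA.
users = [
    {"userID": "one",   "start": date(2015, 9, 1), "end": date(2015, 9, 2)},
    {"userID": "two",   "start": date(2015, 9, 2), "end": None},
    {"userID": "three", "start": date(2015, 9, 2), "end": date(2015, 9, 3)},
    {"userID": "four",  "start": date(2015, 9, 3), "end": None},
    {"userID": "five",  "start": date(2015, 9, 3), "end": date(2015, 9, 3)},
]
n_start = 100  # number of active users on day zero
days = [date(2015, 8, 25) + timedelta(days=i) for i in range(1, 11)]

# Users starting and ending on each day (None never equals a date).
n_new = [sum(u["start"] == d for u in users) for d in days]
n_end = [sum(u["end"] == d for u in users) for d in days]

# Assumption: a user is dropped from the active count on their end date.
net = [new - ended for new, ended in zip(n_new, n_end)]
n_active = list(accumulate(net, initial=n_start))[1:]
```

If users should still count as active on their end date, the `n_end` series would need to be shifted by one day before subtracting.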
#!/usr/bin/env python3
lst = [{'one':1,'two':2,'three':3}, {'one':100,'two':200,'three':300}]
def wrapper(x, fun):
    return fun(x)
def this_works():
    def local_inner(d):
        return d[key]
@infotroph
infotroph / unmathsub.py
Last active September 22, 2015 17:33
Pandoc filter: convert inline math subscripts to text subscripts.
#!/usr/bin/env python3
'''
Pandoc filter to convert inline math subscripts to text subscripts.
Written for a very specific problem:
Bibtex entries with "CO_{2}" are rendered by the Pandoc parser as
[Str "CO",Math InlineMath "_{2}"],
which is then rendered in OOXML as an inline equation that looks like
"CO 2", with the 2 subscripted but an empty equation field between the subscript and the previous letters.
This filter solves this problem by replacing
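The preview cuts off before the replacement is named, so the following is only a sketch of the kind of rewrite the gist title describes, not the author's filter. It assumes the Pandoc JSON encoding `{"t": "Math", "c": [{"t": "InlineMath"}, "_{2}"]}` for inline math (the exact shape varies between Pandoc versions) and assumes the target is a `Subscript` node. The function name `unmath_subscript` and the regular expression are hypothetical.

```python
import re

# Assumption: only math whose entire content is one braced subscript is rewritten.
SUBSCRIPT_ONLY = re.compile(r"^_\{([^{}]+)\}$")


def unmath_subscript(node):
    """Return a Subscript node for inline math like "_{2}", else the node unchanged."""
    if node.get("t") != "Math":
        return node
    math_type, text = node["c"]
    if math_type.get("t") != "InlineMath":
        return node
    match = SUBSCRIPT_ONLY.match(text)
    if match is None:
        return node
    return {"t": "Subscript", "c": [{"t": "Str", "c": match.group(1)}]}


# The inline sequence from the docstring: [Str "CO", Math InlineMath "_{2}"]
inlines = [
    {"t": "Str", "c": "CO"},
    {"t": "Math", "c": [{"t": "InlineMath"}, "_{2}"]},
]
converted = [unmath_subscript(n) for n in inlines]
```

In a real filter the same function would be applied to every inline element while walking the document tree.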
infotroph / soft-censor.R
Created November 17, 2015 20:07
Is "soft" censoring a thing?
# Testing a soft-left-censored Stan model:
# Values above some minimum are detected normally,
# values less than the minimum are detected with some probability > 0.
set.seed(2345767)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = 7)
sim_mu = 3
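To make the "soft" censoring idea in the comments concrete, here is a small Python simulation sketch. Only `sim_mu = 3` and the seed come from the original; the standard deviation, the detection minimum, and the below-minimum detection probability (`sim_sd`, `detect_min`, `p_detect_below`) are assumed values chosen for illustration, and this is a data-generating sketch only, not the Stan model.

```python
import random

random.seed(2345767)

sim_mu = 3             # from the original
sim_sd = 1             # assumed
detect_min = 2.5       # assumed detection minimum
p_detect_below = 0.3   # assumed chance of detecting a value below the minimum


def simulate_soft_censored(n):
    """Return (value, detected) pairs under soft left-censoring."""
    out = []
    for _ in range(n):
        y = random.gauss(sim_mu, sim_sd)
        # At or above the minimum: always detected.
        # Below it: detected only with probability p_detect_below.
        detected = y >= detect_min or random.random() < p_detect_below
        out.append((y, detected))
    return out


obs = simulate_soft_censored(1000)
```

Setting `p_detect_below = 0` would recover ordinary (hard) left-censoring.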
library(ggplot2)
library(dplyr)
library(zoo) # for rollmean()
d_mean = (diamonds
%>% mutate(price=round(price, -2))
%>% group_by(price)
%>% mutate(mean_x = mean(x)))
d_roll = (diamonds
infotroph / sim_logis.R
Last active November 30, 2015 20:22
This is a visual approach to evaluating whether my logistic regression estimates are close to the simulated values. Can I instead compute scale and location directly from glm estimates?
set.seed(254469)
n=100
xlo=0
xhi=20
loc_logis=5
scale_logis=3
x = runif(n, xlo, xhi)
p_detect = plogis(x, location=loc_logis, scale=scale_logis)
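On the question in the description: for a binomial model with a logit link, the linear predictor is `b0 + b1 * x`, while the logistic CDF uses `(x - location) / scale`. Matching terms gives `scale = 1 / b1` and `location = -b0 / b1`. A minimal Python sketch of that conversion follows; `b0` and `b1` are hypothetical names standing in for the fitted intercept and slope, and real estimates will carry sampling error.

```python
import math


def plogis(x, location=0.0, scale=1.0):
    """Logistic CDF with the same parameterization as R's plogis()."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


def logis_params_from_glm(b0, b1):
    """Convert a logit-link intercept and slope to (location, scale)."""
    return -b0 / b1, 1.0 / b1


# Coefficients implied exactly by the simulated values loc_logis=5, scale_logis=3.
loc_logis, scale_logis = 5, 3
b0 = -loc_logis / scale_logis
b1 = 1 / scale_logis

location, scale = logis_params_from_glm(b0, b1)
```

The round trip only checks the algebra; with coefficients from an actual fit, the recovered values should be near the simulated ones rather than equal to them.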
infotroph / sim_mixdist.R
Created December 14, 2015 22:04
Simulating data from a mixed distribution of zeroes (nondetection) or lognormal positive values (root volume when we detect any)
sim_mixdist = function(
n_tubes=1,
depths=1:3,
intercept=1, # E[y] at depth=0 (log scale)
b_depth=-1, # slope for depth (log scale)
sig_tube=1, # sd for N(0) tube offsets (log scale)
detect_loc=1, # intercept for detection logistic. (scale... same as mu?)
detect_scale=1, # slope for detection logistic. (scale?)
sigma=1){ # residual (log scale)
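The function body is not shown in this preview, so the following Python sketch is one possible reading of the parameters, not the author's implementation. It assumes the detection probability is a logistic function of the log-scale mean `mu` (the original comments leave this open), that tube offsets are drawn once per tube, and it adds a hypothetical `rng` argument for reproducibility.

```python
import math
import random


def sim_mixdist(n_tubes=1, depths=(1, 2, 3), intercept=1.0, b_depth=-1.0,
                sig_tube=1.0, detect_loc=1.0, detect_scale=1.0, sigma=1.0,
                rng=None):
    """Simulate zeroes (nondetection) mixed with lognormal positive values."""
    rng = rng or random.Random()
    rows = []
    for tube in range(1, n_tubes + 1):
        tube_offset = rng.gauss(0.0, sig_tube)  # one offset per tube (log scale)
        for depth in depths:
            mu = intercept + b_depth * depth + tube_offset
            # Assumption: detection probability is logistic in mu.
            p_detect = 1.0 / (1.0 + math.exp(-(mu - detect_loc) / detect_scale))
            if rng.random() < p_detect:
                y = math.exp(rng.gauss(mu, sigma))  # lognormal positive value
            else:
                y = 0.0  # nondetection
            rows.append({"tube": tube, "depth": depth, "y": y})
    return rows


rows = sim_mixdist(n_tubes=4, rng=random.Random(1))
```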
infotroph / tocsv.sh
Last active December 16, 2015 08:58
Sed example for concatenating multiple space-delimited files, with varying kinds of messy header line, into single CSV files.
#!/bin/bash
trtary=(a b c)
echo -e '1\tfoo.out\n0\tunwanted.out\n1\tbar.out\n1\tbaz.out' > tocsv.files
for t in "${trtary[@]}"; do
# generate sample data. Each treatment overwrites previous .out files.
echo -e 'col1 col2 col3\n1 2 3\n4 5 6\n7 8 9' > foo.out
echo -e 'extrajunk\ncol1 col2 col3 col4\n1 2 3 4\n5 6 7 8\n9 10 11 12' > bar.out
infotroph / git-MSOffice-diff
Last active December 17, 2015 02:28
A quick and dirty approach to make git-diff on MS Office documents use the decompressed XML instead of treating them as binary. Known bugs: Complains on empty components (notably including the always-present [Content_Types].xml), and actually shows all changes to the XML (so good luck finding meaningful changes to an Excel file under all the dis…
# /usr/local/bin/xmlzip.sh:
#!/bin/bash
for i in `unzip -Z -1 "$1"`; do
echo "$i"
unzip -a -p "$1" "$i" | xmllint --format -
done
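For comparison, the same listing-plus-pretty-printing step can be sketched with the Python standard library. This is an illustrative alternative, not part of the gist. The function name `dump_zip_xml` is hypothetical, and skipping components that fail to parse is a design choice made here to sidestep the complaint about empty components mentioned in the description.

```python
import zipfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def dump_zip_xml(path):
    """Return each member name followed by its pretty-printed XML, as one string."""
    chunks = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            chunks.append(name)
            data = archive.read(name)
            try:
                chunks.append(minidom.parseString(data).toprettyxml(indent="  "))
            except ExpatError:
                # Empty or non-XML components are listed by name only.
                continue
    return "\n".join(chunks)
```

Calling `print(dump_zip_xml("report.docx"))` on an Office document (the file name is only an example) would emit text suitable for a line-based diff.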
# ~/.gitconfig:
[diff "xmlzip"]
#Exim filter
logfile $home/eximfilter.log
logwrite "orig_local: $original_local_part"
if error_message then
logwrite "Error message; not filtering"
finish
endif