@infotroph
infotroph / gist:9479509
Created March 11, 2014 04:24
Keybase.md
### Keybase proof
I hereby claim:
* I am infotroph on github.
* I am infotroph (https://keybase.io/infotroph) on keybase.
* I have a public key whose fingerprint is E056 C0A5 5FB2 EA2C 897C 591A 19B3 5E7D 101C 0BEF
To claim this, I am signing this object:
infotroph / gist:9751447
Created March 24, 2014 23:19
No speed win from ddply
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.127   5.569  40.398
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.559   5.482  40.728
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.666   4.975  40.366
> system.time({a=ddply(raw, .(Img), strip.tracing.dups)})
infotroph / gist:58608607659a5ce7a989
Last active August 29, 2015 14:00
short-circuiting or (not?) in any()
> f <- function() { print('FALSE'); FALSE }
# infix logical operators short-circuit as expected
> TRUE || f()
[1] TRUE
> FALSE || f()
[1] "FALSE"
[1] FALSE
> TRUE && f()
[1] "FALSE"
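The contrast the title asks about can be sketched as follows. This is a minimal illustration, not the gist's own code: it swaps the `print()` side effect for a hypothetical call counter, and `any_lazy()` is an invented helper showing one way to get early exit. It relies on `any()` evaluating all of its arguments before combining them, whereas `||` stops once the result is known.

```r
# Hypothetical helper: counts how many times it is evaluated, then returns FALSE.
calls <- 0
f <- function() { calls <<- calls + 1; FALSE }

# Infix || stops as soon as the result is known, so f() is never evaluated here.
or_result <- TRUE || f()
calls_after_or <- calls

# any() evaluates every argument first, so f() runs even though
# the first argument is already TRUE.
any_result <- any(TRUE, f())
calls_after_any <- calls

# Illustrative early-exit version: takes a list of zero-argument functions
# and stops at the first one that returns TRUE.
any_lazy <- function(thunks) {
  for (th in thunks) {
    if (isTRUE(th())) return(TRUE)
  }
  FALSE
}
lazy_result <- any_lazy(list(function() TRUE, f))
calls_after_lazy <- calls
```

Comparing `calls_after_or`, `calls_after_any`, and `calls_after_lazy` shows which forms evaluated `f()`.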
infotroph / findstable.R
Last active August 29, 2015 14:04
Find changepoints from irregular timestamps
# Goal: Identify the rows of a time series where I should expect concentration to be steady
# (i.e. setpoint has not changed recently).
# N.B. Not yet testing whether concentration IS steady -- that's the next step downstream.
# Wrinkles: setpoints are logged at a lower frequency than concentrations, and logging intervals
# for both are just irregular enough to be troublesome.
# Generate sample data:
# running log of gas concentrations, recorded approximately every second
concdata = data.frame(
time = as.POSIXct((1:50) + rnorm(25, mean= 0, sd=0.1), origin="2014-07-07"),
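One possible approach to the goal stated in the comments above, sketched under assumptions: the setpoint log is taken to have a timestamp vector and a setpoint vector, and `flag_stable()` and `settle_secs` are hypothetical names rather than anything from the original script. The idea is to find the times at which the logged setpoint value changes, then use `findInterval()` to look up the most recent change before each concentration reading. Because setpoints are logged less often than concentrations, the true change happened somewhere before the first log entry showing the new value, so readings just before that entry may be flagged as stable when they are not.

```r
# Flag concentration readings taken at least `settle_secs` after the most
# recent logged setpoint change. All argument names are illustrative.
flag_stable <- function(conc_time, set_time, setpoint, settle_secs = 10) {
  # findInterval() needs sorted breakpoints, so order the setpoint log first.
  ord <- order(set_time)
  set_time <- set_time[ord]
  setpoint <- setpoint[ord]

  # Treat the first log entry as a change (conservative), then any entry
  # whose value differs from the previous one.
  changed <- c(TRUE, diff(setpoint) != 0)
  change_time <- as.numeric(set_time[changed])

  # For each reading, index of the latest change at or before it (0 = none).
  conc_num <- as.numeric(conc_time)
  idx <- findInterval(conc_num, change_time)

  since_change <- rep(NA_real_, length(conc_num))
  has_change <- idx > 0
  since_change[has_change] <- conc_num[has_change] - change_time[idx[has_change]]

  !is.na(since_change) & since_change >= settle_secs
}

# Small deterministic example with made-up values.
conc_time <- as.POSIXct(c(1, 12, 21, 25, 40), origin = "2014-07-07", tz = "UTC")
set_time  <- as.POSIXct(c(0, 10, 20, 30), origin = "2014-07-07", tz = "UTC")
setpoint  <- c(400, 400, 600, 600)
stable <- flag_stable(conc_time, set_time, setpoint, settle_secs = 10)
```

This only marks where concentration is expected to be steady; it does not test whether it actually is.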
infotroph / gist:448abd19357a0418e7ad
Last active August 29, 2015 14:07
pointer-like variable updating between classes?
I have a parser inherited from someone else, and would rather not modify it if I don't have to.
The TL;DR on what's below: "Do I have to modify it?"
To pick the character encoding of its input, the parser is capable of either taking a
character encoding argument or of sniffing the encoding itself. Its designated initializer
method takes a pointer to an encoding:
-(instancetype)initWithStream:(NSStream *)stream usedEncoding:(NSStringEncoding *)encoding;
and I understand the idea is that I check the pointee to see whether the parser updated its value when
infotroph / gist:28bd34eabcaee9e600b8
Created December 4, 2014 15:50
Where are the decimal seconds stored?
> options(digits.secs=NULL)
> a = as.POSIXct("2014-12-04 09:18:27")
> b = as.POSIXct("2014-12-04 09:18:27.12345")
> a
[1] "2014-12-04 09:18:27 CST"
> b
[1] "2014-12-04 09:18:27 CST"
> options(digits.secs=6)
> a
[1] "2014-12-04 09:18:27 CST"
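A short sketch of where the fractional part lives, assuming `tz = "UTC"` for reproducibility (the session above uses the local time zone). A `POSIXct` value is a number of seconds since the epoch stored as a double, so the decimals are kept in the value itself; `digits.secs` and the `%OS<n>` format only control how many of them are displayed.

```r
a <- as.POSIXct("2014-12-04 09:18:27", tz = "UTC")
b <- as.POSIXct("2014-12-04 09:18:27.12345", tz = "UTC")

# The underlying double keeps the fractional seconds regardless of print options.
frac_a <- as.numeric(a) %% 1   # whole-second timestamp, no fractional part
frac_b <- as.numeric(b) %% 1   # approximately 0.12345, limited by double precision

# Request decimals explicitly instead of changing options(digits.secs = ...).
# %OS3 shows three decimal places, truncating rather than rounding.
with_ms <- format(b, "%Y-%m-%d %H:%M:%OS3")
```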
infotroph / histfailure.r
Last active August 29, 2015 14:21
Intermittent segfaults when geom_histogram has >128 groups
library(ggplot2)
set.seed(12345678)
sessionInfo()
# Loading required package: methods
# R version 3.2.0 Patched (2015-05-13 r68364)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
# Running under: OS X 10.8.5 (Mountain Lion)
infotroph / read.longline.R
Last active August 29, 2015 14:22
filter short lines while reading CSV
# My input files have short header lines, then CSV data, then short footer lines.
# I'm currently trimming the short lines with an external call to sed,
# but I want a pure-R solution for portability.
# This version works nicely on small examples but gets very slow on large files,
# because append() grows the list, triggering a memory reallocation, for every line.
# Suggestions for speed improvement requested.
read.longline = function(file){
f = file(file, "r")
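One way to avoid growing a list line by line, sketched under assumptions: read the whole file with `readLines()`, drop short lines with a single vectorized `nchar()` comparison, and hand the survivors to `read.csv()`. The function name `read_longline_sketch()` and the `min_chars` threshold are hypothetical, and the sketch assumes the file fits in memory and that line length alone separates junk from data.

```r
# Illustrative alternative to the per-line append() loop.
# `min_chars` is a made-up threshold; pick one that suits the real files.
read_longline_sketch <- function(path, min_chars = 10, ...) {
  lines <- readLines(path)
  # Vectorized filter: no per-line reallocation.
  keep <- lines[nchar(lines) >= min_chars]
  # Collapse to one string before parsing; extra arguments go to read.csv().
  read.csv(text = paste(keep, collapse = "\n"), ...)
}

# Tiny made-up file: short header lines, CSV data, short footer line.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "hdr",
  "v1.2",
  "time,conc,setpoint",
  "2014-07-07 00:00:01,400.1,400",
  "2014-07-07 00:00:02,400.3,400",
  "end"
), tmp)
dat <- read_longline_sketch(tmp)
```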
infotroph / readtime.r
Last active August 29, 2015 14:22
Surprisingly large speedup from pasting lines together
# Context: I have untidy CSVs that need some junk lines filtered out before they're even grid-shaped.
# I currently do the filtering with an external sed call,
# but wanted something that would work on any OS.
# In https://gist.github.com/infotroph/dd0faa5fd24bb78b4ff6
# I asked how to do the filtering from within R,
# and settled on readLines -> filter -> send filtered lines back to read.csv.
# This script doesn't filter anything,
# it just tests different ways of passing lines back into read.csv afterwards:
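A minimal sketch of the kind of comparison described above, using made-up data rather than the original files. It times `read.csv()` when given a character vector with one element per line against the same lines pasted into a single newline-separated string, and checks that both parse to the same data frame. No timings are asserted here; they vary with machine, R version, and input size.

```r
# Made-up input: a header plus 2000 rows of three integer columns.
lines <- c("a,b,c", sprintf("%d,%d,%d", 1:2000, 1:2000, 1:2000))

# Option 1: pass the character vector directly, one element per line.
t_vector <- system.time(d_vector <- read.csv(text = lines))

# Option 2: paste the lines into one string first.
t_pasted <- system.time(d_pasted <- read.csv(text = paste(lines, collapse = "\n")))

# Both routes should produce the same data frame.
same_result <- identical(d_vector, d_pasted)
```

Printing `t_vector` and `t_pasted` on a realistically sized input shows whether pasting helps in a given setup.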
# Have a set of Make rules that produce some outputs I usually want to keep,
# and some cruft I only want when debugging.
# Want cruft removed at the end of every successful build,
# and outputs AND cruft removed on $(make clean).
# This version appears to do all these things, but I welcome more feedback if something looks wrong.
OUTPUTS = \
# bunch of compiled end products here