Chris Black infotroph

## gist:9479509
### Keybase proof

I hereby claim:

  * I am infotroph on github.
  * I am infotroph (https://keybase.io/infotroph) on keybase.
  * I have a public key whose fingerprint is E056 C0A5 5FB2 EA2C 897C  591A 19B3 5E7D 101C 0BEF

To claim this, I am signing this object:

## gist:9751447
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.127   5.569  40.398
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.559   5.482  40.728
> system.time({a=by(raw, raw$Img, strip.tracing.dups); b=do.call(rbind,a)})
   user  system elapsed
 35.666   4.975  40.366
> system.time({a=ddply(raw, .(Img), strip.tracing.dups)})

## gist:58608607659a5ce7a989
> f <- function() { print('FALSE'); FALSE }

# infix logical operators short-circuit as expected
> TRUE || f()
[1] TRUE
> FALSE || f()
[1] "FALSE"
[1] FALSE
> TRUE && f()
[1] "FALSE"

## findstable.R
# Goal: Identify the rows of a time series where I should expect concentration to be steady
#	(i.e. setpoint has not changed recently).
# N.B. Not yet testing whether concentration IS steady -- that's the next step downstream.
# Wrinkles: setpoints are logged at a lower frequency than concentrations, and logging intervals
# 	for both are just irregular enough to be troublesome.

# Generate sample data:
# running log of gas concentrations, recorded approximately every second
concdata = data.frame(
	time = as.POSIXct((1:50) + rnorm(50, mean=0, sd=0.1), origin="2014-07-07"),
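
One possible way to flag the rows described in the goal above, as a sketch only: the column names, the `settle_secs` window, and the toy data are all assumptions made for illustration, not taken from the original script. It treats a setpoint as "changed" when a logged value differs from the previous log entry, then uses `findInterval()` to look up the most recent change for each concentration row.

```r
# Deterministic toy data (all names here are hypothetical)
conc = data.frame(
	time = as.POSIXct(1:20, origin = "2014-07-07", tz = "UTC"),
	conc = c(rep(400, 10), rep(600, 10)))
setpt = data.frame(
	time = as.POSIXct(c(0, 5, 10, 15), origin = "2014-07-07", tz = "UTC"),
	setpoint = c(400, 400, 600, 600))

settle_secs = 3  # assumed: how long after a change before we expect steadiness

# A log entry counts as a change if its setpoint differs from the previous entry;
# the first entry is treated as a change because nothing is known before it.
changed = c(TRUE, diff(setpt$setpoint) != 0)
change_times = as.numeric(setpt$time[changed])

# For each concentration row, index of the latest change at or before it
# (0 means the row precedes every logged setpoint).
idx = findInterval(as.numeric(conc$time), change_times)
last_change = c(NA, change_times)[idx + 1]

conc$since_change = as.numeric(conc$time) - last_change
conc$expect_stable = !is.na(conc$since_change) & conc$since_change >= settle_secs
```

Because setpoints are logged less often than concentrations, a change is only detected at the first log entry that shows it, so rows just before a detected change may already be affected. This sketch does not try to correct for that.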

## gist:448abd19357a0418e7ad
I have a parser inherited from someone else, and would rather not modify it if I don't have to.
The TL;DR on what's below: "Do I have to modify it?"

To pick the character encoding of its input, the parser is capable of either taking a
character encoding argument or of sniffing the encoding itself. Its designated initializer
method takes a pointer to an encoding:

- (instancetype)initWithStream:(NSStream *)stream usedEncoding:(NSStringEncoding *)encoding;

and I understand the idea is that I check the pointee to see whether the parser updated its value when

## gist:28bd34eabcaee9e600b8
> options(digits.secs=NULL)
> a = as.POSIXct("2014-12-04 09:18:27")
> b = as.POSIXct("2014-12-04 09:18:27.12345")
> a
[1] "2014-12-04 09:18:27 CST"
> b
[1] "2014-12-04 09:18:27 CST"
> options(digits.secs=6)
> a
[1] "2014-12-04 09:18:27 CST"

## histfailure.r
library(ggplot2)

set.seed(12345678)

sessionInfo()
# Loading required package: methods
# R version 3.2.0 Patched (2015-05-13 r68364)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
# Running under: OS X 10.8.5 (Mountain Lion)

## read.longline.R
# My input files have short header lines, then CSV data, then short footer lines.
# I'm currently trimming the short lines with an external call to sed,
# but I want a pure-R solution for portability.

# This version works nicely on small examples but gets very slow on large files,
# because append() grows the list, triggering a memory reallocation, for every line.
# Suggestions for speed improvement requested.

read.longline = function(file){
	f = file(file, "r")
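
As a rough sketch of one way to avoid the per-line `append()` cost described above: read every line in a single `readLines()` call, build a logical mask of the long lines, and hand only those to `read.csv()`. The function name `read_longline_sketch`, the `min_chars` threshold, and the sample file contents are assumptions for illustration; how "short" is defined in the real files is not shown in the original.

```r
# Hypothetical helper: keep only lines at least `min_chars` wide, then parse as CSV.
read_longline_sketch = function(path, min_chars = 20, ...) {
	lines = readLines(path)            # one vectorized read, no list growth
	keep = nchar(lines) >= min_chars   # TRUE for the long (data) lines
	read.csv(text = lines[keep], ...)
}

# Small made-up example: short header, CSV body, short footer
tmp = tempfile(fileext = ".csv")
writeLines(c(
	"run 42",
	"id,concentration,temperature",
	"1,400.5,25.1",
	"2,401.2,25.3",
	"end"), tmp)

dat = read_longline_sketch(tmp, min_chars = 10)
unlink(tmp)
```

This holds the whole file in memory at once, which is usually fine but may not suit very large inputs.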

## readtime.r
# Context: I have untidy CSVs that need some junk lines filtered out before they're even grid-shaped.
# I currently do the filtering with an external sed call,
# but wanted something that would work on any OS.

# In https://gist.github.com/infotroph/dd0faa5fd24bb78b4ff6
# I asked how to do the filtering from within R,
# and settled on readLines -> filter -> send filtered lines back to read.csv.

# This script doesn't filter anything,
# it just tests different ways of passing lines back into read.csv afterwards:
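
For illustration, here are two ways a character vector of already-filtered lines might be passed back to `read.csv()`. Which alternatives the original script actually times is not visible here, so the pair below is an assumption, and the sample lines are made up.

```r
lines = c("a,b", "1,2", "3,4")

# Option 1: collapse to one string and use the text= argument
via_text = read.csv(text = paste(lines, collapse = "\n"))

# Option 2: wrap the vector in a text connection and read from that
con = textConnection(lines)
via_con = read.csv(con)
close(con)  # the connection was opened by us, so we close it ourselves

same_result = identical(via_text, via_con)
```

Both routes should give the same data frame, so any choice between them would rest on speed and memory use rather than on the result.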

## gist:2f53db2f610730abe27a

# I have a set of Make rules that produce some outputs I usually want to keep,
# and some cruft I only want when debugging.
# I want the cruft removed at the end of every successful build,
# and both outputs AND cruft removed on `make clean`.

# This version appears to do all these things, but I welcome more feedback if something looks wrong.

OUTPUTS = \
	# bunch of compiled end products here