@datalove
datalove / xpath_selection.md
Last active August 26, 2015 00:03
XPath to select parent where children meet certain criteria

For the XML below, suppose we want to select the records that have one field with colour = Red and another field with length > 30...

<root>
<record id="1">
  <field id="colour" value="Red" />
  <field id="size"   value="Small" />
  <field id="length" value="25" />
</record>
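The selection described above can be sketched with the `XML` package. This is a minimal illustration, not necessarily the gist's own expression: the second record is invented so that one record satisfies both conditions, and the XPath shown is just one way to write the predicate.

```r
library(XML)

# Hypothetical sample data modelled on the snippet above; record 2 is invented
# so that at least one record meets both criteria.
xml_text <- '<root>
  <record id="1">
    <field id="colour" value="Red" />
    <field id="size"   value="Small" />
    <field id="length" value="25" />
  </record>
  <record id="2">
    <field id="colour" value="Red" />
    <field id="size"   value="Large" />
    <field id="length" value="35" />
  </record>
</root>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select each <record> that has a colour field equal to "Red"
# AND a length field whose value is numerically greater than 30.
xpath <- "//record[field[@id='colour'][@value='Red'] and field[@id='length'][@value > 30]]"
matches <- getNodeSet(doc, xpath)

# Pull the id attribute of every matching parent record
matched_ids <- sapply(matches, xmlGetAttr, "id")
print(matched_ids)
```

The predicates sit on the parent `record`, so the result is the parent node rather than the child `field` elements that were tested.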
datalove / kaggle.aquire.scores.20140611
Last active August 29, 2015 14:02
Kaggle Acquire Competition - Distribution of Scores 11 June 2014
library(XML)
library(ggplot2)
url <- "http://www.kaggle.com/c/acquire-valued-shoppers-challenge/leaderboard"
# parse into an internal document so that readHTMLTable can read it
pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
tbl <- readHTMLTable(pagetree, stringsAsFactors = FALSE)[[1]]
colnames(tbl) <- gsub("[^a-zA-Z0-9#]","", colnames(tbl))
tbl$Score <- as.numeric(tbl$Score)
datalove / odds.p.from.ci
Last active August 29, 2015 14:03
Calculate P-value for an odds ratio using CIs
# calculation as per:
# http://www.bmj.com/content/343/bmj.d2304
get_odds_p <- function(est, cil, ciu) {
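  # standard error of log(OR), recovered from the 95% CI limits
  se <- (log(ciu) - log(cil)) / (2 * 1.96)
  # test statistic on the log scale
  z <- abs(log(est) / se)
  # approximate p-value from z
  p <- exp(-0.717 * z - 0.416 * z^2)
  p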
}
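As a rough usage sketch, the function can be compared with the two-sided normal p-value that the exponential formula is understood to approximate. The odds ratio and confidence limits below are hypothetical values chosen only for illustration, and the function is repeated so the block runs on its own.

```r
# Same calculation as above, repeated so this block is self-contained
get_odds_p <- function(est, cil, ciu) {
  se <- (log(ciu) - log(cil)) / (2 * 1.96)
  z <- abs(log(est) / se)
  exp(-0.717 * z - 0.416 * z^2)
}

# Hypothetical odds ratio with its 95% confidence limits
est <- 1.5
cil <- 1.1
ciu <- 2.05

p_approx <- get_odds_p(est, cil, ciu)

# Two-sided p-value taken directly from the normal distribution, for comparison
z <- abs(log(est) / ((log(ciu) - log(cil)) / (2 * 1.96)))
p_exact <- 2 * pnorm(-z)

print(c(approx = p_approx, exact = p_exact))
```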
datalove / evaluate_a_string.r
Last active August 29, 2015 14:05
How to get R to evaluate arbitrary code in a string
rm(list=ls())
rcode <- "x <- 1+1; y <- 2+2"
# `eval` runs the parsed `rcode` in the calling environment (the global environment at top level)
eval(parse(text = rcode))
print(x)
print(y)
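A possible variation, shown here only as a sketch: passing an explicit `envir` keeps the variables created by the string out of the calling environment. The name `sandbox` is an assumption made for this example.

```r
rcode <- "x <- 1+1; y <- 2+2"

# Hypothetical variation: evaluate the string in a dedicated environment
# so that `x` and `y` are created there instead of in the caller
sandbox <- new.env()
eval(parse(text = rcode), envir = sandbox)

# The results are looked up in `sandbox`
print(ls(sandbox))
print(get("x", envir = sandbox))
print(get("y", envir = sandbox))
```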
# Use autodetected proxy settings
setInternet2(TRUE)
# Get the SpotfireSPK package from the Spotfire Stats Server
install.packages("SpotfireSPK", repos = "http://MySpotfireServer:8080/SplusServer/update/TERR")
# Get a package you want to deploy from CRAN
install.packages("nortest", repos = "http://cran.us.r-project.org")
# Create a Debian Control File
datalove / TERR_Expression_Function_Handling_Input_Columns.r
Last active August 29, 2015 14:08
TERR Script to be used in a Spotfire 'Expression Function'
#######################################################
#
# Expression Functions in Spotfire can take arbitrarily
# many columns as input. Columns will be passed to TERR
# in order as 'input1', 'input2', etc.
#
# This shows how to capture an arbitrary number of
# columns and to put them into a data frame.
#
#######################################################
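The capture step described in the header comment might look like the sketch below. The three `input*` vectors are simulated stand-ins for the columns Spotfire would pass in, and sorting by numeric suffix is an added assumption so that `input10` does not sort ahead of `input2`.

```r
# Simulated stand-ins for the columns Spotfire would pass in (assumed data)
input1  <- c(1.0, 2.0, 3.0)
input2  <- c("a", "b", "c")
input10 <- c(TRUE, FALSE, TRUE)

# Work against the current environment explicitly
env <- environment()

# Find every variable named input<number>
inputs <- grep("^input[0-9]+$", ls(env), value = TRUE)

# ls() sorts names alphabetically, so reorder by the numeric suffix
inputs <- inputs[order(as.integer(sub("^input", "", inputs)))]

# Collect the columns, in order, into a data frame
df <- as.data.frame(mget(inputs, envir = env), stringsAsFactors = FALSE)
print(df)
```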
datalove / TERR_Expression_Function_Mahalanobis_Distance.r
Last active August 29, 2015 14:08
Finds the Mahalanobis Distance for a set of columns
###################################################################
# Takes an arbitrarily long list of input columns and returns a
# boolean indicating whether or not each row is an outlier.
###################################################################
# create vector of inputs
inputs <- grep("^input[0-9]+$",ls(), value = TRUE)
# capture columns as a matrix
x <- sapply(inputs, function(y) {eval(parse(text = y))})
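Once the columns are in a numeric matrix, the distance itself can be computed with `stats::mahalanobis`. The sketch below uses simulated columns and centres on the sample mean with the sample covariance, which is an assumption about how the preview (truncated here) continues.

```r
set.seed(42)

# Simulated numeric matrix standing in for the captured input columns
x <- cbind(input1 = rnorm(20), input2 = rnorm(20), input3 = rnorm(20))

# mahalanobis() returns SQUARED distances from the given centre
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Take the square root to get the distance on the original scale
d <- sqrt(d2)
print(head(d))
```

Note that `mahalanobis()` returns squared distances, so any cutoff has to be expressed on the same scale as the value being compared.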
datalove / TERR_Expression_Function_Mahalanobis_Outlier.r
Last active August 29, 2015 14:08
Find multivariate outliers using Mahalanobis Distances
########################################################
# Takes an arbitrarily long list of input columns and
# returns a boolean indicating whether or not each row
# is an outlier.
#
# The function uses the critical value for Mahalanobis
# Distance calculated from an upper tailed ChiSq
# distribution with p=0.001.
########################################################
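A minimal sketch of the outlier rule described in the header comment, assuming the squared distances are compared against the upper-tail chi-squared quantile with degrees of freedom equal to the number of columns. The data are simulated, with one row planted far from the rest.

```r
set.seed(1)

# 50 simulated rows plus one planted extreme row (assumed data)
x <- rbind(cbind(rnorm(50), rnorm(50)), c(10, -10))

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Critical value: upper-tail chi-squared quantile at p = 0.001,
# with one degree of freedom per input column
cutoff <- qchisq(0.001, df = ncol(x), lower.tail = FALSE)

# TRUE where a row lies beyond the critical distance
is_outlier <- d2 > cutoff
print(which(is_outlier))
```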
datalove / R_SQL_Decode.r
Last active August 29, 2015 14:09
SQL-like decode statement for R
decode <- function(x, ...) {
odds <- function(x) { unlist(x[1:length(x) %% 2 == 1][1:floor(length(x)/2)]) }
even <- function(x) { unlist(x[1:length(x) %% 2 == 0]) }
last <- function(x) { unlist(if(length(x) %% 2 == 1) tail(x,1)) }
interpret_args <- function(x) { if(is.call(x)) {eval(x)} else if(is.name(x)) {as.character(x)} else {x} }
args <- eval(substitute(alist(...)))
args <- lapply(args, interpret_args)