@drewlanenga
drewlanenga / bash.R
Created February 1, 2016 06:24
bash in R
set_var <- function(k, v) {
  # Sys.setenv() expects name = value arguments; do.call() lets the name come from a variable
  do.call(Sys.setenv, setNames(as.list(v), k))
}
# currently just imports env vars
load_bash <- function(bash_file) {
  rl <- readLines(bash_file)
  # just find statements that begin with `export`
  raw.exports <- grep("^export ", rl)
  exports <- gsub("\"", "", gsub("^export ", "", rl[raw.exports]))
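The preview stops before the parsed `exports` are applied. As a hedged illustration of the same idea, here is a minimal Python sketch. It assumes each relevant line has the form `export KEY=value`, splits on the first `=`, strips double quotes as the R code does, and sets the variable. `load_exports` is a hypothetical name, and this is not a bash parser: no variable expansion or command substitution happens.

```python
import os
import re

# Assumed line format: "export KEY=value" (no expansion, no multi-line values)
EXPORT_RE = re.compile(r'^export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def load_exports(lines, environ=os.environ):
    """Hypothetical helper: set variables from `export KEY=value` lines.

    Like the R version, it only strips double quotes from the value.
    Returns the dict of variables that were set.
    """
    found = {}
    for line in lines:
        m = EXPORT_RE.match(line.strip())
        if not m:
            continue  # comments, plain assignments, other statements
        key, value = m.group(1), m.group(2).replace('"', '')
        environ[key] = value
        found[key] = value
    return found
```

Passing a plain dict as `environ` makes it easy to inspect the result without touching the real process environment.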
drewlanenga / eurodist.R
Last active August 29, 2015 14:25
R Example: Distance Between European Cities using Multidimensional Scaling
require(graphics)
loc <- cmdscale(eurodist)
x <- loc[, 1]
y <- -loc[, 2] # reflect so North is at the top
## note asp = 1, to ensure Euclidean distances are represented correctly
plot(x, y, type = "n", xlab = "", ylab = "", asp = 1, axes = FALSE, main = "cmdscale(eurodist)")
text(x, y, rownames(loc), cex = 0.6)
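`cmdscale()` performs classical (Torgerson) multidimensional scaling: square the distances, double-center them into B = -1/2 J D^2 J (with J = I - (1/n) 11^T), and scale the top eigenvectors of B by the square roots of their eigenvalues. The dependency-free Python sketch below follows those steps, but swaps a full eigensolver for power iteration with deflation. That substitution assumes the leading eigenvalues are positive and well separated, which holds for Euclidean distances but is not guaranteed for road distances such as `eurodist`. The function names are hypothetical.

```python
import math
import random


def double_center(d):
    """Return B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_means = [sum(row) / n for row in sq]
    col_means = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - col_means[j] + grand)
             for j in range(n)] for i in range(n)]


def dominant_eigenpair(b, iters=500, seed=0):
    """Power iteration: eigenvalue of largest magnitude and its unit eigenvector."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T B v (v has unit length here)
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def cmdscale(d, k=2):
    """Classical MDS: n x k coordinates whose distances approximate d."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = dominant_eigenpair(b)
        if lam <= 0.0:
            break  # no positive eigenvalue left to embed
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate: remove this component so the next pass finds the next one
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Eigenvector signs are arbitrary, so the embedding is only defined up to reflection; that is why the R example negates the second coordinate to put North at the top.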
drewlanenga / stdout
Created June 25, 2014 17:14
Shuffle OOM
14/06/25 16:40:07 INFO mapreduce.Job: map 97% reduce 0%
14/06/25 16:40:24 INFO mapreduce.Job: map 97% reduce 3%
14/06/25 16:40:27 INFO mapreduce.Job: map 97% reduce 5%
14/06/25 16:40:27 INFO mapreduce.Job: Task Id : attempt_1402677141678_0008_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
drewlanenga / surprise.py
Last active August 29, 2015 14:02
surprise values
def surprise(mapcount):
    """
    Calculate the surprise value for a given mapcount.
    The more uneven the distribution of values, the higher the
    surprise value. For example, a good field to use for a coverage
    score might have a surprise value below 0.5 or 0.6.
    """
    values = mapcount.values()
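The preview cuts off before the formula, so the exact definition of surprise is not shown. One common way to score unevenness that behaves as the docstring describes is one minus the normalized Shannon entropy of the counts: 0 for a perfectly even distribution, rising to 1 when all the mass sits on a single value. This is a swapped-in measure, not necessarily the author's, and `entropy_surprise` is a hypothetical name.

```python
import math


def entropy_surprise(mapcount):
    """Hypothetical surprise score: 1 - H(p) / log(k) over k distinct values.

    Assumes mapcount maps each value to a non-negative count.
    Returns 0.0 when there are fewer than two values or no counts at all.
    """
    k = len(mapcount)
    total = sum(mapcount.values())
    if k < 2 or total <= 0:
        return 0.0
    # Shannon entropy of the empirical distribution (zero counts contribute 0)
    entropy = -sum((c / total) * math.log(c / total)
                   for c in mapcount.values() if c > 0)
    return 1.0 - entropy / math.log(k)
```

Whether the docstring's 0.5 or 0.6 threshold carries over to this particular measure is not something the source establishes.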
drewlanenga / tolleyshuffle.R
Created April 29, 2014 22:24
Regression technique with correlated errors, based on Cochrane-Orcutt (http://en.wikipedia.org/wiki/Cochrane%E2%80%93Orcutt_estimation)
## Regression with correlated errors
tolley.shuffle <- function(model, k = 1e-5) {
  # inherits() is safe when class() returns more than one class (e.g. glm objects)
  if (!inherits(model, "lm"))
    stop("Object 'model' must be of class 'lm'.")
  uncorrelate <- function(model) {
    r <- residuals(model)
    n <- length(r)
    y <- model$model[, 1]
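Cochrane-Orcutt handles AR(1) errors by iterating: fit OLS, estimate the lag-1 residual autocorrelation rho, quasi-difference the data (y_t - rho*y_{t-1} and x_t - rho*x_{t-1}), refit, and repeat until rho stops changing. The R preview ends before this loop, so here is a hedged Python sketch of the textbook procedure for a single predictor. Treating `tol` as the analogue of the `k = 1e-5` argument is an assumption, and `ols`, `cochrane_orcutt`, and the toy-data helper `simulate_ar1` are hypothetical names.

```python
import random


def ols(x, y):
    """Simple least squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cochrane_orcutt(x, y, tol=1e-5, max_iter=100):
    """Iterative Cochrane-Orcutt for y = a + b*x + e, e_t = rho*e_{t-1} + u_t.

    Returns (a, b, rho). Assumes |rho| < 1 and at least three observations.
    """
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        # Residuals on the original scale, then their lag-1 autocorrelation
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
        den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
        new_rho = num / den
        # Quasi-difference the data (this drops the first observation)
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols(xs, ys)
        a = a_star / (1.0 - new_rho)  # transformed intercept is a*(1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho


def simulate_ar1(n, a, b, rho, seed=0):
    """Toy data: x_t = t, y_t = a + b*x_t + e_t with AR(1) errors e_t."""
    rng = random.Random(seed)
    x, y, e = [], [], 0.0
    for t in range(n):
        e = rho * e + rng.gauss(0.0, 1.0)
        x.append(float(t))
        y.append(a + b * t + e)
    return x, y
```

For example, `cochrane_orcutt(*simulate_ar1(200, 1.0, 2.0, 0.7))` should recover a slope near 2 and a rho near 0.7; the exact estimates depend on the seed.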
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: WritableName can't load class: org.apache.hadoop.hbase.io.ImmutableBytesWritable
at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:2030)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1960)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:67)
at org.apache.hadoop.streaming.DumpTypedBytes.dumpTypedBytes(DumpTypedBytes.java:115)
drewlanenga / example.R
Last active August 29, 2015 13:56
Example RHadoop MR Task
library(rmr2)
groups <- rbinom(n = 50, size = 32, prob = 0.4)
print(groups)
groups.dfs <- to.dfs(groups)
from.dfs(
mapreduce(
input = groups.dfs,
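The preview stops inside the `mapreduce()` call. Assuming the job counts how many times each binomial draw occurs (map emits `(value, 1)`, reduce sums the ones per key), the data flow can be simulated in a single Python process; the result should match tabulating `groups` directly, as `table(groups)` would in R. This sketches the MapReduce pattern only and does not reproduce rmr2's API; `rbinom`, `map_fn`, `reduce_fn`, and `run_local` are hypothetical names.

```python
import random
from collections import defaultdict


def rbinom(n, size, prob, seed=0):
    """n draws from Binomial(size, prob), each a sum of Bernoulli trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(n)]


def map_fn(value):
    # Emit one (key, 1) pair per input record
    yield value, 1


def reduce_fn(key, values):
    # Sum the partial counts collected for this key
    return key, sum(values)


def run_local(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce in a single process."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return dict(reducer(k, vs) for k, vs in sorted(shuffled.items()))


groups = rbinom(50, size=32, prob=0.4)
counts = run_local(groups, map_fn, reduce_fn)
```

The shuffle step is the part Hadoop distributes across the cluster; here it is just a dictionary of lists keyed by the mapper's output key.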
drewlanenga / lm.pmml.xml
Created January 7, 2014 23:48
Exploring support for [transformations in PMML](http://www.dmg.org/v4-1/Transformations.html) with Pattern. (Environment notes: Running Vagrant with Cascading SDK 2.2 -- https://github.com/Cascading/vagrant-cascading-hadoop-cluster)
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
<Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
<Extension name="user" value="lanenga" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2014-01-07 15:33:34</Timestamp>
</Header>
<DataDictionary numberOfFields="4">
<DataField name="sepal_width" optype="continuous" dataType="double"/>
<DataField name="sepal_length" optype="continuous" dataType="double"/>