Paul Lam Quantisan

Quantisan / map_counter.clj
Created March 7, 2012 15:37
equal-weight mapped counter
(comment
  ;; "Usage Example"
  (facts
    (attrib-model [["a" "a"]
                   ["b"]
                   ["b" "c" "a"]
                   ["b"]]
                  equal-weights) => {"a" 4/3, "b" 7/3, "c" 1/3})
  )
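The preview above shows only the expected result, not the bodies of `attrib-model` or `equal-weights`. The following Python sketch is one guess at the behavior that would produce that result, assuming each inner vector shares one unit of credit equally among its elements and the shares are summed per key. The names `attrib_model` and `equal_weights` mirror the Clojure identifiers, but the implementation is hypothetical and not taken from the gist.

```python
from collections import Counter
from fractions import Fraction


def equal_weights(path):
    """Assumed weighting: every element of a path gets an equal share, 1/len(path)."""
    share = Fraction(1, len(path))
    return [(item, share) for item in path]


def attrib_model(paths, weight_fn):
    """Sum the per-path weights for each key (hypothetical reconstruction)."""
    totals = Counter()
    for path in paths:
        if not path:
            continue  # an empty path has no credit to distribute
        for item, weight in weight_fn(path):
            totals[item] += weight
    return dict(totals)


result = attrib_model([["a", "a"], ["b"], ["b", "c", "a"], ["b"]], equal_weights)
```

Exact fractions are used so the totals can be compared directly with the Clojure ratios `4/3`, `7/3`, and `1/3`. A repeated element, such as the two `"a"` entries in the first path, collects one share per occurrence under this assumption.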
Quantisan / output
Created August 23, 2012 20:21
Impatient part 1
$ more output/rain/part-00000
doc_id text
doc01 A rain shadow is a dry area on the lee back side of a mountainous area.
doc02 This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03 A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04 This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05 Two Women. Secrets. A Broken Land. [DVD Australia]
Quantisan / output
Created August 24, 2012 20:55
Impatient part 2
$ cat output/rain/part-00000
A 3
Australia 1
Broken 1
California's 1
DVD 1
Death 1
Land 1
Secrets 1
This 2
Quantisan / run log
Created August 28, 2012 17:58
Impatient part 6
Paul-Lams-computer:part6 paullam$ hadoop jar target/impatient.jar data/rain.txt output/wc data/en.stop output/tfidf output/trap output/check
2012-08-28 18:52:15.457 java[16966:1903] Unable to load realm info from SCDynamicStore
12/08/28 18:52:16 INFO util.HadoopUtil: using default application jar, may cause class not found exceptions on the cluster
12/08/28 18:52:16 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part6/target/impatient.jar
12/08/28 18:52:16 INFO property.AppProps: using app.id: D5424D7B027EC9418FCADE8F3552429B
12/08/28 18:52:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/08/28 18:52:16 WARN snappy.LoadSnappy: Snappy native library not loaded
12/08/28 18:52:16 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/28 18:52:16 INFO mapred.FileInputFormat: Total input paths to process : 1
12/08/28 18:52:16 INFO util.Version: Concurrent, Inc - Cascading
Quantisan / output
Created August 29, 2012 21:28
Impatient part 4
$ cat output/wc/part-00000
air 1
area 4
australia 1
broken 1
california's 1
cause 1
cloudcover 1
death 1
deserts 1
# read in data
c <- read.csv("data/cartier_sample_likes.csv", header=FALSE)
names(c) <- c("user.id", "page.id")
s <- read.csv("data/swarovski_sample_likes.csv", header=FALSE)
names(s) <- c("user.id", "page.id")
p <- read.csv("data/page_labels.csv", header=FALSE)
names(p) <- c("page.id", "followers", "name")
library(stringr)
p$name <- as.factor(str_trim(as.character(p$name))) ## trim whitespace
Quantisan / run log
Created October 6, 2012 14:04
Impatient part 5
$ hadoop jar ./target/impatient.jar data/rain.txt output/wc data/en.stop output/tfidf
2012-10-06 15:00:25.269 java[1097:1903] Unable to load realm info from SCDynamicStore
12/10/06 15:00:25 INFO util.HadoopUtil: resolving application jar from found main method on: impatient.core
12/10/06 15:00:25 INFO planner.HadoopPlanner: using application jar: /Users/paullam/Dropbox/Projects/Impatient/part5/./target/impatient.jar
12/10/06 15:00:25 INFO property.AppProps: using app.id: 63CBE2FEBFE8177789403D9EA7C81366
12/10/06 15:00:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/10/06 15:00:25 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/06 15:00:25 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/06 15:00:25 INFO mapred.FileInputFormat: Total input paths to process : 1
Quantisan / batch-import
Created October 15, 2012 10:21
neo4j batch import benchmark
Paul-Lams-computer:batch-import paullam$ java TestDataGenerator
Creating 7500000 and 191215478 Relationships took 146 seconds.
Paul-Lams-computer:batch-import paullam$ head nodes.csv
Node Rels Property Counter:int
0 12 TEST 0
1 14 TEST 1
2 25 TEST 2
3 28 TEST 3
4 4 TEST 4
5 7 TEST 5
Quantisan / variety-output
Created October 18, 2012 17:34
mongo schema analyzer
$ mongo twitter --eval "var collection = 'energy'" variety.js
MongoDB shell version: 2.2.0
connecting to: twitter
Variety: A MongoDB Schema Analyzer
Version 1.2.1, released 29 July 2012
Using limit of 582
Using maxDepth of 99
creating results collection: energyKeys
removing leaf arrays in results collection, and getting percentages
{ "_id" : { "key" : "_id" }, "value" : { "type" : "ObjectId" }, "totalOccurrences" : 582, "percentContaining" : 100 }
Quantisan / coin.R
Last active December 10, 2015 22:29
illustrating statistical type I and II errors
test.flips <- function(N, A.prop=0.5, B.prop=0.5, attr="p.value") {
  ## simulate N flips of each coin and count the heads
  heads.A <- rbinom(1, N, A.prop)
  heads.B <- rbinom(1, N, B.prop)
  ## two-sample test for equality of the two proportions of heads
  test <- prop.test(c(heads.A, heads.B), n=c(N, N), alternative="two.sided")
  return(as.numeric(test[attr]))
}
## vary number of flips
N <- seq(1, 1001, by=10)