Paco Nathan ceteri

@ceteri
ceteri / HTML
Created July 25, 2012 21:28
Cascading Sample Recommender
<h1>Cascading Sample Recommender</h1>
<p>The goal of this project is to create a sample application in <a href="http://www.cascading.org/">Cascading 2.0</a> that shows how to build a simple <a href="http://en.wikipedia.org/wiki/Recommender_system">social recommender</a>.</p>
<h2>Build</h2>
<p>First, clone the source code from our GitHub repo at <a href="https://github.com/Cascading/SampleRecommender">https://github.com/Cascading/SampleRecommender</a>:</p>
<pre><code>git clone https://github.com/Cascading/SampleRecommender.git
</code></pre>
@ceteri
ceteri / log
Created September 6, 2012 22:28
CMU Workshop on Cascading plus City of Palo Alto Open Data
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
@ceteri
ceteri / log
Created September 7, 2012 23:53
Cascading wordcount sample
bash-3.2$ cd cascading.samples/
bash-3.2$ ls
build.gradle hadoop logparser settings.gradle
build.xml loganalysis sample.build.gradle wordcount
bash-3.2$ cd wordcount/
bash-3.2$ ls
README.TXT build.gradle data src
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
@ceteri
ceteri / pig.log
Created September 24, 2012 19:46
Debugging the Cascading / Pig comparison - part 4
bash-3.2$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)
bash-3.2$ pig -version
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
bash-3.2$ cat src/scripts/wc.pig
@ceteri
ceteri / Impatient #1
Created September 25, 2012 20:20
Cascading user list questions
bash-3.2$ gradle clean jar
:clean
:compileJava
:processResources UP-TO-DATE
:classes
:jar
BUILD SUCCESSFUL
Total time: 4.316 secs
@ceteri
ceteri / log
Created October 14, 2012 19:46
ACM DM - Multitool exercise
# use git to load multitool (simplest as a ZIP)
# https://github.com/Cascading/cascading.multitool
# to save time, we'll skip the JAR compile/build...
# download the JAR file from:
# https://s3.amazonaws.com/ceteri-mapred/multitool.jar
# cd to your cascading.multitool download
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./multitool.jar source=data/days.txt select=Tuesday sink=output/tuesday.txt
@ceteri
ceteri / log
Created October 14, 2012 19:52
ACM DM - Python exercise
# use git to load ceteri-mapred (simplest as a ZIP)
# https://github.com/ceteri/ceteri-mapred
# cd to your ceteri-mapred download
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls
README doc graph.gephi src thresh.R
bin graph.csv msgs.tsv stopwords thresh.tsv
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls src/
map_filter.py map_parse.py map_wc.py red_filter.py red_idf.py red_wc.py util_extract.py util_gephi.py util_walk.py
Pacos-MacBook-Pro:ceteri-mapred ceteri$ head README
@ceteri
ceteri / wc.clj
Created October 16, 2012 17:24
Cascading for the Impatient, Part 2 - Word Count
; Paul Lam
; https://github.com/Quantisan/Impatient
(ns impatient.core
(:use [cascalog.api]
[cascalog.more-taps :only (hfs-delimited)])
(:require [clojure.string :as s]
[cascalog.ops :as c])
(:gen-class))
@ceteri
ceteri / main.java
Created October 26, 2012 18:19
PMML
bash-3.2$ ls
README.md build.gradle cascading.pattern.ipr data model.log src
build cascading.pattern.iml cascading.pattern.iws dot output
bash-3.2$ more output/
classify/ measure/
bash-3.2$ more output/measure/
output/measure/ is a directory
bash-3.2$ more output/measure/part-00000
label score count
0 0 73
@ceteri
ceteri / Example3.scala
Last active December 10, 2015 02:58
Cascading for the Impatient, Part 8 -- Scalding examples
import com.twitter.scalding._
class Example3(args : Args) extends Job(args) {
Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
.read
.flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
.mapTo('token -> 'token) { token : String => scrub(token) }
.filter('token) { token : String => token.length > 0 }
.groupBy('token) { _.size('count) }
.write(Tsv(args("wc"), writeHeader = true))