Paco Nathan ceteri

@ceteri
ceteri / Main.java
Last active October 6, 2015 19:47
Cascading for the Impatient, Part 5
public class
  Main
  {
  public static void
  main( String[] args )
    {
    String docPath = args[ 0 ];
    String wcPath = args[ 1 ];
    String stopPath = args[ 2 ];
    String tfidfPath = args[ 3 ];
@ceteri
ceteri / Main.java
Created July 3, 2012 23:05
Cascading for the Impatient, Part 6
public class
  Main
  {
  public static void
  main( String[] args )
    {
    String docPath = args[ 0 ];
    String wcPath = args[ 1 ];
    String stopPath = args[ 2 ];
    String tfidfPath = args[ 3 ];
@ceteri
ceteri / HTML
Created July 25, 2012 21:28
Cascading Sample Recommender
<h1>Cascading Sample Recommender</h1>
<p>The goal of this project is to create a sample application in <a href="http://www.cascading.org/">Cascading 2.0</a> that shows how to build a simple <a href="http://en.wikipedia.org/wiki/Recommender_system">social recommender</a>.</p>
<h2>Build</h2>
<p>First, clone a copy of the source code from our GitHub repo at <a href="https://github.com/Cascading/SampleRecommender">https://github.com/Cascading/SampleRecommender</a></p>
<pre><code>git clone https://github.com/Cascading/SampleRecommender.git
</code></pre>
@ceteri
ceteri / log
Created September 6, 2012 22:28
CMU Workshop on Cascading plus City of Palo Alto Open Data
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
@ceteri
ceteri / log
Created September 7, 2012 23:53
Cascading wordcount sample
bash-3.2$ cd cascading.samples/
bash-3.2$ ls
build.gradle hadoop logparser settings.gradle
build.xml loganalysis sample.build.gradle wordcount
bash-3.2$ cd wordcount/
bash-3.2$ ls
README.TXT build.gradle data src
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
@ceteri
ceteri / pig.log
Created September 24, 2012 19:46
Debugging the Cascading / Pig comparison - part 4
bash-3.2$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)
bash-3.2$ pig -version
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
bash-3.2$ cat src/scripts/wc.pig
@ceteri
ceteri / Impatient #1
Created September 25, 2012 20:20
Cascading user list questions
bash-3.2$ gradle clean jar
:clean
:compileJava
:processResources UP-TO-DATE
:classes
:jar
BUILD SUCCESSFUL
Total time: 4.316 secs
@ceteri
ceteri / log
Created October 14, 2012 19:52
ACM DM - Python exercise
# use git to load ceteri-mapred (simplest as a ZIP)
# https://github.com/ceteri/ceteri-mapred
# cd to your ceteri-mapred download
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls
README doc graph.gephi src thresh.R
bin graph.csv msgs.tsv stopwords thresh.tsv
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls src/
map_filter.py map_parse.py map_wc.py red_filter.py red_idf.py red_wc.py util_extract.py util_gephi.py util_walk.py
Pacos-MacBook-Pro:ceteri-mapred ceteri$ head README
@ceteri
ceteri / log
Created October 14, 2012 19:46
ACM DM - Multitool exercise
# use git to load multitool (simplest as a ZIP)
# https://github.com/Cascading/cascading.multitool
# to save time, we'll skip the JAR compile/build...
# download the JAR file from:
# https://s3.amazonaws.com/ceteri-mapred/multitool.jar
# cd to your cascading.multitool download
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./multitool.jar source=data/days.txt select=Tuesday sink=output/tuesday.txt
@ceteri
ceteri / wc.clj
Created October 16, 2012 17:24
Cascading for the Impatient, Part 2 - Word Count
; Paul Lam
; https://github.com/Quantisan/Impatient
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))