Skip to content

Instantly share code, notes, and snippets.

View ceteri's full-sized avatar

Paco Nathan ceteri

View GitHub Profile
@ceteri
ceteri / Pattern test.log
Last active December 11, 2015 10:39
Pattern machine learning library for Cascading
bash-3.2$ pwd
/Users/ceteri/src/concur/pattern
bash-3.2$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
@ceteri
ceteri / Cascalog._tutorial
Last active December 10, 2015 09:58
Cascading for the Impatient, Part 9
bash-3.2$ lein repl
Listening for transport dt_socket at address: 51539
nREPL server started on port 51542
REPL-y 0.1.0-beta10
Clojure 1.4.0
Exit: Control+D or (exit) or (quit)
Commands: (user/help)
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
@ceteri
ceteri / Example3.scala
Last active December 10, 2015 02:58
Cascading for the Impatient, Part 8 -- Scalding examples
import com.twitter.scalding._
class Example3(args : Args) extends Job(args) {
Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
.read
.flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
.mapTo('token -> 'token) { token : String => scrub(token) }
.filter('token) { token : String => token.length > 0 }
.groupBy('token) { _.size('count) }
.write(Tsv(args("wc"), writeHeader = true))
@ccsevers
ccsevers / AvroReadExample.java
Created October 29, 2012 18:27
cascading.avro wordcount example
package cascading.avro.examples;
import java.util.Properties;
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexFilter;
import cascading.operation.regex.RegexSplitGenerator;
@ceteri
ceteri / log
Created September 6, 2012 22:28
CMU Workshop on Cascading plus City of Palo Alto Open Data
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
@ceteri
ceteri / Main.java
Created July 3, 2012 23:05
Cascading for the Impatient, Part 6
public class
Main
{
public static void
main( String[] args )
{
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
String stopPath = args[ 2 ];
String tfidfPath = args[ 3 ];
@ceteri
ceteri / Main.java
Last active October 6, 2015 19:47
Cascading for the Impatient, Part 5
public class
Main
{
public static void
main( String[] args )
{
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
String stopPath = args[ 2 ];
String tfidfPath = args[ 3 ];
@ceteri
ceteri / Main.java
Last active October 6, 2015 19:47
Cascading for the Impatient, Part 4
public class
Main
{
public static void
main( String[] args )
{
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
String stopPath = args[ 2 ];
@ceteri
ceteri / Main.java
Created June 30, 2012 01:11
Cascading for the Impatient, Part 3
public class
Main
{
public static void
main( String[] args )
{
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
Properties properties = new Properties();
@ceteri
ceteri / Main.java
Created June 29, 2012 19:56
Cascading for the Impatient, Part 2
public class
Main
{
public static void
main( String[] args )
{
String docPath = args[ 0 ];
String wcPath = args[ 1 ];
Properties properties = new Properties();