Paco Nathan ceteri

@ceteri
ceteri / main.java
Created October 26, 2012 18:19
PMML
bash-3.2$ ls
README.md build.gradle cascading.pattern.ipr data model.log src
build cascading.pattern.iml cascading.pattern.iws dot output
bash-3.2$ more output/
classify/ measure/
bash-3.2$ more output/measure/
output/measure/ is a directory
bash-3.2$ more output/measure/part-00000
label score count
0 0 73
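The `output/measure` tap appears to hold a confusion-matrix tally. Assuming `label` is the actual class and `score` is the class the PMML model predicted, each row counts the records in one (label, score) cell, so `0 0 73` would mean 73 records of class 0 were classified as 0. A minimal single-machine sketch of that tally, using hypothetical class and method names and made-up sample rows:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConfusionTally {
    // Count how many records fall into each (actual label, predicted score) cell.
    // Keys are "label\tscore" strings, so the TreeMap keeps output sorted by cell.
    public static Map<String, Integer> tally(List<String[]> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            String key = row[0] + "\t" + row[1];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical (label, score) pairs, not taken from the original data set.
        List<String[]> rows = List.of(
            new String[] {"0", "0"},
            new String[] {"0", "0"},
            new String[] {"1", "0"},
            new String[] {"1", "1"});
        // Print in the same tab-separated layout as output/measure/part-00000.
        System.out.println("label\tscore\tcount");
        for (Map.Entry<String, Integer> e : tally(rows).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Off-diagonal cells (label differs from score) are the misclassifications; with only class 0 present, a single `0 0` row is the whole matrix.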
ceteri / Example3.scala
Last active December 10, 2015 02:58
Cascading for the Impatient, Part 8 -- Scalding examples
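import com.twitter.scalding._

class Example3(args : Args) extends Job(args) {
  // read (doc_id, text), tokenize, scrub, drop empty tokens, count per token
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))

  // normalize a token: trim whitespace and lowercase (assumed definition)
  def scrub(token : String) : String = token.trim.toLowerCase
}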
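Outside Hadoop, the same tokenize/scrub/count logic can be sketched in plain Java over in-memory strings. The sketch reuses the flow's split regex and assumes `scrub` trims and lowercases each token; the class name, method names, and sample text are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Same delimiter class as the Scalding flow: space, brackets, parens, comma, period.
    private static final String DELIMS = "[ \\[\\]\\(\\),.]";

    // Assumed scrub step: trim surrounding whitespace, then lowercase.
    static String scrub(String token) {
        return token.trim().toLowerCase();
    }

    // Tokenize each text, scrub each token, drop empty tokens, and count occurrences.
    public static Map<String, Integer> count(Iterable<String> texts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : texts) {
            for (String raw : text.split(DELIMS)) {
                String token = scrub(raw);
                if (token.length() > 0) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> wc = count(List.of("Rain shadow, rain.", "A (rain) shadow"));
        // Tab-separated token/count pairs, sorted by token.
        wc.forEach((token, n) -> System.out.println(token + "\t" + n));
    }
}
```

The empty-token filter matters because adjacent delimiters such as `, ` or ` (` make `split` emit empty strings between them.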
ceteri / Cascalog._tutorial
Last active December 10, 2015 09:58
Cascading for the Impatient, Part 9
bash-3.2$ lein repl
Listening for transport dt_socket at address: 51539
nREPL server started on port 51542
REPL-y 0.1.0-beta10
Clojure 1.4.0
Exit: Control+D or (exit) or (quit)
Commands: (user/help)
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
ceteri / Main.java
Created January 5, 2013 05:18
COUNT(DISTINCT c) in Cascading, for Mikhail Gavryuchkov
package example;
import java.util.Properties;
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.CoGroup;
import cascading.pipe.Pipe;
import cascading.pipe.assembly.CountBy;
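`COUNT(DISTINCT c)` per group, as in `SELECT key, COUNT(DISTINCT c) FROM t GROUP BY key`, can be computed in two passes: first reduce the data to unique (key, c) pairs, then count the pairs per key. The gist's imports suggest a `CountBy`-based Cascading assembly, but its body is not shown here; the plain-Java sketch below only illustrates the two-pass semantics on in-memory rows, with a hypothetical {key, c} record layout and hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CountDistinctSketch {
    // Equivalent of: SELECT key, COUNT(DISTINCT c) FROM rows GROUP BY key
    // Each row is a hypothetical {key, c} pair.
    public static Map<String, Integer> countDistinct(List<String[]> rows) {
        // Pass 1: dedupe (key, c) pairs.
        Set<List<String>> unique = new HashSet<>();
        for (String[] row : rows) {
            unique.add(List.of(row[0], row[1]));
        }
        // Pass 2: count the surviving pairs per key.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> pair : unique) {
            counts.merge(pair.get(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"a", "x"},
            new String[] {"a", "x"},
            new String[] {"a", "y"},
            new String[] {"b", "x"});
        System.out.println(countDistinct(rows)); // prints {a=2, b=1}
    }
}
```

Deduplicating before counting is what separates this from a plain `COUNT(c)`, which would report `a=3` for the same rows.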
ceteri / Pattern test.log
Last active December 11, 2015 10:39
Pattern machine learning library for Cascading
bash-3.2$ pwd
/Users/ceteri/src/concur/pattern
bash-3.2$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
ceteri / Cascalog.log
Last active December 11, 2015 18:28
City of Palo Alto Open Data app in Cascalog
bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean
ceteri / Main.java
Last active December 12, 2015 08:59
ANSI SQL in Cascading
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
public class JdbcExample
{
public static void main( String[] args ) throws Exception
{
ceteri / cascalog_build.log
Last active December 14, 2015 22:29
Cascalog testing with Cascading 2.2-wip
bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (http://repo1.maven.org/maven2)
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
from https://clojars.org/repo/
Could not find artifact stencil:stencil:pom:0.3.0 in central (http://repo1.maven.org/maven2)
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
from https://clojars.org/repo/
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
from http://repo1.maven.org/maven2/
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)
ceteri / build.log
Created April 8, 2013 17:58
Pattern build
bash-3.2$ git show | head
commit d78b48fffff32898a9f76e94923d45a84d7e330e
Author: Paco Nathan <ceteri@gmail.com>
Date: Sat Mar 16 19:11:46 2013 -0700
fixed cmd line opts to allow for a different label field, for the confusion matrix calculation
diff --git a/README.md b/README.md
index ed10626..2e8996e 100644
--- a/README.md
ceteri / kmeans.py
Last active December 25, 2015 15:49
scikit-learn examples
print(__doc__)
from time import time
import numpy as np
import pylab as pl
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
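scikit-learn's `KMeans` fits the standard k-means objective and seeds its centroids with k-means++ by default. The classic way to fit that objective is Lloyd's loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. The plain-Java sketch below shows only that basic loop on 2-D points; the starting centroids are passed in rather than seeded, and the class name and data are hypothetical, so it illustrates the idea rather than porting scikit-learn.

```java
import java.util.Arrays;

public class KMeansSketch {
    // Lloyd's algorithm. Mutates `centroids` in place and returns the
    // cluster index assigned to each point.
    public static int[] fit(double[][] points, double[][] centroids, int maxIter) {
        int[] assign = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int k = 0; k < centroids.length; k++) {
                    double d = 0;
                    for (int j = 0; j < points[i].length; j++) {
                        double diff = points[i][j] - centroids[k][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        best = k;
                    }
                }
                if (iter == 0 || assign[i] != best) changed = true;
                assign[i] = best;
            }
            // Converged: assignments are stable, so centroids are already the means.
            if (!changed) break;
            // Update step: recompute each centroid as the mean of its members.
            for (int k = 0; k < centroids.length; k++) {
                double[] sum = new double[centroids[k].length];
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assign[i] != k) continue;
                    for (int j = 0; j < sum.length; j++) sum[j] += points[i][j];
                    n++;
                }
                // An empty cluster keeps its previous centroid.
                if (n > 0) {
                    for (int j = 0; j < sum.length; j++) centroids[k][j] = sum[j] / n;
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {1, 0}, {9, 9}, {9, 10}, {10, 9} };
        // Hypothetical starting centroids; scikit-learn would seed these with k-means++.
        double[][] centroids = { {0, 0}, {1, 1} };
        System.out.println(Arrays.toString(fit(points, centroids, 100))); // prints [0, 0, 0, 1, 1, 1]
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Seeding matters because Lloyd's loop only finds a local optimum; k-means++ spreads the initial centroids apart to make a poor local optimum less likely.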