Paco Nathan ceteri

@ceteri
ceteri / log
Created September 6, 2012 22:28
CMU Workshop on Cascading plus City of Palo Alto Open Data
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
@ceteri
ceteri / Example3.scala
Last active December 10, 2015 02:58
Cascading for the Impatient, Part 8 -- Scalding examples
import com.twitter.scalding._
class Example3(args : Args) extends Job(args) {
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))
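For readers without a Scalding setup, the data flow in the preview above (split each text into tokens, scrub, drop empty tokens, count per token) can be sketched in plain Python. This is an illustration only, not the gist's code: the preview does not show the `scrub` helper, so the version here (trim whitespace and lowercase) is an assumption, and `word_count` and the sample rows are hypothetical names and data.

```python
import re
from collections import Counter

# Same delimiter class as the Scalding snippet:
# space, square brackets, parentheses, comma, period.
TOKEN_DELIMS = re.compile(r"[ \[\]\(\),.]")


def scrub(token):
    # Assumption: the gist's scrub helper is not visible in the preview;
    # this stand-in trims whitespace and lowercases the token.
    return token.strip().lower()


def word_count(rows):
    """Count tokens across (doc_id, text) rows, mirroring
    flatMap -> mapTo -> filter -> groupBy/size."""
    counts = Counter()
    for _doc_id, text in rows:
        for raw in TOKEN_DELIMS.split(text):
            token = scrub(raw)
            if token:  # same effect as filtering on token.length > 0
                counts[token] += 1
    return counts


# Hypothetical sample input standing in for the "doc" TSV file.
rows = [
    ("doc01", "A rain shadow is a dry area."),
    ("doc02", "Rain (heavy), then a shadow."),
]
counts = word_count(rows)
```

The sketch runs in a single process and keeps every count in memory, whereas the Scalding job expresses the same steps as a flow that Cascading plans onto Hadoop.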
@ceteri
ceteri / Cascalog._tutorial
Last active December 10, 2015 09:58
Cascading for the Impatient, Part 9
bash-3.2$ lein repl
Listening for transport dt_socket at address: 51539
nREPL server started on port 51542
REPL-y 0.1.0-beta10
Clojure 1.4.0
Exit: Control+D or (exit) or (quit)
Commands: (user/help)
Docs: (doc function-name-here)
(find-doc "part-of-name-here")
Source: (source function-name-here)
@ceteri
ceteri / Pattern test.log
Last active December 11, 2015 10:39
Pattern machine learning library for Cascading
bash-3.2$ pwd
/Users/ceteri/src/concur/pattern
bash-3.2$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
@ceteri
ceteri / Cascalog.log
Last active December 11, 2015 18:28
City of Palo Alto Open Data app in Cascalog
bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean
@ceteri
ceteri / cascalog_build.log
Last active December 14, 2015 22:29
Cascalog testing with Cascading 2.2-wip
bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (http://repo1.maven.org/maven2)
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
from https://clojars.org/repo/
Could not find artifact stencil:stencil:pom:0.3.0 in central (http://repo1.maven.org/maven2)
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
from https://clojars.org/repo/
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
from http://repo1.maven.org/maven2/
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)
@drewlanenga
drewlanenga / lm.pmml.xml
Created January 7, 2014 23:48
Exploring support for [transformations in PMML](http://www.dmg.org/v4-1/Transformations.html) with Pattern. (Environment notes: Running Vagrant with Cascading SDK 2.2 -- https://github.com/Cascading/vagrant-cascading-hadoop-cluster)
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
  <Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
    <Extension name="user" value="lanenga" extender="Rattle/PMML"/>
    <Application name="Rattle/PMML" version="1.4"/>
    <Timestamp>2014-01-07 15:33:34</Timestamp>
  </Header>
  <DataDictionary numberOfFields="4">
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
@ccsevers
ccsevers / AvroReadExample.java
Created October 29, 2012 18:27
cascading.avro wordcount example
package cascading.avro.examples;
import java.util.Properties;
import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexFilter;
import cascading.operation.regex.RegexSplitGenerator;
@porterjamesj
porterjamesj / hello_mesos.py
Last active March 6, 2018 20:43
the tiniest mesos scheduler
import logging
import uuid
import time
from mesos.interface import Scheduler
from mesos.native import MesosSchedulerDriver
from mesos.interface import mesos_pb2
logging.basicConfig(level=logging.INFO)
@jakevdp
jakevdp / Jupyter_vs_Mathematica.ipynb
Created April 8, 2018 05:01
Jupyter vs Mathematica Google Trends