@jblomo
jblomo / mr_unique_words.py
Created May 30, 2012 22:46
Unique words steps
def steps(self):
    return [self.mr(self.words_in_review, self.doc_frequency),
            self.mr(reducer=self.unique_words),
            self.mr(reducer=self.find_max_review)]
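The three steps above chain one mapper/reducer pair with two reducer-only passes. The preview does not show the step bodies, so the following is only a sketch in plain Python of how such a pipeline might find the review with the most words that appear in no other review. The step logic and the `run_step` and `most_unique_review` helpers are assumptions for illustration, not the gist's implementation and not part of any MapReduce library.

```python
from collections import defaultdict


def run_step(pairs, mapper=None, reducer=None):
    """Hypothetical in-memory stand-in for a single map/reduce step."""
    if mapper is not None:
        pairs = [out for key, value in pairs for out in mapper(key, value)]
    if reducer is None:
        return pairs
    # Shuffle phase: group all values by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def words_in_review(review_id, text):
    # Assumed behavior: emit each distinct word once per review.
    for word in set(text.lower().split()):
        yield word, review_id


def doc_frequency(word, review_ids):
    # Assumed behavior: keep only words seen in exactly one review.
    if len(review_ids) == 1:
        yield review_ids[0], word


def unique_words(review_id, words):
    # Count unique words per review and send every count to one key.
    yield "MAX", (len(words), review_id)


def find_max_review(_, counts):
    # Pick the (count, review_id) pair with the highest count.
    yield max(counts)


def most_unique_review(reviews):
    pairs = list(reviews.items())
    pairs = run_step(pairs, mapper=words_in_review, reducer=doc_frequency)
    pairs = run_step(pairs, reducer=unique_words)
    return run_step(pairs, reducer=find_max_review)
```

Routing every count to a single key in the second step is what lets the last reducer see all reviews at once; on a real cluster that final step runs on one reducer.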
jblomo / gist:2395175
Created April 15, 2012 22:42
Some filesystem operations in Clojure
(defn- cpr!
  "Recursive copy. src and dst are directories. The *contents* of src will be
  copied *into* dst (just as if you included trailing slashes in the rsync
  command)."
  [src dst]
  (when (not (.exists dst))
    (.mkdir dst))
  (let [root-len (count (.getPath src))
        relative (fn [file]
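The preview cuts off before the copy loop, so the rest of `cpr!` is not visible here. As a rough illustration of the behavior the docstring describes (copy the contents of src into dst, as with trailing slashes in rsync), here is a sketch in Python. The name `copy_contents` is hypothetical, and unlike the `.mkdir` call above it also creates missing parent directories of dst.

```python
import shutil
from pathlib import Path


def copy_contents(src, dst):
    """Copy the contents of directory src into directory dst."""
    src, dst = Path(src), Path(dst)
    # Assumption: create dst and any missing parents.
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.rglob("*"):
        # Rebuild each path relative to src underneath dst.
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    return dst
```

Computing each target from the path relative to src plays the same role as the `root-len` and `relative` bindings in the Clojure preview: dst receives the files inside src, not a nested copy of the src directory itself.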
jblomo / WordCount.java
Created March 23, 2012 17:55
Hadoop MR JSON example
import java.io.IOException;
import java.util.*;
import org.codehaus.jackson.map.ObjectMapper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
jblomo / deploy.clj
Created February 12, 2012 04:20 — forked from sritchie/deploy.clj
backtype EMR configs
(ns backtype.lyoto.deploy
  (:use clojure.tools.cli))
(comment
  ;; ## Uberjar Deploy
  (require '[clojure.string :as s]
           '[pallet.stevedore :as stevedore]
           '[pallet.execute :as execute]
           '[backtype.lyoto.log :as log])
jblomo / gist:1645841
Created January 20, 2012 06:52
python intro
In [1]: print "hello"
hello
In [2]: 1+1
Out[2]: 2
In [3]: added = 1 + 1
In [4]: print added
jblomo / ElephantOutputFormat.java
Created December 2, 2011 21:31
Exploring Hadoop OutputFormat
public class ElephantOutputFormat implements OutputFormat<IntWritable, ElephantRecordWritable> {
    public static Logger LOG = Logger.getLogger(ElephantOutputFormat.class);
    public RecordWriter<IntWritable, ElephantRecordWritable> getRecordWriter(FileSystem fs, JobConf conf, String string, Progressable progressable) throws IOException {
        return new ElephantRecordWriter(conf, (Args) Utils.getObject(conf, ARGS_CONF), progressable);
    }
    public void checkOutputSpecs(FileSystem fs, JobConf conf) throws IOException {
        Args args = (Args) Utils.getObject(conf, ARGS_CONF);
        fs = Utils.getFS(args.outputDirHdfs, conf);
(defmacro printforms [& forms] `(doseq [form# (list ~@forms)] (println (list 'form#))))
(defn sha1
  "Compute the SHA-1 of a File's contents and return the hex string"
  [file]
  (with-open [f (FileInputStream. file)]
    (let [buffer (byte-array 1024)
          md (MessageDigest/getInstance "SHA-1")]
      (loop [nread (.read f buffer)]
        (if (pos? nread)
          (do (.update md buffer 0 nread)
              (recur (.read f buffer)))
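The preview ends before the digest is turned into a hex string. For comparison, the same read-a-chunk, update-the-digest loop can be sketched in Python with the standard `hashlib` module. The function name `sha1_hex` and the `chunk_size` parameter are illustrative choices; the 1024-byte default mirrors the buffer size above.

```python
import hashlib


def sha1_hex(path, chunk_size=1024):
    """Return the SHA-1 of a file's contents as a hex string."""
    md = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file.
                break
            md.update(chunk)
    return md.hexdigest()
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which is the point of the buffer loop in the Clojure version as well.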