package forma;

import forma.WholeFileInputFormat;
import cascading.scheme.Scheme;
import cascading.tap.Tap;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;
(defn whole-file
  "Custom scheme for dealing with entire files."
  [field-name]
  (WholeFile. (w/fields field-name)))

(defn hfs-wholefile
  "Creates a tap on HDFS using the wholefile format. Guaranteed not
  to chop files up! Required for unsupported compression formats like HDF."
  [path]
  (w/hfs-tap (whole-file ["file"]) path))
<!-- this one fails. -->
<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>io.serializations</name>
    <value>cascading.tuple.hadoop.BytesSerialization</value>
  </property>
</configuration>
(def modis-blanks
  (let [offsets [14 11 9 6 4 2 1 0 0 0 0 0 1 2 4 6 9 11 14]]
    (union (set (for [x (range 18) y (range (offsets x))] [x y]))
           (set (for [x (range 18) y (range (- 36 (offsets x)) 36)] [x y])))))
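As a quick check on the set above, the following sketch rebuilds it with a namespace-qualified `clojure.set/union` (the original assumes `union` has already been referred into the namespace) and probes a few coordinates. The helper name `blank-tile?` is hypothetical and not part of the original code.

```clojure
(require '[clojure.set :as set])

;; Same construction as modis-blanks, with union qualified so the
;; snippet runs on its own at a REPL.
(def modis-blanks
  (let [offsets [14 11 9 6 4 2 1 0 0 0 0 0 1 2 4 6 9 11 14]]
    (set/union
     (set (for [x (range 18) y (range (offsets x))] [x y]))
     (set (for [x (range 18) y (range (- 36 (offsets x)) 36)] [x y])))))

;; Hypothetical helper: true when the [x y] pair is in the blank set.
(defn blank-tile? [x y]
  (contains? modis-blanks [x y]))

(blank-tile? 0 0)     ;; => true, row 0 has 14 blanks at the low end
(blank-tile? 8 18)    ;; => false, row 8 has an offset of 0
(count modis-blanks)  ;; => 160, twice the sum of the first 18 offsets
```

Only the first 18 entries of `offsets` are read, since `x` ranges over `(range 18)`.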
;; We define simpler functions first, so we know here that this'll get used below...
(defn average
  "Takes the average of all numbers in the supplied collection."
  [coll]
  (/ (apply + coll)
     (count coll)))
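
A short usage sketch for `average`, with the definition repeated so the block runs on its own. Clojure's `/` returns an exact ratio when integers do not divide evenly, so the result may need a `double` call if a decimal is wanted. An empty collection throws an `ArithmeticException` (divide by zero).

```clojure
(defn average
  "Takes the average of all numbers in the supplied collection."
  [coll]
  (/ (apply + coll)
     (count coll)))

(average [1 2 3])             ;; => 2
(average [1 2 3 4])           ;; => 5/2, an exact ratio
(average [1.0 2 3 4])         ;; => 2.5, one double makes the sum a double
(double (average [1 2 3 4]))  ;; => 2.5
```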

;; here's a nice, effectively one line solution. (It's often the case that the docstring will be longer than the function itself.)
(defn moving-average
  "Returns a moving average of windows into the supplied
;; This one?
(defn delta [f start end]
  (- (f end) (f start)))

;; or this one?
(defn delta [f start end]
  (reduce - (map f [end start])))
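
Both definitions compute `(f end)` minus `(f start)`, which a quick REPL check can confirm. The names `delta-direct`, `delta-reduce` and `square` are stand-ins introduced here so that both versions can coexist in one namespace; they are not part of the original code.

```clojure
;; First variant: subtract the two function values directly.
(defn delta-direct [f start end]
  (- (f end) (f start)))

;; Second variant: map f over [end start], then fold with -.
(defn delta-reduce [f start end]
  (reduce - (map f [end start])))

;; Hypothetical sample function for the comparison.
(defn square [x] (* x x))

(delta-direct square 2 5)  ;; => 21
(delta-reduce square 2 5)  ;; => 21
```

The first form states the subtraction directly, while the second builds an intermediate two-element sequence to reach the same result.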
;; Okay, here's some practice. This might be a good candidate for a macro.
(defn little-int
  "Converts four input bytes to an int, in little endian format."
  [b0 b1 b2 b3]
  (bit-or
   (bit-shift-left b3 24)
   (bit-or
    (bit-shift-left (bit-and b2 0xff) 16)
    (bit-or
;; ## Example of timeseries aggregation in cascalog.
;;
;; (copy paste all of this in at the REPL!)
(use 'cascalog.api)
(def tseries [["ndvi" 1 0 [1 2 3 4]]
              ["ndvi" 1 2 [2 3 4 5]]
              ["ndvi" 1 1 [4 3 2 1]]
              ["ndvi" 1 4 [1 2 3 4]]
;; The goal here is to write an aggregator that takes in a sequence of
;; tuples of the form <tperiod, pixel-vector>, and returns tuples of
;; the form <pixel-index, min-time, max-time, timeseries>.
;;
;; We assume that we're receiving chunks for every month within the
;; range. We measure pixel-index as the position inside the chunk.
;;
;; Example:
;; (timeseries [[1 [7 4 2 1]]
;;              [2 [1 2 3 4]]
;;old!
(ns forma.core
  (:use cascalog.api
        (clj-time [format :only (unparse formatters)]
                  [core :only (now)])
        (forma [hadoop :only (all-files
                              template-seqfile
                              globhfs-seqfile)]))
  (:require (cascalog [ops :as c])