@sritchie
Created February 2, 2011 17:38
;; NOTE: the gist omits its ns form. `w` is presumably an alias for
;; cascalog.workflow, and WholeFile and AddYearFunction are Java classes
;; that must be imported from the author's project.

(defn whole-file
  "Custom scheme for dealing with entire files."
  [field-name]
  (WholeFile. (w/fields field-name)))

(defn hfs-wholefile
  "Creates a tap on HDFS using the wholefile format. Guaranteed not
  to chop files up! Required for unsupported compression formats like HDF."
  [path]
  (w/hfs-tap (whole-file ["file"]) path))

(defn files-with-name
  "Query to return all files in the supplied directory, along with filenames."
  [dir]
  (let [source (hfs-wholefile dir)]
    (?<- (stdout) [?file ?filename]
         (source ?file)
         ((AddYearFunction.) ?file :> ?filename))))

ghost commented Nov 15, 2013

Hey, something like this should really, really be part of Cascalog's standard taps. :)
