@erasmas
Created March 3, 2015 13:55
Clojure fn that recursively finds directories on HDFS containing *.parquet files
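(import '(org.apache.hadoop.conf Configuration)
        '(org.apache.hadoop.fs FileStatus FileSystem Path))

(defn dirs-with-parquet
  "Recursively find directories on HDFS that contain *.parquet files.
  Returns the path component of each matching directory as a string."
  [path]
  (let [directory (Path. path)
        ;; Resolve the filesystem from the path and configuration (e.g. hdfs://...);
        ;; FileSystem/getLocal would only ever scan the local disk.
        ^FileSystem fs (.getFileSystem directory (Configuration.))
        dir? (fn [^Path p] (.isDirectory fs p))
        ;; globStatus may return null or an empty array; seq treats both as "no match".
        contains-parquet? (fn [^Path p] (seq (.globStatus fs (Path. p "*.parquet"))))
        ;; Walk the tree lazily, descending only into directories.
        dirs (tree-seq dir?
                       (fn [^Path p] (map #(.getPath ^FileStatus %) (.listStatus fs p)))
                       directory)]
    (->> dirs
         (filter contains-parquet?)
         ;; Keep only the path part of the URI, dropping scheme and authority.
         (map (fn [^Path p] (.getPath (.toUri p)))))))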
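
;; Usage sketch. The path below is hypothetical (it is not from the original gist);
;; substitute a directory that exists on the filesystem your Hadoop
;; Configuration points at.
(comment
  ;; Yields a lazy seq of directory path strings, one per directory
  ;; that directly contains at least one *.parquet file.
  (dirs-with-parquet "/data/warehouse"))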