@mowat27
Created May 28, 2012 13:25
EuroClojure Learnings: TSV file
; My "lightbulb" moment at EuroClojure 2012 was that Clojure is at its most
; powerful when you build abstractions in the data and keep the functions as
; generic as possible, so you can chain them easily.
; For example, here is some code I wrote a few months ago to parse a Freebase TSV file:
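; Namespaces used by the snippets in this file.
(require '[clojure.string :as str :refer [split]]
         '[clojure.java.io :refer [file reader]])

; Header names to rename, keyed by the name used in the file.
(def keywords {"id" "freebase_id"})

; Returns the override for field-name if there is one, else field-name itself.
(defn override-keywords [field-name]
  (let [alternative (get keywords field-name)]
    (if (nil? alternative) field-name alternative)))

(defn get-field-names [line] (map override-keywords (str/split line #"\t")))
(defn get-fields [line] (str/split line #"\t"))
(defn map-fields [hdrs line] (zipmap hdrs line))

(defn map-file [path]
  (with-open [rdr (reader path)]
    ; Read every line into a vector before the reader is closed.
    (let [lines (into [] (line-seq rdr))
          hdrs  (get-field-names (first lines))
          data  (map get-fields (rest lines))]
      (map #(map-fields hdrs %1) data))))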
; On the other hand, here is some code Malcolm Sparks presented that does much the same job.
; See https://github.com/malcolmsparks/euroclojure2012
(defn get-dataset []
  (let [[header & rows]
        (->> "Downloads/olympic_games.tsv"
             (file (System/getProperty "user.home"))
             reader line-seq
             (map #(split % #"\t")))]
    (map #(zipmap header %) rows)))
; I'm doing a bit more (renaming header fields), but there is no doubt that
; Malcolm's solution is more flexible and makes better use of the core library.
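; A minimal sketch of how the header renaming could be folded into the same
; data-first style. This is an assumption, not code from the talk: the names
; header-overrides and rows->maps are hypothetical, and the function takes a
; seq of lines so it does not depend on how the file is read.
(require '[clojure.string :as string])

; The renaming rule lives in plain data.
(def header-overrides {"id" "freebase_id"})

(defn rows->maps [lines]
  (let [[header & rows] (map #(string/split % #"\t") lines)]
    ; clojure.core/replace swaps each header that appears as a key in the map
    ; and leaves the others untouched.
    (map #(zipmap (replace header-overrides header) %) rows)))

; Usage:
; (rows->maps ["id\tname" "/m/01\tAthens"])
; => ({"freebase_id" "/m/01", "name" "Athens"})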
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment