@uroybd
Created August 16, 2015 08:52
Html Cleaner
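(require '[pl.danieljanus.tagsoup :as tagsoup]
         '[hiccup.core :as hiccup]
         '[clojure.java.io :as io])

;; Parse the HTML file into a hiccup-style tree: [tag attrs & children].
(defn parser [file_path]
  (tagsoup/parse
    (java.io.ByteArrayInputStream. (.getBytes (slurp file_path) "UTF-8"))))

;; Replace every attribute map with {} and recurse into nested vectors.
(defn attr_filter [element]
  (cond
    (map? element)    {}
    (vector? element) (mapv attr_filter element)
    :else             element))

;; Strip attributes from every element of the parsed tree.
(defn sanitizer [parsed_data]
  (mapv attr_filter parsed_data))

;; Render the hiccup-style tree back to an HTML string.
(defn renderer [sanitized_data]
  (hiccup/html sanitized_data))

;; Full pipeline: parse, strip attributes, render.
(defn html_cleaner [file_path]
  (renderer (sanitizer (parser file_path))))

;; Clean input_file and write the result to output_file.
;; Note: the writer appends, so existing output is not overwritten.
(defn clean-o-matic [input_file output_file]
  (with-open [w (io/writer output_file :append true)]
    (.write w (html_cleaner input_file))))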
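
To see what the attribute filter does without clj-tagsoup or hiccup on the classpath, here is a minimal sketch on a hand-written hiccup-style vector. The name `strip-attrs` and the sample data are hypothetical, chosen only for illustration. The function mirrors the logic of `attr_filter` above and depends only on `clojure.core`.

```clojure
;; Hypothetical stand-in for attr_filter, using only clojure.core.
(defn strip-attrs [element]
  (cond
    (map? element)    {}                          ; attribute map -> empty map
    (vector? element) (mapv strip-attrs element)  ; recurse into nested elements
    :else             element))                   ; tags and text pass through

;; Illustrative tree in the [tag attrs & children] shape.
(def sample
  [:p {:class "intro"} "Hello, " [:b {:style "color:red"} "world"]])

(strip-attrs sample)
;; => [:p {} "Hello, " [:b {} "world"]]
```

Replacing each attribute map with `{}` instead of removing it keeps every element in the same `[tag attrs & children]` shape, so the tree can be passed on without further adjustment.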