@zoren
Forked from biggert/gist:6453648
Last active June 19, 2020 09:09
Get a Clojure reader that uses the encoding indicated by the BOM
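(ns util
  (:require [clojure.java.io :as io])
  (:import org.apache.commons.io.input.BOMInputStream
           org.apache.commons.io.ByteOrderMark))

(defn bom-reader
  "Returns a reader on input-stream whose encoding is taken from the
  stream's byte order mark (BOM). Falls back to UTF-8 when no BOM is
  present. The BOM itself is excluded from the returned stream."
  [input-stream]
  (let [;; BOMs to detect; passed as an array for the Java varargs constructor.
        bom-array
        (into-array ByteOrderMark
                    [ByteOrderMark/UTF_16LE
                     ByteOrderMark/UTF_16BE
                     ByteOrderMark/UTF_8
                     ByteOrderMark/UTF_32BE
                     ByteOrderMark/UTF_32LE])
        ;; `false` = do not include the BOM bytes in the wrapped stream.
        bom-stream (BOMInputStream. input-stream false bom-array)
        ;; .getBOM returns nil when no BOM was found, hence the UTF-8 default.
        char-set-name (.getCharsetName (or (.getBOM bom-stream) ByteOrderMark/UTF_8))]
    (io/reader bom-stream :encoding char-set-name)))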
zoren commented Sep 19, 2019

Note that the byte order mark is not included in the stream, contrary to @biggert's original.
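
To make the note above concrete, here is a minimal sketch of the difference the `include` flag makes. It assumes Apache Commons IO is on the classpath. The names `sample-bytes` and `read-all` are hypothetical helpers made up for this illustration and are not part of the gist.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io ByteArrayInputStream)
        '(org.apache.commons.io ByteOrderMark)
        '(org.apache.commons.io.input BOMInputStream))

;; Hypothetical sample input: the UTF-16LE BOM (FF FE) followed by "hi"
;; encoded as UTF-16LE.
(def sample-bytes
  (byte-array (map unchecked-byte [0xFF 0xFE 0x68 0x00 0x69 0x00])))

(defn read-all
  "Reads sample-bytes through a BOMInputStream and returns the decoded text.
  include? controls whether the BOM bytes stay in the stream."
  [include?]
  (let [boms (into-array ByteOrderMark
                         [ByteOrderMark/UTF_16LE ByteOrderMark/UTF_8])
        in   (BOMInputStream. (ByteArrayInputStream. sample-bytes) include? boms)
        enc  (.getCharsetName (or (.getBOM in) ByteOrderMark/UTF_8))]
    (with-open [r (io/reader in :encoding enc)]
      (slurp r))))

(read-all false) ;; => "hi"        (BOM stripped, as in this fork)
(read-all true)  ;; => "\uFEFFhi"  (BOM survives as a leading U+FEFF character)
```

With `include?` set to `true`, the decoded text starts with an invisible U+FEFF character, which can trip up parsers that expect the first character to be meaningful.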