@uduki
Created July 9, 2012 11:21
Character encoding conversion with automatic detection (any -> Unicode)
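import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Maybe (fromMaybe, listToMaybe)
import Data.Text.ICU.Convert

-- | Read a file and decode it to Text. Each candidate encoding is tried in
-- order, and the first result free of substitution characters is returned.
-- If no candidate decodes cleanly, the result is empty Text.
toUnicodeWithDetect :: FilePath -> IO T.Text
toUnicodeWithDetect fp = do
  a <- BS.readFile fp
  -- Open one ICU converter per candidate encoding name.
  cons <- mapM (flip open Nothing) ns
  -- Decode with every converter and keep only the clean results (lazily).
  let xs = filter (T.all (`notElem` errs)) $ map (flip toUnicode a) cons
  return $ fromMaybe T.empty (listToMaybe xs)
  where
    -- Candidate encodings, in the order they are tried.
    ns = ["ucs4", "iso-2022-jp", "euc-jp", "sjis", "cp932", "utf-8"]
    -- Characters treated as evidence of a failed decode: SUB (U+001A) and U+FFFD.
    errs = ['\SUB', '\65533']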
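
The selection step in toUnicodeWithDetect (keep the first decoded candidate that contains neither '\SUB' nor U+FFFD) can be tried in isolation, without ICU. The following is only a minimal sketch using plain String values: the name firstClean and the sample strings are hypothetical and not part of the gist, and the real function works on Data.Text values produced by toUnicode.

```haskell
import Data.Maybe (listToMaybe)

-- Characters treated as evidence of a failed decode (same list as in the gist).
errs :: [Char]
errs = ['\SUB', '\65533']

-- Hypothetical helper: return the first candidate with no error characters,
-- or Nothing when every candidate contains at least one.
firstClean :: [String] -> Maybe String
firstClean = listToMaybe . filter (all (`notElem` errs))

main :: IO ()
main = do
  -- Made-up stand-ins for the output of three different converters.
  let candidates = ["bad\65533text", "clean text", "also clean"]
  print (firstClean candidates)          -- prints: Just "clean text"
  print (firstClean ["\SUB", "\65533"])  -- prints: Nothing
```

Returning Maybe instead of an empty string lets the caller tell "no encoding matched" apart from "the file was empty", which the original function cannot distinguish.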