@cemerick
Last active December 16, 2015 15:39
Using language-detection from Clojure
; using https://code.google.com/p/language-detection/
; i.e. [com.cybozu.labs/langdetect "1.1-20120112"]
(require '[clojure.java.io :as io])

; Load the language profiles once, before creating any detector:
; build each classpath resource path, slurp its JSON, and pass the
; resulting sequence of JSON strings to DetectorFactory/loadProfile.
(->> #{"af" "ar" "bg" "bn" "cs" "da" "de" "el" "en" "es" "et" "fa" "fi" "fr" "gu"
       "he" "hi" "hr" "hu" "id" "it" "ja" "kn" "ko" "lt" "lv" "mk" "ml" "mr" "ne"
       "nl" "no" "pa" "pl" "pt" "ro" "ru" "sk" "sl" "so" "sq" "sv" "sw" "ta" "te"
       "th" "tl" "tr" "uk" "ur" "vi" "zh-cn" "zh-tw"}
     (map (partial str "profiles/"))
     (map (comp slurp io/resource))
     com.cybozu.labs.langdetect.DetectorFactory/loadProfile)

(defn- detect-language
  "Returns the language code the detector considers most likely for string s."
  [s]
  ; A fresh Detector is created per call; doto appends the text and returns
  ; the detector so that .detect can be invoked on it.
  (.detect (doto (com.cybozu.labs.langdetect.DetectorFactory/create)
             (.append s))))

(detect-language "Jeuhj mijn vader is naar bed, ik heb de benedenverdieping weer voor mezelf alleen")
;=> "nl"

(detect-language "なんだかわけわからなくなった。すまん。俺がいいたいのはmathematics is a force ということ。いつでもそうだぜ!")
;=> "ja"
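
The path-building step of the `->>` pipeline above can be tried on its own, without the langdetect library on the classpath. The following is only a minimal sketch: `profile-paths` is a hypothetical helper name introduced for illustration (it is not part of the gist), and a two-code subset stands in for the full set of language codes.

```clojure
;; Hypothetical helper (not in the original gist): prefixes each language
;; code with "profiles/", mirroring the first map step of the pipeline.
(defn profile-paths
  [langs]
  (map (partial str "profiles/") langs))

(profile-paths ["en" "nl"])
;=> ("profiles/en" "profiles/nl")

;; The anonymous-function spelling produces the same sequence.
(map #(str "profiles/" %) ["en" "nl"])
;=> ("profiles/en" "profiles/nl")
```

Whether to use `partial` or `#(...)` here is a matter of taste; both yield the same lazy sequence of resource paths.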
@austinyun

Always feel like I'm cheating when I do map partial, but I guess it's idiomatic lol