@ghadishayban
Last active December 25, 2015 15:59
Regex split as a Clojure reducer. Faster than Java's Pattern.split(string)
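;; Reimplementation of Java's (.split Pattern) as a reducible collection.
;; No intermediate array or list is built, similar to JDK 8 splitAsStream;
;; each piece is still allocated as a String.
;; Discards a trailing "". Unlike Pattern.split, only the final empty piece
;; is dropped; empty pieces before it are still emitted.
;; If the pattern never matches, the whole (non-empty) string is the only piece.
;; Respects `clojure.core/reduced`.
;; Only the 3-arity coll-reduce (with an init value) is implemented.
(defn split-string
  [^CharSequence s ^java.util.regex.Pattern re]
  (reify clojure.core.protocols/CollReduce
    (coll-reduce [_ f init]
      (let [m (re-matcher re s)]
        (loop [acc init i 0]
          (if (.find m)
            ;; emit the text between the previous match and this one
            (let [ret (f acc (.toString (.subSequence s i (.start m))))]
              (if (reduced? ret)
                @ret
                (recur ret (.end m))))
            ;; no more matches: emit the tail unless it is empty
            (if (== i (.length s))
              acc
              (let [ret (f acc (.toString (.subSequence s i (.length s))))]
                (if (reduced? ret) @ret ret)))))))))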
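;; Examples (the second needs (require '[clojure.core.reducers :as r])):
;; (frequencies (split-string "my name is slim shady" #"\s+"))
;; => {"my" 1, "name" 1, "is" 1, "slim" 1, "shady" 1}
;; (->> (split-string "The quick fox jumped over the lazy..." #"\s+") (r/take 2) (into []))
;; => ["The" "quick"]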