@siefca
Created October 19, 2015 19:45
Eneltron's Tokenizer – example usage
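;; Load the tokenizer namespace and initialize it before first use.
(require 'eneltron.tokens)
(eneltron.tokens/initialize-tokenizer)

;; Sample Polish text ("tekst" = text, "wynik" = result).
(def tekst "Siała baba mak. Nie wiedziała jak. Raz, dwa – oraz – 4.")
(time (def wynik (eneltron.tokens/tokenize tekst)))
; => "Elapsed time: 0.230337 msecs"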
;; Print each token's class (read from the token's metadata) next to its text.
(apply print
       (map #(str (name (:token-class (meta %))) " ->" \tab \tab (apply str %) \newline)
            wynik))
letter -> Siała
separator ->
letter -> baba
separator ->
letter -> mak
punctuation -> .
separator ->
letter -> Nie
separator ->
letter -> wiedziała
separator ->
letter -> jak
punctuation -> .
separator ->
letter -> Raz
separator ->
letter -> –
separator ->
letter -> oraz
separator ->
letter -> –
separator ->
number -> 4
punctuation -> .
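
The printing expression above relies on every token carrying its class as metadata under `:token-class`. As a rough illustration of that idea only, the sketch below builds tokens in the same shape with a toy classifier. `classify-char` and `toy-tokenize` are hypothetical names invented for this example, and the classification rules are simplified assumptions; nothing here claims to reflect how `eneltron.tokens` is actually implemented.

```clojure
(defn classify-char
  "Hypothetical, simplified classifier; not Eneltron's actual rules."
  [c]
  (cond
    (Character/isLetter c)     :letter
    (Character/isDigit c)      :number
    (Character/isWhitespace c) :separator
    :else                      :punctuation))

(defn toy-tokenize
  "Splits s into runs of same-class characters. Each token is a vector of
  characters with its class stored under :token-class in the metadata."
  [s]
  (map #(with-meta (vec %) {:token-class (classify-char (first %))})
       (partition-by classify-char s)))

;; Print class and text for each token, as in the example above.
(doseq [t (toy-tokenize "mak. 4")]
  (println (name (:token-class (meta t))) "->" (apply str t)))
```

Keeping the class in metadata means a token still compares equal to a plain sequence of its characters, while the class stays available to any code that asks for it with `meta`.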