Skip to content

Instantly share code, notes, and snippets.

@cpdean
Last active December 21, 2015 00:24
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cpdean/fbee06b6a7a2d90a28fc to your computer and use it in GitHub Desktop.
Save cpdean/fbee06b6a7a2d90a28fc to your computer and use it in GitHub Desktop.
type DocumentBody = Raw String | Words List String
tokenize: DocumentBody -> List String
tokenize s =
case s of
Raw str_body -> String.split " " str_body |> (List.map String.toLower)
Words list_body -> List.map String.toLower list_body
-- Tests
tests =
suite "Tokenizer"
[ test "simple" <| assertEqual ["this", "is", "a", "simple", "string"]
<| tokenize (Raw "this is a simple string")
, test "downcasing tokens: string" <| assertEqual ["foo", "bar"]
<| tokenize (Raw "FOO BAR")
, test "downcasing tokens: list of str" <| assertEqual ["foo", "bar"]
<| tokenize (Words ["Foo", "BAR"])
]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment