Word segmentation
One of the issues with domain names is that spaces aren't allowed. So we get domain names like this:
- penisland.com (Pen Island)
- expertsexchange.com (Experts Exchange)
The same problem now shows up with #hashtags on social media platforms.
We want to take a string without spaces and insert spaces so that the words are separated and our grade-school teacher can be happy again.
Your task is to write a function that takes a string without spaces and a dictionary of known words and returns all possible ways it could be segmented (i.e., insert spaces) into those words. If it can't be segmented, it should return an empty sequence.
(segmentations "hellothere" ["hello" "there"]) ;=> ("hello there")
(segmentations "fdsfsfdsjkljf" ["the" "he" "she" "it"...]) ;=> ()
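One possible approach, offered as a minimal sketch rather than a reference solution: try every prefix of the string, keep the ones found in the dictionary, and recursively segment the remainder. The helper name `splits` and the choice to convert the dictionary to a set are assumptions made for this illustration. Because the results are built with `for` and `map`, they are produced lazily (subject to Clojure's usual chunking).

```clojure
(require '[clojure.string :as str])

(defn segmentations
  "Returns a lazy seq of every way to split s into words from dict,
  each rendered as a single space-separated string."
  [s dict]
  (let [words (set dict)] ; set gives fast membership checks
    (letfn [(splits [s]
              (if (empty? s)
                [[]] ; exactly one way to segment nothing: no words
                (for [n (range 1 (inc (count s)))
                      :let [prefix (subs s 0 n)]
                      :when (contains? words prefix)
                      tail (splits (subs s n))]
                  (cons prefix tail))))]
      (map #(str/join " " %) (splits s)))))

(segmentations "hellothere" ["hello" "there"])
;=> ("hello there")

(segmentations "expertsexchange" ["experts" "exchange" "expert" "sex" "change"])
;=> ("expert sex change" "experts exchange")
```

This sketch recomputes the segmentations of the same suffix many times on long inputs; memoizing `splits` would be one way to address that.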
Bonus: use a dictionary file and some text from somewhere and do a real test.
Super bonus: make it lazy.
Thanks to this site for the challenge idea, where it is rated Expert level in JavaScript.
Email submissions to eric@purelyfunctional.tv until May 31, 2020. You can discuss the submissions in the comments below.
Oops, Eric caught a bug in my code! My "starts-with" test passed even when the string to be segmented was shorter than the test word. So here's a fix:
Btw, I wanted to avoid using clojure.string or interop, but the best I can come up with for starts-with? is:
Pretty compact, but still makes my single function implementation kind of ugly if I incorporate it as a lambda or letfn. Oh well.
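For illustration, here is one way such a prefix test might look without clojure.string or interop. This is a sketch, not necessarily the commenter's version, and `prefix?` is a hypothetical name. It assumes the bug described above came from comparing characters pairwise, which stops at the shorter of the two strings, so it adds an explicit length check.

```clojure
(defn prefix?
  "True when s begins with word. Hypothetical helper for illustration."
  [s word]
  (and (<= (count word) (count s))      ; word must fit inside s
       (every? true? (map = word s))))  ; map stops at the end of word

(prefix? "hellothere" "hello") ;=> true
(prefix? "he" "hello")         ;=> false
```

Without the length check, the second call would return true, because `map` only compares the two characters that both strings have.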