@cellularmitosis cellularmitosis/README.md
Last active Oct 15, 2019

Puzzle: "Given a list of words and a string made up of those words..."

Blog 2019/10/2


This was the puzzle for this week's Puzzles Guild:

Given a dictionary of words and a string made up of those words (no spaces),
return the original sentence in a list.
If there is more than one possible reconstruction, return any of them.
If there is no possible reconstruction, then return null.

For example, given the set of words 'quick', 'brown', 'the', 'fox',
and the string "thequickbrownfox",
you should return ['the', 'quick', 'brown', 'fox'].

Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
and the string "bedbathandbeyond",
return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].
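Because any valid reconstruction is accepted, a test cannot always compare against a single expected list. A minimal sketch of a checker for a candidate answer follows; the name `is_valid_reconstruction` is hypothetical and not part of the puzzle or the solution. It only verifies a non-null answer and does not decide whether null would have been the right answer.

```python
def is_valid_reconstruction(words, phrase, result):
    """Return True if result is a list of dictionary words that spells phrase."""
    if result is None:
        # Null means "no reconstruction"; this helper cannot confirm that claim.
        return False
    # Every token must come from the dictionary...
    all_known = all(token in words for token in result)
    # ...and the tokens, joined without spaces, must give back the input string.
    return all_known and "".join(result) == phrase


words = ['bed', 'bath', 'bedbath', 'and', 'beyond']
print(is_valid_reconstruction(words, "bedbathandbeyond", ['bed', 'bath', 'and', 'beyond']))
print(is_valid_reconstruction(words, "bedbathandbeyond", ['bedbath', 'and', 'beyond']))
print(is_valid_reconstruction(words, "bedbathandbeyond", ['bed', 'and', 'beyond']))
```

The first two calls print `True` and the third prints `False`, since its tokens join to "bedandbeyond".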

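I used a simple regex-based lexer / tokenizer to solve this problem. At each offset it takes the first word in the list that matches and never backtracks, so it handles both examples above but can miss a valid reconstruction when an early match leads to a dead end (for example, the words 'a', 'ab', 'c' and the string "abc").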

#!/usr/bin/env python

# Problem statement:
# Given a dictionary of words and a string made up of those words (no spaces),
# return the original sentence in a list.
# If there is more than one possible reconstruction, return any of them.
# If there is no possible reconstruction, then return null.
# For example, given the set of words 'quick', 'brown', 'the', 'fox',
# and the string "thequickbrownfox",
# you should return ['the', 'quick', 'brown', 'fox'].
# Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
# and the string "bedbathandbeyond",
# return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].

import re


def solve(words, phrase):
    # Escape each word so regex metacharacters in it are matched literally,
    # and skip empty words, which would match forever without advancing.
    tokendefs = [re.compile(re.escape(word)) for word in words if word]
    return lex(tokendefs, phrase)


def lex(tokendefs, text):
    """Greedily split text into tokens; return None if the lexer gets stuck."""
    tokens = []
    offset = 0
    while offset < len(text):
        # Try the token definitions in order and commit to the first match.
        for regex in tokendefs:
            m = regex.match(text, offset)
            if m is not None:
                matched_text = m.group(0)
                tokens.append(matched_text)
                offset = offset + len(matched_text)
                break
        else:
            # for/else: no definition matched at this offset.
            # The problem statement asks for null in this case.
            return None
    return tokens


def test1():
    words = ['quick', 'brown', 'the', 'fox']
    phrase = "thequickbrownfox"
    assert solve(words, phrase) == ['the', 'quick', 'brown', 'fox']


def test2():
    words = 'bed', 'bath', 'bedbath', 'and', 'beyond'
    phrase = "bedbathandbeyond"
    assert solve(words, phrase) in [['bed', 'bath', 'and', 'beyond'], ['bedbath', 'and', 'beyond']]


if __name__ == "__main__":
    test1()
    test2()
    # Parenthesized form works under both Python 2 and Python 3.
    print("All tests passed.")
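The lexer above commits to the first word that matches at each offset, so it can get stuck even when a reconstruction exists. As a hedged alternative (a different technique, not the approach used in this post), here is a minimal sketch of a backtracking search that remembers offsets already known to be dead ends. The names `reconstruct`, `search`, and `dead_ends` are hypothetical, and plain `str.startswith` stands in for the regex matching.

```python
def reconstruct(words, phrase):
    """Return one list of words that concatenates to phrase, or None."""
    vocabulary = [w for w in words if w]  # ignore empty strings
    dead_ends = set()  # offsets from which no reconstruction exists

    def search(offset):
        if offset == len(phrase):
            return []  # consumed the whole phrase: success
        if offset in dead_ends:
            return None  # already explored, no need to retry
        for word in vocabulary:
            if phrase.startswith(word, offset):
                rest = search(offset + len(word))
                if rest is not None:
                    return [word] + rest
                # Otherwise fall through and try the next candidate word.
        dead_ends.add(offset)
        return None

    return search(0)


print(reconstruct(['a', 'ab', 'c'], "abc"))
print(reconstruct(['quick', 'brown', 'the', 'fox'], "thequickbrownfox"))
print(reconstruct(['quick', 'brown', 'the'], "thequickbrownfox"))
```

This prints `['ab', 'c']`, then `['the', 'quick', 'brown', 'fox']`, then `None`. Recording dead ends means each offset is fully explored at most once, which avoids repeating the same failed search. The recursion depth grows with the number of words in the answer, so a very long phrase would need an iterative rewrite.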