Puzzle: "Given a list of words and a string made up of those words..."

Blog 2019/10/2

This was the puzzle for this week's Puzzles Guild:

Given a dictionary of words and a string made up of those words (no spaces),
return the original sentence in a list.
If there is more than one possible reconstruction, return any of them.
If there is no possible reconstruction, then return null.

For example, given the set of words 'quick', 'brown', 'the', 'fox',
and the string "thequickbrownfox",
you should return ['the', 'quick', 'brown', 'fox'].

Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
and the string "bedbathandbeyond",
return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].

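I solved this with a simple regex-based lexer (tokenizer): each word is compiled into a regex, and at each offset into the string the lexer takes the first word that matches and advances past it.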
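The lexer relies on one detail of Python's `re` module: a compiled pattern's `match(string, pos)` only succeeds if the pattern matches starting exactly at index `pos`, not somewhere later in the string. A quick illustration, using sample values from the puzzle (the variable names are just for this example):

```python
import re

pattern = re.compile(re.escape("quick"))
text = "thequickbrownfox"

# match() only succeeds if the pattern matches starting exactly at `pos`.
assert pattern.match(text, 0) is None   # the text does not start with "quick"

m = pattern.match(text, 3)              # "quick" starts at index 3
assert m is not None
assert m.group(0) == "quick"
assert m.end() == 8                     # offset at which lexing would resume
```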

#!/usr/bin/env python

# Problem statement:
# Given a dictionary of words and a string made up of those words (no spaces),
# return the original sentence in a list.
# If there is more than one possible reconstruction, return any of them.
# If there is no possible reconstruction, then return null.

# For example, given the set of words 'quick', 'brown', 'the', 'fox',
# and the string "thequickbrownfox",
# you should return ['the', 'quick', 'brown', 'fox'].

# Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
# and the string "bedbathandbeyond",
# return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].

import re


def solve(words, phrase):
    # Escape each word so that regex metacharacters are matched literally.
    tokendefs = [re.compile(re.escape(word)) for word in words]
    return lex(tokendefs, phrase)


def lex(tokendefs, input):
    tokens = []
    offset = 0
    while offset < len(input):
        # Greedy: take the first token definition that matches at this offset.
        for regex in tokendefs:
            m = regex.match(input, offset)
            if m is not None:
                matched_text = m.group(0)
                tokens.append(matched_text)
                offset = offset + len(matched_text)
                break
        else:
            # The for loop ended without a break: no word matches at this
            # offset, so report failure (the puzzle's "null").
            return None
    return tokens


def test1():
    words = ['quick', 'brown', 'the', 'fox']
    phrase = "thequickbrownfox"
    assert solve(words, phrase) == ['the', 'quick', 'brown', 'fox']


def test2():
    words = 'bed', 'bath', 'bedbath', 'and', 'beyond'
    phrase = "bedbathandbeyond"
    assert solve(words, phrase) in [['bed', 'bath', 'and', 'beyond'], ['bedbath', 'and', 'beyond']]


if __name__ == "__main__":
    test1()
    test2()
    # print() with a single argument works on both Python 2 and Python 3.
    print("All tests passed.")
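One limitation of the lexer above is that it is greedy: it commits to the first word that matches at each offset and never revisits that choice. Given the words 'a', 'ab', 'c' (in that order) and the string "abc", it takes 'a', finds nothing that matches at "bc", and fails, even though ['ab', 'c'] is a valid reconstruction. What follows is a minimal sketch of a backtracking variant with memoization. It is not part of the original solution, it assumes Python 3, and the names `solve_backtracking` and `helper` are made up for illustration.

```python
from functools import lru_cache


def solve_backtracking(words, phrase):
    """Return a list of words that concatenate to phrase, or None."""
    # Ignore empty strings: they would never consume any input.
    words = tuple(w for w in words if w)

    @lru_cache(maxsize=None)
    def helper(offset):
        # Reached the end of the phrase: the empty tail is trivially split.
        if offset == len(phrase):
            return ()
        for word in words:
            if phrase.startswith(word, offset):
                rest = helper(offset + len(word))
                if rest is not None:
                    return (word,) + rest
        # No word leads to a full reconstruction from this offset.
        return None

    result = helper(0)
    return None if result is None else list(result)


print(solve_backtracking(['a', 'ab', 'c'], "abc"))  # ['ab', 'c']
print(solve_backtracking(['a'], "b"))               # None
```

Caching `helper` by offset means each position in the string is explored at most once, so a dead end is never retried. The recursion depth grows with the number of words in the sentence, so a very long input would probably call for an iterative version instead.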