cellularmitosis/README.md

## README.md

      
    Blog 2019/10/2
Puzzle: "Given a list of words and a string made up of those words..."

This was the puzzle for this week's Puzzles Guild:
Given a dictionary of words and a string made up of those words (no spaces),
return the original sentence in a list.
If there is more than one possible reconstruction, return any of them.
If there is no possible reconstruction, then return null.

For example, given the set of words 'quick', 'brown', 'the', 'fox',
and the string "thequickbrownfox",
you should return ['the', 'quick', 'brown', 'fox'].

Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
and the string "bedbathandbeyond",
return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].

I solved this with a simple regex-based lexer (tokenizer): each word becomes a regex, and the lexer repeatedly takes the first word that matches at the current offset.
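
One caveat with a first-match lexer is that it never backtracks: if the word taken at some offset leads to a dead end, a valid split that starts with a different word is missed. As a minimal sketch (not the approach used in lex.py), the function below tries every dictionary word at each offset and memoizes the result per offset. The name `split_words` is hypothetical, and it uses plain `str.startswith` rather than regexes. It is recursive, so a very long phrase could hit Python's recursion limit.

```python
from functools import lru_cache


def split_words(words, phrase):
    """Return one list of dictionary words that concatenates to phrase, or None."""
    # Ignore empty strings: they would never consume any input.
    candidates = [w for w in words if w]

    @lru_cache(maxsize=None)
    def from_offset(offset):
        # Reaching the end of the phrase means everything before it was consumed.
        if offset == len(phrase):
            return ()
        for word in candidates:
            if phrase.startswith(word, offset):
                rest = from_offset(offset + len(word))
                if rest is not None:
                    return (word,) + rest
        # No word at this offset leads to a complete split.
        return None

    result = from_offset(0)
    return None if result is None else list(result)
```

For example, with the words 'ab', 'abc', 'd' and the string "abcd", a first-match lexer takes 'ab' and then gets stuck at "cd", whereas this sketch falls back to 'abc' and returns ['abc', 'd'].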

  
## lex.py
#!/usr/bin/env python

# Problem statement:

# Given a dictionary of words and a string made up of those words (no spaces),
# return the original sentence in a list.
# If there is more than one possible reconstruction, return any of them.
# If there is no possible reconstruction, then return null.

# For example, given the set of words 'quick', 'brown', 'the', 'fox',
# and the string "thequickbrownfox",
# you should return ['the', 'quick', 'brown', 'fox'].

# Given the set of words 'bed', 'bath', 'bedbath', 'and', 'beyond',
# and the string "bedbathandbeyond",
# return either ['bed', 'bath', 'and', 'beyond'] or ['bedbath', 'and', 'beyond'].

import re

def solve(words, phrase):
    # Escape each word so any regex metacharacters in it are matched literally.
    tokendefs = [re.compile(re.escape(word)) for word in words]
    return lex(tokendefs, phrase)

def lex(tokendefs, text):
    # Greedy lexer: at each offset, take the first token definition that matches.
    # It never backtracks, so a valid split can be missed if an early match
    # leads to a dead end.
    tokens = []
    offset = 0
    while offset < len(text):
        for regex in tokendefs:
            m = regex.match(text, offset)
            if m is not None:
                matched_text = m.group(0)
                tokens.append(matched_text)
                offset = offset + len(matched_text)
                break
        else:
            # No word matches here; the problem statement asks for null (None).
            return None
    return tokens

def test1():
    words = ['quick', 'brown', 'the', 'fox']
    phrase = "thequickbrownfox"
    assert solve(words, phrase) == ['the', 'quick', 'brown', 'fox']

def test2():
    words = ['bed', 'bath', 'bedbath', 'and', 'beyond']
    phrase = "bedbathandbeyond"
    assert solve(words, phrase) in [['bed', 'bath', 'and', 'beyond'], ['bedbath', 'and', 'beyond']]

if __name__ == "__main__":
    test1()
    test2()
    print("All tests passed.")