Skip to content

Instantly share code, notes, and snippets.

@vtjeng
Last active February 3, 2022 03:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save vtjeng/bbd5596ff655f9137a92fa89f978f2f4 to your computer and use it in GitHub Desktop.
Save vtjeng/bbd5596ff655f9137a92fa89f978f2f4 to your computer and use it in GitHub Desktop.
Prove adversarial wordle has no 3-move solutions, based on work by zwegner
# A shorter version of zwegner's https://gist.github.com/zwegner/508cc183ab94dd27686a40384783ca4e
# Proof of optimality of 4-move solution to adversarial Wordle
# https://qntm.org/files/wordle/index.html
import collections
with open("wordle-words.txt") as f:
normal_words = [word.strip() for word in f]
with open("wordle-fake-words.txt") as f:
all_words = [word.strip() for word in f] + normal_words
# Get the colors for guessing the word <a> if the real word is guess <b>, as a
# 10-bit mask (5 green 5 yellow). Note that we only give as many yellow clues
# as there are matching letters (so guessing 'aabaa' against 'start' gives Y----)
def get_color_mask(a, b):
g = y = 0
c = collections.Counter(b)
for i in range(5):
if b[i] == a[i]:
g |= 1 << i
c[a[i]] -= 1
for i in range(5):
if b[i] != a[i] and c[a[i]] > 0:
c[a[i]] -= 1
y |= 1 << i
return g << 5 | y
def full_search():
# each guess splits the solution space `normal_words` into one or more
# equivalence classes.
# `maximum_num_equivalence_classes` tracks the largest number of
# classes that any guess splits the space into. This is also the best
# we can do on our second guess.
maximum_num_equivalence_classes = 0
# `minimax_equivalence_class_size` tracks the minimum (among all guesses)
# size of the largest equivalent class for each guess (i.e. the number of
# remaining candidates that Absurdle will leave us with)
minimax_equivalence_class_size = float("inf")
for guess in all_words:
c = collections.Counter()
for target in normal_words:
s = get_color_mask(guess, target)
c[s] += 1
num_equivalence_classes = len(c)
[(_, maximum_equivalence_class_size)] = c.most_common(1)
if num_equivalence_classes > maximum_num_equivalence_classes:
maximum_num_equivalence_classes = num_equivalence_classes
print(
f"Guessing '{guess}' splits the solution space into {num_equivalence_classes} equivalence classes."
)
if maximum_equivalence_class_size < minimax_equivalence_class_size:
minimax_equivalence_class_size = maximum_equivalence_class_size
print(
f"Guessing '{guess}' leaves a largest equivalence class of size {maximum_equivalence_class_size}"
)
if maximum_num_equivalence_classes < minimax_equivalence_class_size:
print(
f"On the second guess, we can split the solution space into at most {maximum_num_equivalence_classes} equivalence classes, but the smallest maximum equivalence class among all words is of size {minimax_equivalence_class_size}"
)
full_search()
@vtjeng
Copy link
Author

vtjeng commented Jan 10, 2022

Get text files here: https://gist.github.com/zwegner/508cc183ab94dd27686a40384783ca4e

Expected output:

Guessing 'aahed' splits the solution space into 68 equivalence classes.                                                                                             
Guessing 'aahed' leaves a largest equivalence class of size 448                                                                                                     
Guessing 'aargh' splits the solution space into 76 equivalence classes.                                                                                             
Guessing 'aarti' splits the solution space into 82 equivalence classes.                                                                                             
Guessing 'aarti' leaves a largest equivalence class of size 380                                                                                                     
Guessing 'abear' splits the solution space into 83 equivalence classes.                                                                                             
Guessing 'abele' splits the solution space into 86 equivalence classes.                                                                                             
Guessing 'abers' splits the solution space into 96 equivalence classes.                                                                                             
Guessing 'abers' leaves a largest equivalence class of size 278                                                                                                     
Guessing 'abies' leaves a largest equivalence class of size 234                                                                                                     
Guessing 'abler' splits the solution space into 107 equivalence classes.                                                                                            
Guessing 'ablet' splits the solution space into 114 equivalence classes.                                                                                            
Guessing 'aeons' leaves a largest equivalence class of size 195                                                                                                     
Guessing 'aeros' leaves a largest equivalence class of size 183                                                                                                     
Guessing 'aesir' splits the solution space into 116 equivalence classes.                                                                                            
Guessing 'aesir' leaves a largest equivalence class of size 168                                                                                                     
Guessing 'ahint' splits the solution space into 117 equivalence classes.                                                                                            
Guessing 'ailed' splits the solution space into 127 equivalence classes.                                                                                            
Guessing 'aline' splits the solution space into 132 equivalence classes.                                                                                            
Guessing 'antre' splits the solution space into 134 equivalence classes.                                                                                            
Guessing 'caret' splits the solution space into 145 equivalence classes.                                                                                            
Guessing 'carte' splits the solution space into 146 equivalence classes.                                                                                            
Guessing 'reast' splits the solution space into 147 equivalence classes.                                                                                            
Guessing 'salet' splits the solution space into 148 equivalence classes.                                                                                            
Guessing 'trace' splits the solution space into 150 equivalence classes.                                                                                            
On the second guess, we can split the solution space into at most 150 equivalence classes, but the smallest maximum equivalence class among all words is of size 168

Note: An earlier version of the code had an incorrect algorithm to determine which equivalence class a word pair belonged to.

@genos
Copy link

genos commented Feb 2, 2022

To make it even shorter, I think you can remove the import array line, as well as the and a[i] in b check on line 25 since at that point c[a[i]] > 0 implies a[i] in b.

@vtjeng
Copy link
Author

vtjeng commented Feb 3, 2022

Verified to be working with no significant impact on performance - thanks @genos.

@genos
Copy link

genos commented Feb 3, 2022

It’s certainly not going to make a noticeable difference with words only 5 characters long, but I figured we might as well continue to tidy up if that’s the exercise 🤷 Anyway, thanks for posting this!

@vtjeng
Copy link
Author

vtjeng commented Feb 3, 2022

Oh @genos I was more concerned that there would be a negative impact on performance: the a[i] in b code looked like code that someone might add to have an early return because defaultdict access was slower than searching for string membership (but it wasn't the case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment