@rain-1
Last active April 23, 2024 05:25
crossword-solving-with-gpt.md

Solving crosswords with GPT

This is my research report. I've included a lot of the code and chat interactions for people to read through if interested. I worked on this crossword: https://www.theguardian.com/crosswords/quick/16553

I had a vision for a GPT-powered crossword solver. My idea was that it would do a tree search over GPT-generated guesses that take into account the letters known so far, like:

givens

I didn't end up doing that because ChatGPT and GPT-4 are terrible at questions involving the length of words, or at guessing words that contain specific letters at specific positions. They can sometimes answer them but usually fail. I think this is because they are token-based. I am curious whether a character-based LLM would be better at such tasks.
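To make the contrast concrete, here is a minimal sketch of the letter-position check that the models struggle with but that is trivial in ordinary code. The function names `matches_pattern` and `candidates` are made up for this illustration and are not part of the solver code included later. The pattern `_ _ Z _` and the answer "Lutz" come from the chat transcript at the end of this report.

```python
def matches_pattern(word, pattern):
    """Return True if word fits a pattern such as "_ _ Z _" (spaces optional)."""
    slots = pattern.replace(" ", "").lower()
    word = word.lower()
    if len(word) != len(slots):
        return False
    # "_" matches any letter; anything else must match exactly.
    return all(s == "_" or s == w for s, w in zip(slots, word))


def candidates(words, pattern):
    """Filter a word list down to the entries that fit the pattern."""
    return [w for w in words if matches_pattern(w, pattern)]


# ChatGPT answered "Lutz" for the pattern "_ _ Z _", but the Z is in the
# wrong position, which a mechanical check catches immediately.
print(matches_pattern("lutz", "_ _ Z _"))
print(candidates(["axel", "lutz", "loop", "salchow"], "a _ e _"))
```

A check like this could sit between the model and the grid, so that only guesses that actually fit the known letters are ever tried.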

Here are a couple of examples of GPT-4 trying to guess from clues plus givens:

Screenshot_2023-05-28_16-54-34 Screenshot_2023-05-28_16-57-45 Screenshot_2023-05-28_19-06-42

(Thanks to nat.dev for the GPT-4 access)

Here's a fun asciinema to watch of the partial crossword solver script: https://asciinema.org/a/oEURFRS5BfVHEKN9GjjW13HKe

OCRing a crossword puzzle

I started working on this before I had realized that, though. The first thing I did was generate a program to split an image of the grid into cells, parse each cell, and turn the result into a Python list of lists. I tried to use Tesseract OCR to recognize the numbers but it was absolutely terrible at that. I switched over to Cuneiform and it also failed.

can you please make a program that splits an image into a row*col grid and process each cell individually. it should see if the cell is mostly white, mostly black, or apply tesseract OCR on it if it seems to have text in it. The output should be a json file representing the grid. " " for empty, "#" for black, number if it contained a number

This tool is actually great, it managed to convert the picture to:

Screenshot_2023-05-27_15-58-34

but because I was having trouble with the OCR, I had it save every cell it split as an individual PNG for me to investigate. I tried Tesseract on the individual cells and improved its accuracy by cropping off the black border, so I asked ChatGPT to update the script to do this, and then again to pad the cropped image with a white border. Still, I couldn't get 100% accuracy on the OCR.
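The split-and-classify part of that script can be sketched with Pillow alone. This is a simplified illustration, not the generated script (which uses OpenCV and is included at the end): the function name `classify_cells` and the brightness threshold of 64 are assumptions, and numbered cells are simply reported as empty because no OCR is attempted here.

```python
from PIL import Image, ImageStat


def classify_cells(image, rows, cols, dark_threshold=64):
    """Split image into rows*cols cells; "#" for dark cells, " " otherwise."""
    gray = image.convert("L")
    cell_w = gray.width // cols
    cell_h = gray.height // rows
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            box = (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
            # Mean brightness of the cell: 0 is black, 255 is white.
            mean = ImageStat.Stat(gray.crop(box)).mean[0]
            row.append("#" if mean < dark_threshold else " ")
        grid.append(row)
    return grid


# Synthetic 2x2 "crossword": a white image with a black top-right cell.
demo = Image.new("L", (40, 40), 255)
demo.paste(0, (20, 0, 40, 20))
print(classify_cells(demo, 2, 2))
```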

cells

Here's an example prompt I used:

add a 2 px white border to a from PIL import Image

Another idea I considered was cropping away all the whitespace around the numbers:

python PIL Image. i want to crop the white parts of the sides away

This gave a completely broken program. I didn't bother trying to reprompt or explain the bug to it, and I didn't pursue this idea.
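For reference, one common way to do both steps with Pillow is to diff the cell against a pure white image, take the bounding box of what is left, and then pad the crop. This is a sketch under assumptions: the name `trim_and_pad` is made up, and it treats only exactly-white pixels (255) as background, so a scanned or JPEG-compressed cell would probably need a threshold first. It is not a claim that this would have fixed the OCR results.

```python
from PIL import Image, ImageChops, ImageOps


def trim_and_pad(cell, pad=2):
    """Crop uniform white margins from a cell image, then add a white border."""
    gray = cell.convert("L")
    white = Image.new("L", gray.size, 255)
    # Any pixel that differs from pure white counts as ink.
    bbox = ImageChops.difference(gray, white).getbbox()
    if bbox is None:
        # Entirely white cell: nothing to crop.
        return gray
    return ImageOps.expand(gray.crop(bbox), border=pad, fill=255)


demo = Image.new("L", (30, 30), 255)
demo.paste(0, (10, 12, 15, 20))  # a 5x8 black blob standing in for a digit
print(trim_and_pad(demo).size)
```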

Here are some OCR test cases:

Screenshot_2023-05-27_16-02-47 ocr1 ocr2

I also tried LLaVA on a whim, and it didn't get it either:

Screenshot_2023-05-27_16-23-49

Crossword Processing Lib

On the way I had a couple of bash scripts generated for me, with prompts like "write a bash script to split a dictionary file into a set of files 1.txt 2.txt 3.txt etc collecting up words of each length". This was very helpful.
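The generated bash script isn't reproduced here. As an illustration of the same idea in Python, the sketch below groups words by length and writes one file per length, matching the `dictionary/8.txt` style paths the solver loads later. The function names, and the choice to lowercase words and drop entries containing non-letters, are assumptions of this sketch.

```python
from collections import defaultdict
from pathlib import Path


def group_by_length(words):
    """Map word length -> sorted list of unique lowercase alphabetic words."""
    groups = defaultdict(set)
    for word in words:
        word = word.strip().lower()
        if word.isalpha():  # skips entries such as "it's"
            groups[len(word)].add(word)
    return {length: sorted(ws) for length, ws in groups.items()}


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<length>.txt, one word per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for length, words in groups.items():
        (out / f"{length}.txt").write_text("\n".join(words) + "\n")


groups = group_by_length(["Floe", "rota", "canopy", "square", "it's", "floe"])
print(groups[4])
print(groups[6])
```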

I worked with ChatGPT to generate a small library of functions for dealing with a crossword grid in Python. I wrote an example of a nested loop for trying out clues and backtracking on failure, then asked ChatGPT to transform it into a recursive procedure. It got very close, with just a couple of lines wrong.

Here is a prompt I used to generate the preprocessing function that extracts every clue number from our Python list-of-lists crossword representation, building up a map of their coordinates:

python. i have a 2d array representing a crossword grid.

" " represents an empty cell, "#" represents a filled cells, some cells have numbers in them for the clues

want to loop through it and find each number, replace those with an empty cell, but also create a hash
table that maps numbers to row,col coordinates

...
[ChatGPT generated code omitted]
...

that's amazing. nice work. can you also measure how far across and down it is possible to go from each
number

...
[omitted]
...

write a crossword_get and crossword_set function that takes a number, a boolean representing down/across
and a word, it places the word into the grid.

refactor the get/set functions to make use of the number_coordinates 
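The prompts above boil down to finding each numbered cell and measuring how far an entry can run from it. Here is a small self-contained sketch of that idea on a 3x3 grid. It is not the generated library code (that is included at the end): the name `find_slots` is made up, and unlike the generated code it also checks whether a number really starts an across or a down entry, which is an addition of this sketch.

```python
def find_slots(grid):
    """Return {(number, direction): (row, col, length)} for a numbered grid.

    grid is a list of lists using "#" for black cells, " " for empty cells
    and digit strings for clue numbers.
    """
    def is_open(r, c):
        return 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#"

    slots = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if not cell.isdigit():
                continue
            number = int(cell)
            # An across entry starts here only if the cell to the left is
            # closed (black or off-grid) and the cell to the right is open.
            if not is_open(r, c - 1) and is_open(r, c + 1):
                length = 1
                while is_open(r, c + length):
                    length += 1
                slots[(number, "across")] = (r, c, length)
            # Same test vertically for a down entry.
            if not is_open(r - 1, c) and is_open(r + 1, c):
                length = 1
                while is_open(r + length, c):
                    length += 1
                slots[(number, "down")] = (r, c, length)
    return slots


tiny = [
    ["1", " ", "2"],
    [" ", "#", " "],
    ["3", " ", " "],
]
for key, value in sorted(find_slots(tiny).items()):
    print(key, value)
```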

Tree search

The idea of 'amplifying' ChatGPT's intelligence/capabilities by putting it into a tree search loop seems plausible but didn't pan out in this case. What actually ended up being most effective was just pasting the whole set of clues in and asking it for a few guesses for each. I asked it to transform that into a JSON object for me to stick into my program. I used a full wordlist for the unknowns (the ones where ChatGPT's guesses were rubbish or clearly didn't include the answer). That program was able to solve about 80% of the crossword.

Then I just finished it manually with a little help from GPT-4.
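The guesses-plus-fallback arrangement described above can be sketched like this. In the actual run the decision to fall back to the full wordlist was made by hand; this sketch approximates it by dropping guesses of the wrong length, which is an assumption, as are the function name `candidates_for` and the two tiny wordlists. The guesses themselves are taken from the JSON included later.

```python
import json


def candidates_for(clue_number, length, guesses, wordlists):
    """Prefer model guesses of the right length, else use the full wordlist."""
    # Strip hyphens and spaces so "far-reaching" is compared by letter count.
    normalised = [g.replace("-", "").replace(" ", "").lower()
                  for g in guesses.get(str(clue_number), [])]
    fitting = [g for g in normalised if len(g) == length]
    return fitting if fitting else wordlists.get(length, [])


guesses = json.loads('{"1": ["buzz", "hype", "rush"], "20": ["far-reaching"]}')
# Hypothetical stand-ins for the per-length dictionary files.
wordlists = {4: ["buzz", "zest"], 8: ["allround", "farflung"]}

print(candidates_for(1, 4, guesses, wordlists))
# Clue 20 is (3-5), so 8 letters: "far-reaching" has 11 and is discarded.
print(candidates_for(20, 8, guesses, wordlists))
```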

Let me show the code I input to ChatGPT:

for word in words_len6:
    modified_grid = copy.deepcopy(original_grid)
    if crosswordlib.crossword_set(modified_grid, 3, False, word, number_coordinates):
        clear_terminal()
        print(":{}".format(word))
        for row in modified_grid:
            print(" ".join(row))
        snooze()

        for word in words_len4:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, 2, True, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()

and then I asked it to transform that into a recursive function:

so this processes just 2 crossword clues. We want to generalize this to process any length.

Can you write function that takes in an array like [(words_len6, 3, False), (words_len4, 32, True)] and processes them. support any length of array, not just length 2.

Here is what it came up with:

import copy

def process_crossword_clues(clue_list, original_grid, number_coordinates):
    modified_grid = copy.deepcopy(original_grid)
    for clue in clue_list:
        words_list, start_position, is_horizontal = clue
        for word in words_list:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, start_position, is_horizontal, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()

This was really only 2 lines off from the final working code, which uses backtracking search to fill the clues in! Pretty impressive even though it wasn't perfect.
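The missing piece is the recursion: handle the first clue, and for every word that fits, recurse on the remaining clues with the updated grid. Below is a self-contained sketch of that backtracking search on a toy 3x3 grid. It differs from my working code (included at the end) in a few ways that are choices of this sketch: it addresses entries by row and column rather than clue number, the helper names `place` and `solve` are made up, and it returns the solved grid instead of printing a star and exiting.

```python
import copy


def place(grid, row, col, is_down, word):
    """Write word into a copy of grid; return the copy, or None on a conflict."""
    new_grid = copy.deepcopy(grid)
    for i, char in enumerate(word):
        r, c = (row + i, col) if is_down else (row, col + i)
        if r >= len(new_grid) or c >= len(new_grid[r]):
            return None
        if new_grid[r][c] not in (" ", char):
            return None
        new_grid[r][c] = char
    return new_grid


def solve(clues, grid):
    """Depth-first search; clues is a list of (candidates, row, col, is_down)."""
    if not clues:
        return grid  # every clue placed without conflict
    candidates, row, col, is_down = clues[0]
    for word in candidates:
        attempt = place(grid, row, col, is_down, word)
        if attempt is not None:
            solved = solve(clues[1:], attempt)
            if solved is not None:
                return solved
    return None  # all candidates failed, so the caller tries its next word


grid = [
    [" ", " ", " "],
    [" ", "#", " "],
    [" ", " ", " "],
]
clues = [
    (["cat", "cab"], 0, 0, False),  # 1 across
    (["bed", "bag"], 0, 2, True),   # 2 down
    (["can", "cub"], 0, 0, True),   # 1 down
    (["beg", "nag"], 2, 0, False),  # 3 across
]
for row in solve(clues, grid):
    print(" ".join(row))
```

The search has to abandon "cat" and later "bed" before it finds a consistent fill, which is the backtracking behaviour the nested loops were approximating.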

Conclusion

Overall this was a failure in terms of writing a fully automatic crossword solver. I basically solved the crossword manually, but using ChatGPT and GPT-4 to cheat. I don't know how I could automatically solve crosswords with this tool. Maybe a character-based LLM could be used for my original idea - I don't know if there is one out there though.

It was also a failure in terms of OCR. I can't believe that I couldn't parse digits out of these little PNG images. Very disappointing.

The thing that I would say was a success in this project was pair programming with ChatGPT. It was very capable at understanding and implementing my instructions, up to its programming ability level - which is somewhat weak. It is fantastic that you can have it tweak and refactor code just by describing the changes you want done. It can write some things correctly. Other things it gets wrong. You need to fix bugs in the code and help it along. But it is miraculous to me that it is able to do what it does. I have never seen a program synthesis tool anything like this before - and it's not just program synthesis, it's much more general.

I've included the code that we worked on together. ChatGPT wrote about 80% of it or more.

User: can you try to solve this crossword please
ChatGPT:
Of course! I'll do my best to help you solve the crossword. Please provide me with the crossword clues and the length of the words, and I'll assist you in finding the answers.
User:
Across
1
Feeling of excitement (4)
3
Supposedly essential item to buy (4-4)
8
List of names (4)
9
Steals money for personal use (informal) (8)
11
Flash jewellery (5-5)
14
Layer of foliage created by high branches of forest trees (6)
15
Plaza (6)
17
German breed of powerful dog used to herd livestock (10)
20
Having wide scope (3-5)
21
Sheet of floating ice (4)
22
(Of a person) strongly built (8)
23
* (4)
Down
1
On a horse without a saddle (8)
2
Unspecified large numbers (8)
4
Wild (6)
5
Bandage that stops the flow of blood from an artery (10)
6
Figure skating jump (4)
7
Comfort (4)
10
Study of pharaohs and pyramids etc (10)
12
Collapse (4,4)
13
Expression of total agreement (4,4)
16
Asap! (2,4)
18
Bay and multiple estuary on the coast of East Anglia (4)
19
Island to which Napoleon was exiled in 1814 (4)
ChatGPT:
Here are the solutions to the crossword:
Across:
Zest
Must-have
Rota
Pilfers
Bling-bling
Canopy
Square
Rottweiler
Wide-ranging
Floe
Strapping
Aster
Down:
Bareback
Countless
Untamed
Tourniquet
Axel
Ease
Egyptology
Fall down
You betcha
Chop-chop
Wash
Elba
I hope this helps you complete the crossword! Let me know if you have any further questions.
User:
redo 1, B _ _ _
ChatGPT:
Apologies for the incorrect answer. Here's the revised answer:
Buzz
User:
redo 9 _ _ _ _ _ E _ S
ChatGPT:
The answer for clue 9 across with the given letters "____E___S" is:
Embezzles
User:
redo 6, _ _ Z _
ChatGPT:
The answer for clue 6 down with the pattern "_ _ Z _" is:
Lutz
User:
redo 7 _ _ E _
ChatGPT:
The answer for clue 7 down with the pattern "_ _ E _" is:
Ease
User:
3 / 3
redo 4 down, 5 letters.
pattern: U _ M _ L _
ChatGPT:
The answer for clue 4 down with the pattern "U _ M _ L _" is:
Feral
User:
2 / 2
unspecied large numbers, 8 letters. pattern: _ _ T _ I _ N _
ChatGPT:
The answer for the clue "Unspecified large numbers" with the pattern "_ _ T _ I _ N _" is:
Multitudinous
User:
3 / 3
guess a few possibilities for "Having wide scope" first word all, second word 5 letters
ChatGPT:
Here are a few possibilities for the clue "Having wide scope" with the first word starting with "all" and the second word having 5 letters:
All-Encompassing
All-Embracing
All-Global
All-Vast
All-Wide
Please note that the actual answer would depend on the specific crossword puzzle and the intersecting letters.
def process_crossword_grid(grid):
    num_to_coords = {}  # Hash table to map numbers to row,col coordinates
    across_lengths = {}  # Store lengths across from each number
    down_lengths = {}  # Store lengths down from each number
    for row in range(len(grid)):
        for col in range(len(grid[row])):
            cell = grid[row][col]
            if cell.isdigit():
                number = int(cell)
                num_to_coords[number] = (row, col)
                grid[row][col] = " "  # Replace the number with an empty cell
                # Measure the length across from the number
                across_length = 1
                while col + across_length < len(grid[row]) and grid[row][col + across_length] != "#":
                    across_length += 1
                across_lengths[number] = across_length
                # Measure the length down from the number
                down_length = 1
                while row + down_length < len(grid) and grid[row + down_length][col] != "#":
                    down_length += 1
                down_lengths[number] = down_length
    return num_to_coords, across_lengths, down_lengths, grid

def crossword_get(grid, number, is_down, number_coordinates):
    if number not in number_coordinates:
        return None
    start_row, start_col = number_coordinates[number]
    if is_down:
        word = ""
        row = start_row
        while row < len(grid) and grid[row][start_col] != "#":
            word += grid[row][start_col]
            row += 1
    else:
        word = ""
        col = start_col
        while col < len(grid[start_row]) and grid[start_row][col] != "#":
            word += grid[start_row][col]
            col += 1
    return word

def crossword_set(grid, number, is_down, word, number_coordinates):
    if number not in number_coordinates:
        return False
    start_row, start_col = number_coordinates[number]
    if is_down:
        if start_row + len(word) > len(grid):
            return False
        for i, char in enumerate(word):
            if grid[start_row + i][start_col] != " " and grid[start_row + i][start_col] != char:
                return False
            grid[start_row + i][start_col] = char
    else:
        if start_col + len(word) > len(grid[start_row]):
            return False
        for i, char in enumerate(word):
            if grid[start_row][start_col + i] != " " and grid[start_row][start_col + i] != char:
                return False
            grid[start_row][start_col + i] = char
    return True

def print_crossword(grid):
    for row in grid:
        print(" ".join(row))

import crosswordlib
import time
import copy

def clear_terminal():
    print('\033c', end='')

def snooze():
    time.sleep(0.1)

crossword_grid = [
["1", " ", "2", " ", "#", "3", "4", " ", "5", " ", "6", " ", "7"],
[" ", "#", " ", "#", "#", "#", " ", "#", " ", "#", " ", "#", " "],
["8", " ", " ", " ", "#", "9", " ", " ", " ", " ", " ", " ", " "],
[" ", "#", " ", "#", "10", "#", " ", "#", " ", "#", " ", "#", " "],
["11", " ", " ", " ", " ", " ", " ", " ", " ", " ", "#", "#", "#"],
[" ", "#", " ", "#", " ", "#", " ", "#", " ", "#", "12", "#", "13"],
["14", " ", " ", " ", " ", " ", "#", "15", " ", " ", " ", " ", " "],
[" ", "#", " ", "#", " ", "#", "16", "#", " ", "#", " ", "#", " "],
["#", "#", "#", "17", " ", " ", " ", " ", " ", " ", " ", " ", " "],
["18", "#", "19", "#", " ", "#", " ", "#", " ", "#", " ", "#", " "],
["20", " ", " ", " ", " ", " ", " ", " ", "#", "21", " ", " ", " "],
[" ", "#", " ", "#", " ", "#", " ", "#", "#", "#", " ", "#", " "],
["22", " ", " ", " ", " ", " ", " ", " ", "#", "23", " ", " ", " "]]
crosswordlib.print_crossword(crossword_grid)
number_coordinates, across_lengths, down_lengths, original_grid = crosswordlib.process_crossword_grid(crossword_grid)
print("Number coordinates:")
for number, coords in number_coordinates.items():
    print(f"Number {number}: {coords}")
print("\nAcross Lengths:")
for number, length in across_lengths.items():
    print(f"Number {number}: {length}")
print("\nDown Lengths:")
for number, length in down_lengths.items():
    print(f"Number {number}: {length}")

def process_crossword_clues(clue_list, original_grid, number_coordinates):
    modified_grid = copy.deepcopy(original_grid)
    if len(clue_list) == 0:
        # Win condition
        print('⭐')
        exit()
        return
    clue = clue_list[0]
    words_list, start_position, is_horizontal = clue
    for word in words_list:
        sub_modified_grid = copy.deepcopy(modified_grid)
        if crosswordlib.crossword_set(sub_modified_grid, start_position, is_horizontal, word, number_coordinates):
            clear_terminal()
            for row in sub_modified_grid:
                print(" ".join(row))
            snooze()
            process_crossword_clues(clue_list[1:], sub_modified_grid, number_coordinates)

def load_dict(filename):
    with open(filename, "r") as file:
        # Read the contents of the file
        file_contents = file.read().strip()
        # Split the contents into individual strings based on a delimiter (e.g., newlines)
        return file_contents.split("\n")

w12 = load_dict("dictionary/12.txt")
w8 = load_dict("dictionary/8.txt")
w6 = load_dict("dictionary/6.txt")
w4 = load_dict("dictionary/4.txt")
{
"1": ["buzz", "hype", "rush"],
"8": ["rota", "roll"],
"9": ["embezzle", "skimming"],
"20": ["far-reaching"],
"22": ["muscular", "stalwart"],
"23": ["star", "icon"]
}
{
"2": ["myriads", "countless"],
"4": ["fierce", "untamed", "savage"],
"12": ["fall apart", "break down", "give way"],
"16": ["in a rush", "pronto"],
"18": ["brey", "cove", "inle"],
"19": ["elba", "lido"]
}
across = False
down = True
# length 8: 2 down, 9 across, 12 down
# length 6: 4 down
#print(w8)
#exit()
# I (A human) composed this solution path manually using the guesses suggested by ChatGPT. A more complete solution would automatically focus on the 'best' clue to work on dynamically.
solution_path = [
    (["bareback"], 1, down),
    (["tourniquet"], 5, down),
    (["egyptology"], 10, down),
    (["musthave"], 3, across),
    (["blingbling"], 11, across),
    # (["farreaching"], 20, across),
    (["canopy"], 14, across),
    (["floe"], 21, across),
    (["square", "piazza"], 15, across),
    (["rottweiler", "doberman"], 17, across),
    (["axel", "lutz", "loop"], 6, down),
    (["ease", "rest", "calm"], 7, down),
    (["hearhear", "amentothat"], 13, down),
    (["rota", "roll"], 8, across),
    (w6, 4, down),  # no good guesses for this, so use the full wordlist.
    (w8, 9, across),
    (w8, 2, down),
    # (w8, 12, down),
    (w6, 4, down),
    (["brey", "cove", "inle"], 18, down),
    (["elba", "lido"], 19, down),
    (w8, 22, across),
]
process_crossword_clues(solution_path, original_grid, number_coordinates)
print("fail")
import cv2
import numpy as np
from PIL import Image
import pytesseract
import json
import os
import sys
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
my_tool = None
for tool in tools:
    print("* {}".format(tool.get_name()))
    if "Cuneiform" in tool.get_name():
        my_tool = tool
# The tools are returned in the recommended order of usage
if my_tool:
    tool = my_tool
else:
    tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'libtesseract'
langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[0]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'
# Note that languages are NOT sorted in any way. Please refer
# to the system locale settings for the default language
# to use.

def add_border(image, border_size=2, border_color=(255, 255, 255)):
    # Calculate the new size for the bordered image
    new_width = image.width + 2 * border_size
    new_height = image.height + 2 * border_size
    # Create a new blank image with the calculated size and white background
    bordered_image = Image.new("RGB", (new_width, new_height), border_color)
    # Paste the original image onto the bordered image with an offset to create the border
    bordered_image.paste(image, (border_size, border_size))
    # Save the bordered image
    return bordered_image  #.save("bordered_image.png")

# doesnt help
def crop_sides_away(image):
    image_array = np.array(image)
    # Find the coordinates of the non-white pixels
    non_white_coords = np.where(image_array != 255)
    # Determine the cropping bounds
    left = np.min(non_white_coords[1])
    right = np.max(non_white_coords[1])
    top = np.min(non_white_coords[0])
    bottom = np.max(non_white_coords[0])
    # Crop the image
    cropped_image = image.crop((left, top, right + 1, bottom + 1))
    #cropped_image.show()
    return cropped_image

def process_cell(cell, save_path):
    # Convert cell to grayscale
    gray = cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)
    # Apply inverted Otsu thresholding: dark pixels in the cell become white (255)
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Percentage of white pixels in the thresholded image,
    # i.e. the share of dark pixels in the original cell
    white_pixels = cv2.countNonZero(thresh)
    total_pixels = cell.shape[0] * cell.shape[1]
    white_percentage = (white_pixels / total_pixels) * 100
    #print(white_percentage)
    # Save cell image as PNG
    cv2.imwrite(save_path, cell)
    if white_percentage > 80:
        return "#"  # Mostly white after inversion, so a black cell in the original
    # elif white_percentage < 8:
    #     return " "  # Mostly black
    else:
        try:
            the_image = crop_sides_away(add_border(Image.fromarray(cell)))
            text1 = tool.image_to_string(
                the_image,
                lang=lang,
                builder=pyocr.builders.TextBuilder()
            )
            text2 = pytesseract.image_to_string(the_image,config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789').strip()
            text = text2
            print("OCR: tesseract:{} cuneiform:{}".format(text2, text1))
            number = int(text)
            return str(number)  # Detected a number
        # except (ValueError, CuneiformError):
        except Exception as e:
            #print(e)
            return " "  # Empty cell

def split_image_into_cells(image, rows, cols):
    height, width = image.shape[:2]
    cell_height = height // rows
    cell_width = width // cols
    cells = []
    for r in range(rows):
        this_row = []
        for c in range(cols):
            y1 = r * cell_height
            y2 = (r + 1) * cell_height
            x1 = c * cell_width
            x2 = (c + 1) * cell_width
            # cell = image[y1:y2, x1:x2]
            delta = 4
            cell = image[y1+delta:y2-2*delta, x1+delta:x2-2*delta]
            this_row.append(cell)
        cells.append(this_row)
    return cells

def process_image(image_path, rows, cols):
    image = cv2.imread(image_path)
    # Split image into cells
    cells = split_image_into_cells(image, rows, cols)
    # Create output directory to save cell images
    output_dir = 'cell_images'
    os.makedirs(output_dir, exist_ok=True)
    # Process each cell
    grid = []
    for i, this_row in enumerate(cells):
        grid_row = []
        for j, cell in enumerate(this_row):
            cell_save_path = os.path.join(output_dir, f'cell_{i},{j}.png')
            result = process_cell(cell, cell_save_path)
            grid_row.append(result)
        grid.append(grid_row)
    # Convert grid to JSON
    grid_json = json.dumps(grid)
    # Write JSON to file
    with open('output.json', 'w') as f:
        f.write(grid_json)
    print("Output saved to output.json")

# Example usage
image_path = 'crossword grid.png'
rows = 13
cols = 13
process_image(image_path, rows, cols)