rain-1/000-crossword-solving.md

## 000-crossword-solving.md

      
### Solving crosswords with GPT

This is my research report. I've included much of the code and the chat interactions for anyone who wants to read through them. I worked on this crossword: https://www.theguardian.com/crosswords/quick/16553
I had a vision for a GPT-powered crossword solver: it would do a tree search over GPT-generated guesses, with each prompt including the letters known so far.
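
As a rough illustration of "including the knowns so far", here is a minimal sketch of turning a partially filled slot into a prompt line. The function names and the prompt wording are hypothetical. The only thing taken from this project is the convention that a slot is a string with a space for each unknown letter, which is what `crossword_get` in 02-crosswordlib.py returns.

```python
def slot_pattern(slot):
    """Render a grid slot as a prompt pattern: 'b   ' -> 'B _ _ _'."""
    return " ".join(ch.upper() if ch != " " else "_" for ch in slot)


def clue_prompt(clue, slot):
    """Build one prompt line from a clue and the current contents of its slot."""
    return f"{clue} ({len(slot)} letters), pattern: {slot_pattern(slot)}"


print(clue_prompt("Feeling of excitement", "b   "))
# Feeling of excitement (4 letters), pattern: B _ _ _
```

The follow-up messages in the chat log further down use the same `B _ _ _` style of pattern.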

I didn't end up doing that because ChatGPT and GPT-4 are terrible at questions involving the length of words, or at guessing words that contain specific letters at specific positions. They can sometimes do it but usually fail. I think this is because they are token-based. I am curious whether a character-based LLM would be better at such tasks.
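
Since the model is unreliable on lengths and letter positions, one option is to check every guess against the known pattern in ordinary code rather than trusting it. The sketch below is only an assumption about how such a check could look (`fits` is a hypothetical helper, not part of the code in this report). The sample guesses and patterns come from the chat log in 001-can-you-please.txt.

```python
import re


def fits(guess, pattern):
    """Return True if guess matches a pattern like '_ _ Z _' ('_' = unknown)."""
    word = re.sub(r"[^a-z]", "", guess.lower())  # drop hyphens and spaces
    cells = pattern.replace(" ", "").lower()
    return len(word) == len(cells) and all(
        c == "_" or c == w for c, w in zip(cells, word)
    )


print(fits("Buzz", "B _ _ _"))               # True
print(fits("Embezzles", "_ _ _ _ _ E _ S"))  # False: nine letters for eight cells
print(fits("Lutz", "_ _ Z _"))               # False: third letter is T, not Z
```

A filter like this would reject two of the three "redo" answers shown in the chat log before they ever reached the grid.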
Here are a couple of examples of GPT-4 trying to guess from clues plus givens:


(Thanks to nat.dev for the GPT-4 access)
Here's a fun asciinema to watch of the partial crossword solver script: https://asciinema.org/a/oEURFRS5BfVHEKN9GjjW13HKe
#### OCRing a crossword puzzle

I started working on this before I had realized that, though. The first thing I did was have ChatGPT generate a program to split an image of the grid into cells, parse each cell, and turn the result into a Python list of lists. I tried to use Tesseract OCR to recognize the clue numbers, but it was absolutely terrible at that. I switched over to Cuneiform and it also failed.

can you please make a program that splits an image into a row*col grid and process each cell individually. it should see if the cell is mostly white, mostly black, or apply tesseract OCR on it if it seems to have text in it. The output should be a json file representing the grid. " " for empty, "#" for black, number if it contained a number

This tool is actually great. It managed to convert the picture to:

But because I was having trouble with the OCR, I had it save every cell it split off as an individual PNG for me to investigate. I tried Tesseract on the individual cells and improved its accuracy by cropping away the black border, so I asked ChatGPT to update the script to do this, and then again to pad the cropped image with a white border. I still couldn't get 100% accuracy on the OCR.
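
The splitting and colour-classification steps can be sketched without any image library. This is a simplified illustration under stated assumptions, not the script itself (that is 04-ocr-tool.py, which uses OpenCV). A cell is given as rows of 0-255 grayscale values, and the names `cell_boxes` and `classify` and all thresholds are hypothetical. The `inset` parameter plays the role of cropping the black border off each cell.

```python
def cell_boxes(width, height, rows, cols, inset=0):
    """Yield (row, col, left, top, right, bottom) for each grid cell."""
    cell_w, cell_h = width // cols, height // rows
    for r in range(rows):
        for c in range(cols):
            yield (r, c,
                   c * cell_w + inset, r * cell_h + inset,
                   (c + 1) * cell_w - inset, (r + 1) * cell_h - inset)


def classify(pixels, dark_below=128, black_fraction=0.8, blank_fraction=0.02):
    """Classify a cell given as rows of 0-255 grayscale values."""
    values = [v for row in pixels for v in row]
    dark = sum(v < dark_below for v in values) / len(values)
    if dark >= black_fraction:
        return "#"   # filled square
    if dark <= blank_fraction:
        return " "   # empty square
    return "?"       # some ink present: hand this cell to OCR
```

Only the cells that come back as `"?"` need to go through OCR, which keeps the unreliable step confined to the numbered squares.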

Here's an example prompt I used:

add a 2 px white border to a from PIL import Image

Another idea I considered was cropping away all the whitespace around the numbers:

python PIL Image. i want to crop the white parts of the sides away

This gave a completely broken program. I didn't bother reprompting or explaining the bug to it, and didn't pursue the idea further.
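
For reference, the cropping idea itself is small: find the bounding box of the non-white pixels and cut to it. The sketch below is library-free and is not the program ChatGPT produced. `content_bbox` and `crop` are hypothetical names, the image is assumed to be rows of 0-255 grayscale values, and the `white_above` tolerance is an assumption (anti-aliased scans are rarely exactly 255). It also handles a blank cell explicitly, since there is then no bounding box at all.

```python
def content_bbox(pixels, white_above=200):
    """Return (left, top, right, bottom) around non-white pixels, or None.

    right and bottom are exclusive, the same box convention PIL's
    Image.crop uses.
    """
    rows = [y for y, row in enumerate(pixels) if any(v <= white_above for v in row)]
    cols = [x for row in pixels for x, v in enumerate(row) if v <= white_above]
    if not rows:
        return None  # blank cell: nothing to crop
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


def crop(pixels, box):
    """Cut the rows and columns inside box out of pixels."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

With PIL, the tuple from `content_bbox` could be passed straight to `Image.crop`, as `crop_sides_away` in 04-ocr-tool.py does with its own bounds.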
Here are some OCR test cases:


I also tried LLaVA on a whim; it didn't get it either:

#### Crossword Processing Lib

Along the way I had a couple of bash scripts generated for me, for example with the prompt "write a bash script to split a dictionary file into a set of files 1.txt 2.txt 3.txt etc collecting up words of each length". This was very helpful.
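
The generated bash script is not included here, so the following is a Python equivalent of the same idea rather than the script that was actually used. The function names are hypothetical, and dropping non-alphabetic entries (such as words with apostrophes) is an assumption. The output layout matches the `dictionary/8.txt` style of path that 03-realdeal.py loads.

```python
from collections import defaultdict
from pathlib import Path


def group_by_length(words):
    """Map word length -> list of lower-cased alphabetic words, in input order."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if word.isalpha():
            groups[len(word)].append(word)
    return dict(groups)


def write_length_files(groups, out_dir):
    """Write 1.txt, 2.txt, ... into out_dir, one word per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for length, words in groups.items():
        (out / f"{length}.txt").write_text("\n".join(words) + "\n")
```

A typical call would be `write_length_files(group_by_length(open(path)), "dictionary")`, where `path` points at whatever system wordlist is available.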
I worked with ChatGPT to generate a small library of functions for dealing with a crossword grid in Python. I wrote an example of a nested loop for trying out clues and backtracking on failure, and asked ChatGPT to transform it into a recursive procedure. It got very close, with just a couple of lines wrong.
Here is a prompt I used to generate the preprocessing function that extracts every clue number from our Python list-of-lists crossword representation, building up a map of their coordinates:
python. i have a 2d array representing a crossword grid.

" " represents an empty cell, "#" represents a filled cells, some cells have numbers in them for the clues

want to loop through it and find each number, replace those with an empty cell, but also create a hash
table that maps numbers to row,col coordinates

...
[ChatGPT generated code omitted]
...

that's amazing. nice work. can you also measure how far across and down it is possible to go from each
number

...
[omitted]
...

write a crossword_get and crossword_set function that takes a number, a boolean representing down/across
and a word, it places the word into the grid.

refactor the get/set functions to make use of the number_coordinates 

#### Tree search

The idea of 'amplifying' ChatGPT's intelligence and capabilities by putting it into a tree search loop seems plausible but didn't pan out in this case. What actually ended up being most effective was just pasting in the whole set of clues and asking it for a few guesses for each. I asked it to transform those guesses into a JSON object for me to stick into my program. I used a full wordlist for the unknowns (the clues where ChatGPT's guesses were rubbish or clearly didn't include the answer). That program was able to solve about 80% of the crossword.
Then I just finished it manually with a little help from GPT-4.
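
In this project the guesses were wired into the search by hand, but the step from a JSON object of guesses to search entries can be sketched as follows. This is an illustration under assumptions: `clue_entries` is a hypothetical helper, `lengths` stands for a mapping such as the `across_lengths` or `down_lengths` returned by `process_crossword_grid`, and `wordlists` maps a word length to the full wordlist for that length. The output tuples have the same `(words, number, is_down)` shape as `solution_path` in 03-realdeal.py.

```python
import json


def clue_entries(guesses_json, is_down, wordlists, lengths):
    """Turn {"number": [guesses]} into (words, number, is_down) tuples.

    Guesses are lower-cased and stripped of hyphens and spaces. A clue with
    no guess of the right length falls back to the full wordlist.
    """
    entries = []
    for number, guesses in json.loads(guesses_json).items():
        number = int(number)
        length = lengths[number]
        words = [g.lower().replace("-", "").replace(" ", "") for g in guesses]
        words = [w for w in words if len(w) == length]
        entries.append((words or wordlists[length], number, is_down))
    return entries
```

Filtering by length first matters because, as noted above, the model's guesses often have the wrong number of letters.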
Let me show the code I input to ChatGPT:
for word in words_len6:
    modified_grid = copy.deepcopy(original_grid)
    if crosswordlib.crossword_set(modified_grid, 3, False, word, number_coordinates):
        clear_terminal()
        print(":{}".format(word))
        for row in modified_grid:
            print(" ".join(row))
        snooze()

        for word in words_len4:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, 2, True, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()
Then I asked it to generalize that into a recursive function:

so this processes just 2 crossword clues. We want to generalize this to process any length.
Can you write function that takes in an array like [(words_len6, 3, False), (words_len4, 32, True)] and processes them. support any length of array, not just length 2.

Here is what it came up with:
import copy

def process_crossword_clues(clue_list, original_grid, number_coordinates):
    modified_grid = copy.deepcopy(original_grid)
    for clue in clue_list:
        words_list, start_position, is_horizontal = clue
        for word in words_list:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, start_position, is_horizontal, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()
This was really only 2 lines off the final working code (`process_crossword_clues` in 03-realdeal.py), which uses backtracking search to fill the clues in! Pretty impressive, even though it wasn't perfect.
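
To make the recursion concrete, here is a self-contained toy version of the same backtracking idea. It is a sketch, not the code from this project: `place` and `solve` are hypothetical names, clues carry their start coordinates directly instead of clue numbers, and the solved grid is returned rather than printed.

```python
import copy


def place(grid, row, col, is_down, word):
    """Write word into grid; return False on a conflict or overflow."""
    for i, ch in enumerate(word):
        r, c = (row + i, col) if is_down else (row, col + i)
        if r >= len(grid) or c >= len(grid[r]) or grid[r][c] not in (" ", ch):
            return False
        grid[r][c] = ch
    return True


def solve(clues, grid):
    """Try each candidate for the first clue, then recurse on the rest."""
    if not clues:
        return grid                  # every clue placed
    words, row, col, is_down = clues[0]
    for word in words:
        trial = copy.deepcopy(grid)  # keep the parent grid clean for backtracking
        if place(trial, row, col, is_down, word):
            solved = solve(clues[1:], trial)
            if solved is not None:
                return solved
    return None                      # dead end: the caller tries its next word


grid = [[" ", " "], [" ", " "]]
clues = [(["ab", "at"], 0, 0, False), (["to"], 0, 1, True)]
print(solve(clues, grid))
# [['a', 't'], [' ', 'o']]
```

Here "ab" is tried first, the down word "to" then clashes with the "b", and the search backs up and succeeds with "at".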
#### Conclusion

Overall this was a failure in terms of writing a fully automatic crossword solver. I basically solved the crossword manually, using ChatGPT and GPT-4 to cheat. I don't know how I could solve crosswords automatically with this tool. Maybe a character-based LLM could be used for my original idea, though I don't know if there is one out there.
It was also a failure in terms of OCR. I can't believe that I couldn't parse digits out of these little PNG images. Very disappointing.
The thing that I would say was a success in this project was pair programming with ChatGPT. It was very capable of understanding and implementing my instructions, up to the limit of its programming ability, which is somewhat weak. It is fantastic that you can have it tweak and refactor code just by describing the changes you want done. It can write some things correctly. Other things it gets wrong. You need to fix bugs in the code and help it along. But it is miraculous to me that it is able to do what it does. I have never seen a program synthesis tool anything like this before, and it's not just program synthesis; it's much more general.
I've included the code that we worked on together; ChatGPT wrote about 80% of it or more.

  
## 001-can-you-please.txt
User: can you try to solve this crossword please
ChatGPT:

Of course! I'll do my best to help you solve the crossword. Please provide me with the crossword clues and the length of the words, and I'll assist you in finding the answers.
User:
Across

    1
    Feeling of excitement (4)
    3
    Supposedly essential item to buy (4-4)
    8
    List of names (4)
    9
    Steals money for personal use (informal) (8)
    11
    Flash jewellery (5-5)
    14
    Layer of foliage created by high branches of forest trees (6)
    15
    Plaza (6)
    17
    German breed of powerful dog used to herd livestock (10)
    20
    Having wide scope (3-5)
    21
    Sheet of floating ice (4)
    22
    (Of a person) strongly built (8)
    23
    * (4)

Down

    1
    On a horse without a saddle (8)
    2
    Unspecified large numbers (8)
    4
    Wild (6)
    5
    Bandage that stops the flow of blood from an artery (10)
    6
    Figure skating jump (4)
    7
    Comfort (4)
    10
    Study of pharaohs and pyramids etc (10)
    12
    Collapse (4,4)
    13
    Expression of total agreement (4,4)
    16
    Asap! (2,4)
    18
    Bay and multiple estuary on the coast of East Anglia (4)
    19
    Island to which Napoleon was exiled in 1814 (4)

ChatGPT:

Here are the solutions to the crossword:

Across:

    Zest
    Must-have
    Rota
    Pilfers
    Bling-bling
    Canopy
    Square
    Rottweiler
    Wide-ranging
    Floe
    Strapping
    Aster

Down:

    Bareback
    Countless
    Untamed
    Tourniquet
    Axel
    Ease
    Egyptology
    Fall down
    You betcha
    Chop-chop
    Wash
    Elba

I hope this helps you complete the crossword! Let me know if you have any further questions.

User:
redo 1, B _ _ _
ChatGPT:

Apologies for the incorrect answer. Here's the revised answer:

    Buzz

User:
redo 9 _ _ _ _ _ E _ S
ChatGPT:

The answer for clue 9 across with the given letters "____E___S" is:

    Embezzles

User:
redo 6, _ _ Z _
ChatGPT:

The answer for clue 6 down with the pattern "_ _ Z _" is:

    Lutz

User:
redo 7 _ _ E _
ChatGPT:

The answer for clue 7 down with the pattern "_ _ E _" is:

    Ease

User:
redo 4 down, 5 letters.
pattern: U _ M _ L _
ChatGPT:

The answer for clue 4 down with the pattern "U _ M _ L _" is:

    Feral

User:
unspecied large numbers, 8 letters. pattern: _ _ T _ I _ N _
ChatGPT:

The answer for the clue "Unspecified large numbers" with the pattern "_ _ T _ I _ N _" is:

    Multitudinous

User:
guess a few possibilities for "Having wide scope" first word all, second word 5 letters
ChatGPT:

Here are a few possibilities for the clue "Having wide scope" with the first word starting with "all" and the second word having 5 letters:

    All-Encompassing
    All-Embracing
    All-Global
    All-Vast
    All-Wide

Please note that the actual answer would depend on the specific crossword puzzle and the intersecting letters.

## 02-crosswordlib.py
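def process_crossword_grid(grid):
    """Strip clue numbers from grid (in place) and record where each one was.

    Returns (num_to_coords, across_lengths, down_lengths, grid). Lengths are
    measured in both directions from every numbered cell, even for a number
    that only starts a clue in one direction.
    """
    num_to_coords = {}  # Maps clue number -> (row, col)
    across_lengths = {}  # Cells available going across from each number
    down_lengths = {}  # Cells available going down from each number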

    for row in range(len(grid)):
        for col in range(len(grid[row])):
            cell = grid[row][col]
            if cell.isdigit():
                number = int(cell)
                num_to_coords[number] = (row, col)
                grid[row][col] = " "  # Replace the number with an empty cell

                # Measure the length across from the number
                across_length = 1
                while col + across_length < len(grid[row]) and grid[row][col + across_length] != "#":
                    across_length += 1
                across_lengths[number] = across_length

                # Measure the length down from the number
                down_length = 1
                while row + down_length < len(grid) and grid[row + down_length][col] != "#":
                    down_length += 1
                down_lengths[number] = down_length

    return num_to_coords, across_lengths, down_lengths, grid


def crossword_get(grid, number, is_down, number_coordinates):
    if number not in number_coordinates:
        return None

    start_row, start_col = number_coordinates[number]
    if is_down:
        word = ""
        row = start_row
        while row < len(grid) and grid[row][start_col] != "#":
            word += grid[row][start_col]
            row += 1
    else:
        word = ""
        col = start_col
        while col < len(grid[start_row]) and grid[start_row][col] != "#":
            word += grid[start_row][col]
            col += 1

    return word

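def crossword_set(grid, number, is_down, word, number_coordinates):
    """Place word at clue `number`; return False on a conflict or overflow.

    On a conflict, the letters written before the clash stay in grid, so
    callers should pass a copy (03-realdeal.py uses copy.deepcopy). A word
    shorter than its slot is accepted; only overlong words are rejected.
    """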
    if number not in number_coordinates:
        return False

    start_row, start_col = number_coordinates[number]
    if is_down:
        if start_row + len(word) > len(grid):
            return False
        for i, char in enumerate(word):
            if grid[start_row + i][start_col] != " " and grid[start_row + i][start_col] != char:
                return False
            grid[start_row + i][start_col] = char
    else:
        if start_col + len(word) > len(grid[start_row]):
            return False
        for i, char in enumerate(word):
            if grid[start_row][start_col + i] != " " and grid[start_row][start_col + i] != char:
                return False
            grid[start_row][start_col + i] = char

    return True

def print_crossword(grid):
    for row in grid:
        print(" ".join(row))

## 03-realdeal.py

import crosswordlib
import time
import copy

def clear_terminal():
    print('\033c', end='')

def snooze():
    time.sleep(0.1)


crossword_grid = [
    ["1", " ", "2", " ", "#", "3", "4", " ", "5", " ", "6", " ", "7"],
    [" ", "#", " ", "#", "#", "#", " ", "#", " ", "#", " ", "#", " "],
    ["8", " ", " ", " ", "#", "9", " ", " ", " ", " ", " ", " ", " "],
    [" ", "#", " ", "#", "10", "#", " ", "#", " ", "#", " ", "#", " "],
    ["11", " ", " ", " ", " ", " ", " ", " ", " ", " ", "#", "#", "#"],
    [" ", "#", " ", "#", " ", "#", " ", "#", " ", "#", "12", "#", "13"],
    ["14", " ", " ", " ", " ", " ", "#", "15", " ", " ", " ", " ", " "],
    [" ", "#", " ", "#", " ", "#", "16", "#", " ", "#", " ", "#", " "],
    ["#", "#", "#", "17", " ", " ", " ", " ", " ", " ", " ", " ", " "],
    ["18", "#", "19", "#", " ", "#", " ", "#", " ", "#", " ", "#", " "],
    ["20", " ", " ", " ", " ", " ", " ", " ", "#", "21", " ", " ", " "],
    [" ", "#", " ", "#", " ", "#", " ", "#", "#", "#", " ", "#", " "],
    ["22", " ", " ", " ", " ", " ", " ", " ", "#", "23", " ", " ", " "]]

crosswordlib.print_crossword(crossword_grid)

number_coordinates, across_lengths, down_lengths, original_grid = crosswordlib.process_crossword_grid(crossword_grid)

print("Number coordinates:")
for number, coords in number_coordinates.items():
    print(f"Number {number}: {coords}")

print("\nAcross Lengths:")
for number, length in across_lengths.items():
    print(f"Number {number}: {length}")

print("\nDown Lengths:")
for number, length in down_lengths.items():
    print(f"Number {number}: {length}")


def process_crossword_clues(clue_list, original_grid, number_coordinates):
    """Backtracking search: place a word for the first clue, then recurse on the rest."""
    modified_grid = copy.deepcopy(original_grid)

    if len(clue_list) == 0:
        # Win condition: every clue in the path has been placed.
        print('⭐')
        exit()

    clue = clue_list[0]
    # The third field is True for down clues (see `across`/`down` below),
    # so it is passed to crossword_set as is_down.
    words_list, start_position, is_down = clue

    for word in words_list:
        sub_modified_grid = copy.deepcopy(modified_grid)

        if crosswordlib.crossword_set(sub_modified_grid, start_position, is_down, word, number_coordinates):
            clear_terminal()
            for row in sub_modified_grid:
                print(" ".join(row))
            snooze()

            process_crossword_clues(clue_list[1:], sub_modified_grid, number_coordinates)


def load_dict(filename):
    with open(filename, "r") as file:
        # Read the contents of the file
        file_contents = file.read().strip()

        # Split the contents into individual strings based on a delimiter (e.g., newlines)
        return file_contents.split("\n")

w12 = load_dict("dictionary/12.txt")
w8 = load_dict("dictionary/8.txt")
w6 = load_dict("dictionary/6.txt")
w4 = load_dict("dictionary/4.txt")

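# ChatGPT's guesses, pasted in as JSON for reference. As bare expressions these
# two literals do nothing at runtime: the first holds across clues, the second down.
{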
"1": ["buzz", "hype", "rush"],
"8": ["rota", "roll"],
"9": ["embezzle", "skimming"],
"20": ["far-reaching"],
"22": ["muscular", "stalwart"],
"23": ["star", "icon"]
}
{
"2": ["myriads", "countless"],
"4": ["fierce", "untamed", "savage"],
"12": ["fall apart", "break down", "give way"],
"16": ["in a rush", "pronto"],
"18": ["brey", "cove", "inle"],
"19": ["elba", "lido"]
}

across = False
down = True

# length 8: 2 down, 9 across, 12 down
# length 6: 4 down

#print(w8)
#exit()

# I (A human) composed this solution path manually using the guesses suggested by ChatGPT. A more complete solution would automatically focus on the 'best' clue to work on dynamically.
solution_path = [
    (["bareback"], 1, down),
    (["tourniquet"], 5, down),
    (["egyptology"], 10, down),
    (["musthave"], 3, across),
    (["blingbling"], 11, across),
#    (["farreaching"], 20, across),
    (["canopy"], 14, across),
    (["floe"], 21, across),
    (["square", "piazza"], 15, across),
    (["rottweiler", "doberman"], 17, across),
    (["axel", "lutz", "loop"], 6, down),
    (["ease", "rest", "calm"], 7, down),
    (["hearhear", "amentothat"], 13, down),

    (["rota", "roll"], 8, across),

    (w6, 4, down), # no good guesses for this, so use the full wordlist.
    (w8, 9, across),
    (w8, 2, down),
#    (w8, 12, down),
    (w6, 4, down),

    (["brey", "cove", "inle"], 18, down),
    (["elba", "lido"], 19, down),
    (w8, 22, across),
]

process_crossword_clues(solution_path, original_grid, number_coordinates)
print("fail")

## 04-ocr-tool.py
import json
import os
import sys

import cv2
import numpy as np
import pyocr
import pyocr.builders
import pytesseract
from PIL import Image

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
my_tool = None
for tool in tools:
    print("* {}".format(tool.get_name()))
    if "Cuneiform" in tool.get_name():
        my_tool = tool
# The tools are returned in the recommended order of usage
if my_tool:
    tool = my_tool
else:
    tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'libtesseract'

langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[0]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'
# Note that languages are NOT sorted in any way. Please refer
# to the system locale settings for the default language
# to use.

def add_border(image, border_size=2, border_color=(255, 255, 255)):
    # Calculate the new size for the bordered image
    new_width = image.width + 2 * border_size
    new_height = image.height + 2 * border_size

    # Create a new blank image with the calculated size and white background
    bordered_image = Image.new("RGB", (new_width, new_height), border_color)

    # Paste the original image onto the bordered image with an offset to create the border
    bordered_image.paste(image, (border_size, border_size))

    # Save the bordered image
    return bordered_image #.save("bordered_image.png")

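# Doesn't help OCR accuracy. On an all-white cell np.min gets an empty array and
# raises ValueError, which process_cell catches and treats as an empty cell.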
def crop_sides_away(image):
    image_array = np.array(image)

    # Find the coordinates of the non-white pixels
    non_white_coords = np.where(image_array != 255)

    # Determine the cropping bounds
    left = np.min(non_white_coords[1])
    right = np.max(non_white_coords[1])
    top = np.min(non_white_coords[0])
    bottom = np.max(non_white_coords[0])

    # Crop the image
    cropped_image = image.crop((left, top, right + 1, bottom + 1))
    #cropped_image.show()

    return cropped_image

def process_cell(cell, save_path):
    # Convert cell to grayscale
    gray = cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)

    # Otsu's (global) thresholding, inverted: dark pixels become 255, light pixels 0
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Percentage of dark pixels in the cell (the non-zero pixels of the inverted mask)
    dark_pixels = cv2.countNonZero(thresh)
    total_pixels = cell.shape[0] * cell.shape[1]
    dark_percentage = (dark_pixels / total_pixels) * 100

    #print(dark_percentage)

    # Save cell image as PNG
    cv2.imwrite(save_path, cell)

    if dark_percentage > 80:
        return "#"  # Mostly dark: a filled square
#    elif dark_percentage < 8:
#        return " "  # Mostly light: an empty square
    else:
        try:
            # Note: crop_sides_away trims pure-white margins, which includes
            # the border that add_border has just added.
            the_image = crop_sides_away(add_border(Image.fromarray(cell)))
            text1 = tool.image_to_string(
                the_image,
                lang=lang,
                builder=pyocr.builders.TextBuilder()
            )
            text2 = pytesseract.image_to_string(the_image,config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789').strip()
            text = text2  # Only the tesseract result is used; cuneiform is printed for comparison
            print("OCR: tesseract:{} cuneiform:{}".format(text2, text1))
            number = int(text)
            return str(number)  # Detected a number
#        except (ValueError, CuneiformError):
        except Exception as e:
            # int() fails on empty or non-numeric OCR output; treat as an empty cell
            #print(e)
            return " "  # Empty cell

def split_image_into_cells(image, rows, cols):
    height, width = image.shape[:2]
    cell_height = height // rows
    cell_width = width // cols

    cells = []
    for r in range(rows):
        this_row = []
        for c in range(cols):
            y1 = r * cell_height
            y2 = (r + 1) * cell_height
            x1 = c * cell_width
            x2 = (c + 1) * cell_width
#            cell = image[y1:y2, x1:x2]
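            # Trim the cell's black border before OCR: delta px off the top and
            # left, 2*delta px off the bottom and right.
            delta = 4
            cell = image[y1+delta:y2-2*delta, x1+delta:x2-2*delta]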
            this_row.append(cell)
        cells.append(this_row)

    return cells

def process_image(image_path, rows, cols):
    image = cv2.imread(image_path)

    # Split image into cells
    cells = split_image_into_cells(image, rows, cols)

    # Create output directory to save cell images
    output_dir = 'cell_images'
    os.makedirs(output_dir, exist_ok=True)

    # Process each cell
    grid = []
    for i, this_row in enumerate(cells):
        grid_row = []
        for j, cell in enumerate(this_row):
            cell_save_path = os.path.join(output_dir, f'cell_{i},{j}.png')
            result = process_cell(cell, cell_save_path)
            grid_row.append(result)
        grid.append(grid_row)

    # Convert grid to JSON
    grid_json = json.dumps(grid)

    # Write JSON to file
    with open('output.json', 'w') as f:
        f.write(grid_json)

    print("Output saved to output.json")

# Example usage
image_path = 'crossword grid.png'
rows = 13
cols = 13
process_image(image_path, rows, cols)
	User: can you try to solve this crossword please
	ChatGPT:

	Of course! I'll do my best to help you solve the crossword. Please provide me with the crossword clues and the length of the words, and I'll assist you in finding the answers.
	User:
	Across

	1
	Feeling of excitement (4)
	3
	Supposedly essential item to buy (4-4)
	8
	List of names (4)
	9
	Steals money for personal use (informal) (8)
	11
	Flash jewellery (5-5)
	14
	Layer of foliage created by high branches of forest trees (6)
	15
	Plaza (6)
	17
	German breed of powerful dog used to herd livestock (10)
	20
	Having wide scope (3-5)
	21
	Sheet of floating ice (4)
	22
	(Of a person) strongly built (8)
	23
	* (4)

	Down

	1
	On a horse without a saddle (8)
	2
	Unspecified large numbers (8)
	4
	Wild (6)
	5
	Bandage that stops the flow of blood from an artery (10)
	6
	Figure skating jump (4)
	7
	Comfort (4)
	10
	Study of pharaohs and pyramids etc (10)
	12
	Collapse (4,4)
	13
	Expression of total agreement (4,4)
	16
	Asap! (2,4)
	18
	Bay and multiple estuary on the coast of East Anglia (4)
	19
	Island to which Napoleon was exiled in 1814 (4)

	ChatGPT:

	Here are the solutions to the crossword:

	Across:

	Zest
	Must-have
	Rota
	Pilfers
	Bling-bling
	Canopy
	Square
	Rottweiler
	Wide-ranging
	Floe
	Strapping
	Aster

	Down:

	Bareback
	Countless
	Untamed
	Tourniquet
	Axel
	Ease
	Egyptology
	Fall down
	You betcha
	Chop-chop
	Wash
	Elba

	I hope this helps you complete the crossword! Let me know if you have any further questions.

	User:
	redo 1, B _ _ _
	ChatGPT:

	Apologies for the incorrect answer. Here's the revised answer:

	Buzz

	User:
	redo 9 _ _ _ _ _ E _ S
	ChatGPT:

	The answer for clue 9 across with the given letters "____E___S" is:

	Embezzles

	User:
	redo 6, _ _ Z _
	ChatGPT:

	The answer for clue 6 down with the pattern "_ _ Z _" is:

	Lutz

	User:
	redo 7 _ _ E _
	ChatGPT:

	The answer for clue 7 down with the pattern "_ _ E _" is:

	Ease

	User:
	3 / 3
	redo 4 down, 5 letters.
	pattern: U _ M _ L _
	ChatGPT:

	The answer for clue 4 down with the pattern "U _ M _ L _" is:

	Feral

	User:
	2 / 2
	unspecied large numbers, 8 letters. pattern: _ _ T _ I _ N _
	ChatGPT:

	The answer for the clue "Unspecified large numbers" with the pattern "_ _ T _ I _ N _" is:

	Multitudinous

	User:
	3 / 3
	guess a few possibilities for "Having wide scope" first word all, second word 5 letters
	ChatGPT:

	Here are a few possibilities for the clue "Having wide scope" with the first word starting with "all" and the second word having 5 letters:

	All-Encompassing
	All-Embracing
	All-Global
	All-Vast
	All-Wide

	Please note that the actual answer would depend on the specific crossword puzzle and the intersecting letters.
	def process_crossword_grid(grid):
	num_to_coords = {} # Hash table to map numbers to row,col coordinates
	across_lengths = {} # Store lengths across from each number
	down_lengths = {} # Store lengths down from each number

	for row in range(len(grid)):
	for col in range(len(grid[row])):
	cell = grid[row][col]
	if cell.isdigit():
	number = int(cell)
	num_to_coords[number] = (row, col)
	grid[row][col] = " " # Replace the number with an empty cell

	# Measure the length across from the number
	across_length = 1
	while col + across_length < len(grid[row]) and grid[row][col + across_length] != "#":
	across_length += 1
	across_lengths[number] = across_length

	# Measure the length down from the number
	down_length = 1
	while row + down_length < len(grid) and grid[row + down_length][col] != "#":
	down_length += 1
	down_lengths[number] = down_length

	return num_to_coords, across_lengths, down_lengths, grid


	def crossword_get(grid, number, is_down, number_coordinates):
	if number not in number_coordinates:
	return None

	start_row, start_col = number_coordinates[number]
	if is_down:
	word = ""
	row = start_row
	while row < len(grid) and grid[row][start_col] != "#":
	word += grid[row][start_col]
	row += 1
	else:
	word = ""
	col = start_col
	while col < len(grid[start_row]) and grid[start_row][col] != "#":
	word += grid[start_row][col]
	col += 1

	return word

	def crossword_set(grid, number, is_down, word, number_coordinates):
	if number not in number_coordinates:
	return False

	start_row, start_col = number_coordinates[number]
	if is_down:
	if start_row + len(word) > len(grid):
	return False
	for i, char in enumerate(word):
	if grid[start_row + i][start_col] != " " and grid[start_row + i][start_col] != char:
	return False
	grid[start_row + i][start_col] = char
	else:
	if start_col + len(word) > len(grid[start_row]):
	return False
	for i, char in enumerate(word):
	if grid[start_row][start_col + i] != " " and grid[start_row][start_col + i] != char:
	return False
	grid[start_row][start_col + i] = char

	return True

	def print_crossword(grid):
	for row in grid:
	print(" ".join(row))

	import crosswordlib
	import time
	import copy

	def clear_terminal():
	print('\033c', end='')

	def snooze():
	time.sleep(0.1)


	crossword_grid = [
	["1", " ", "2", " ", "#", "3", "4", " ", "5", " ", "6", " ", "7"],
	[" ", "#", " ", "#", "#", "#", " ", "#", " ", "#", " ", "#", " "],
	["8", " ", " ", " ", "#", "9", " ", " ", " ", " ", " ", " ", " "],
	[" ", "#", " ", "#", "10", "#", " ", "#", " ", "#", " ", "#", " "],
	["11", " ", " ", " ", " ", " ", " ", " ", " ", " ", "#", "#", "#"],
	[" ", "#", " ", "#", " ", "#", " ", "#", " ", "#", "12", "#", "13"],
	["14", " ", " ", " ", " ", " ", "#", "15", " ", " ", " ", " ", " "],
	[" ", "#", " ", "#", " ", "#", "16", "#", " ", "#", " ", "#", " "],
	["#", "#", "#", "17", " ", " ", " ", " ", " ", " ", " ", " ", " "],
	["18", "#", "19", "#", " ", "#", " ", "#", " ", "#", " ", "#", " "],
	["20", " ", " ", " ", " ", " ", " ", " ", "#", "21", " ", " ", " "],
	[" ", "#", " ", "#", " ", "#", " ", "#", "#", "#", " ", "#", " "],
	["22", " ", " ", " ", " ", " ", " ", " ", "#", "23", " ", " ", " "]]

	crosswordlib.print_crossword(crossword_grid)

	number_coordinates, across_lengths, down_lengths, original_grid = crosswordlib.process_crossword_grid(crossword_grid)

	print("Number coordinates:")
	for number, coords in number_coordinates.items():
	print(f"Number {number}: {coords}")

	print("\nAcross Lengths:")
	for number, length in across_lengths.items():
	print(f"Number {number}: {length}")

	print("\nDown Lengths:")
	for number, length in down_lengths.items():
	print(f"Number {number}: {length}")


	def process_crossword_clues(clue_list, original_grid, number_coordinates):
	modified_grid = copy.deepcopy(original_grid)

	if len(clue_list) == 0:
	# Win condition
	print('⭐')
	exit()
	return

	clue = clue_list[0]
	words_list, start_position, is_horizontal = clue

	for word in words_list:
	sub_modified_grid = copy.deepcopy(modified_grid)

	if crosswordlib.crossword_set(sub_modified_grid, start_position, is_horizontal, word, number_coordinates):
	clear_terminal()
	for row in sub_modified_grid:
	print(" ".join(row))
	snooze()

	process_crossword_clues(clue_list[1:], sub_modified_grid, number_coordinates)


	def load_dict(filename):
	with open(filename, "r") as file:
	# Read the contents of the file
	file_contents = file.read().strip()

	# Split the contents into individual strings based on a delimiter (e.g., newlines)
	return file_contents.split("\n")

	w12 = load_dict("dictionary/12.txt")
	w8 = load_dict("dictionary/8.txt")
	w6 = load_dict("dictionary/6.txt")
	w4 = load_dict("dictionary/4.txt")

	{
	"1": ["buzz", "hype", "rush"],
	"8": ["rota", "roll"],
	"9": ["embezzle", "skimming"],
	"20": ["far-reaching"],
	"22": ["muscular", "stalwart"],
	"23": ["star", "icon"]
	}
	{
	"2": ["myriads", "countless"],
	"4": ["fierce", "untamed", "savage"],
	"12": ["fall apart", "break down", "give way"],
	"16": ["in a rush", "pronto"],
	"18": ["brey", "cove", "inle"],
	"19": ["elba", "lido"]
	}

	across = False
	down = True

	# length 8: 2 down, 9 across, 12 down
	# length 6: 4 down

	#print(w8)
	#exit()

	# I (A human) composed this solution path manually using the guesses suggested by ChatGPT. A more complete solution would automatically focus on the 'best' clue to work on dynamically.
	solution_path = [
	(["bareback"], 1, down),
	(["tourniquet"], 5, down),
	(["egyptology"], 10, down),
	(["musthave"], 3, across),
	(["blingbling"], 11, across),
	# (["farreaching"], 20, across),
	(["canopy"], 14, across),
	(["floe"], 21, across),
	(["square", "piazza"], 15, across),
	(["rottweiler", "doberman"], 17, across),
	(["axel", "lutz", "loop"], 6, down),
	(["ease", "rest", "calm"], 7, down),
	(["hearhear", "amentothat"], 13, down),

	(["rota", "roll"], 8, across),

	(w6, 4, down), # no good guesses for this, so use the full wordlist.
	(w8, 9, across),
	(w8, 2, down),
	# (w8, 12, down),
	(w6, 4, down),

	(["brey", "cove", "inle"], 18, down),
	(["elba", "lido"], 19, down),
	(w8, 22, across),
	]

	process_crossword_clues(solution_path, original_grid, number_coordinates)
	print("fail")
	import cv2
	import numpy as np
	from PIL import Image
	import pytesseract
	import json
	import os
	import sys
	import pyocr
	import pyocr.builders
	import numpy as np

	tools = pyocr.get_available_tools()
	if len(tools) == 0:
	    print("No OCR tool found")
	    sys.exit(1)
	my_tool = None
	for tool in tools:
	    print("* {}".format(tool.get_name()))
	    if "Cuneiform" in tool.get_name():
	        my_tool = tool
	# The tools are returned in the recommended order of usage
	if my_tool:
	    tool = my_tool
	else:
	    tool = tools[0]
	print("Will use tool '%s'" % (tool.get_name()))
	# Ex: Will use tool 'libtesseract'

	langs = tool.get_available_languages()
	print("Available languages: %s" % ", ".join(langs))
	lang = langs[0]
	print("Will use lang '%s'" % (lang))
	# Ex: Will use lang 'fra'
	# Note that languages are NOT sorted in any way. Please refer
	# to the system locale settings for the default language
	# to use.

	def add_border(image, border_size=2, border_color=(255, 255, 255)):
	    # Calculate the new size for the bordered image
	    new_width = image.width + 2 * border_size
	    new_height = image.height + 2 * border_size

	    # Create a new blank image with the calculated size and white background
	    bordered_image = Image.new("RGB", (new_width, new_height), border_color)

	    # Paste the original image onto the bordered image with an offset to create the border
	    bordered_image.paste(image, (border_size, border_size))

	    # Return the bordered image (saving it to disk is left disabled)
	    return bordered_image #.save("bordered_image.png")

	# doesn't help
	def crop_sides_away(image):
	    image_array = np.array(image)

	    # Find the coordinates of the non-white pixels
	    non_white_coords = np.where(image_array != 255)

	    # Determine the cropping bounds
	    left = np.min(non_white_coords[1])
	    right = np.max(non_white_coords[1])
	    top = np.min(non_white_coords[0])
	    bottom = np.max(non_white_coords[0])

	    # Crop the image
	    cropped_image = image.crop((left, top, right + 1, bottom + 1))
	    #cropped_image.show()

	    return cropped_image

	def process_cell(cell, save_path):
	    # Convert cell to grayscale
	    gray = cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)

	    # Apply inverted Otsu thresholding: dark pixels in the original become non-zero (white)
	    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

	    # Calculate the percentage of white pixels in the thresholded cell,
	    # i.e. pixels that were dark in the original image
	    white_pixels = cv2.countNonZero(thresh)
	    total_pixels = cell.shape[0] * cell.shape[1]
	    white_percentage = (white_pixels / total_pixels) * 100

	    #print(white_percentage)

	    # Save cell image as PNG
	    cv2.imwrite(save_path, cell)

	    if white_percentage > 80:
	        return "#" # Mostly white after inversion, so a black square
	    # elif white_percentage < 8:
	    #     return " " # Mostly black after inversion, so an empty square
	    else:
	        try:
	            the_image = crop_sides_away(add_border(Image.fromarray(cell)))
	            text1 = tool.image_to_string(
	                the_image,
	                lang=lang,
	                builder=pyocr.builders.TextBuilder()
	            )
	            text2 = pytesseract.image_to_string(the_image,config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789').strip()
	            text = text2
	            print("OCR: tesseract:{} cuneiform:{}".format(text2, text1))
	            number = int(text)
	            return str(number) # Detected a number
	        # except (ValueError, CuneiformError):
	        except Exception as e:
	            #print(e)
	            return " " # Empty cell

	def split_image_into_cells(image, rows, cols):
	    height, width = image.shape[:2]
	    cell_height = height // rows
	    cell_width = width // cols

	    cells = []
	    for r in range(rows):
	        this_row = []
	        for c in range(cols):
	            y1 = r * cell_height
	            y2 = (r + 1) * cell_height
	            x1 = c * cell_width
	            x2 = (c + 1) * cell_width
	            # cell = image[y1:y2, x1:x2]
	            # Trim the cell to cut away the black grid border
	            delta = 4
	            cell = image[y1+delta:y2-2*delta, x1+delta:x2-2*delta]
	            this_row.append(cell)
	        cells.append(this_row)

	    return cells

	def process_image(image_path, rows, cols):
	    image = cv2.imread(image_path)

	    # Split image into cells
	    cells = split_image_into_cells(image, rows, cols)

	    # Create output directory to save cell images
	    output_dir = 'cell_images'
	    os.makedirs(output_dir, exist_ok=True)

	    # Process each cell
	    grid = []
	    for i, this_row in enumerate(cells):
	        grid_row = []
	        for j, cell in enumerate(this_row):
	            cell_save_path = os.path.join(output_dir, f'cell_{i},{j}.png')
	            result = process_cell(cell, cell_save_path)
	            grid_row.append(result)
	        grid.append(grid_row)

	    # Convert grid to JSON
	    grid_json = json.dumps(grid)

	    # Write JSON to file
	    with open('output.json', 'w') as f:
	        f.write(grid_json)

	    print("Output saved to output.json")

	# Example usage
	image_path = 'crossword grid.png'
	rows = 13
	cols = 13
	process_image(image_path, rows, cols)
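
The grid written to `output.json` uses `"#"` for black squares, `" "` for empty squares and a digit string for numbered squares. As a rough sketch of how that output could feed the solver, the snippet below looks up where each clue number sits and how long its slot is. The solver's real `number_coordinates` structure is not shown in this section, so the dict shape here is an assumption, and `find_number_coordinates` and `slot_length` are hypothetical helper names.

```python
def find_number_coordinates(grid):
    """Map each clue number found in the grid to its (row, col) position."""
    coords = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell.strip().isdigit():
                coords[int(cell)] = (r, c)
    return coords

def slot_length(grid, row, col, down):
    """Count open cells from (row, col), going down or across, until a '#' or the edge."""
    dr, dc = (1, 0) if down else (0, 1)
    length = 0
    while 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col] != "#":
        length += 1
        row, col = row + dr, col + dc
    return length

# Toy grid in the same encoding as output.json.
demo_grid = [
    ["1", " ", "#"],
    [" ", "#", "#"],
    ["2", " ", " "],
]
number_coordinates = find_number_coordinates(demo_grid)
print(number_coordinates)
row, col = number_coordinates[1]
print(slot_length(demo_grid, row, col, down=True), slot_length(demo_grid, row, col, down=False))
```

Knowing the slot length makes it possible to pick the matching wordlist (such as `w4`, `w6` or `w8`) without hard-coding it per clue.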