Solving Ruzzle™ with Machine Vision

Have you ever found yourself being spanked repeatedly by your spouse's coworker at a free iOS game and thought to yourself, "You know, I could probably write a program to solve this...", but then realized you have better things to do with your time? This is that, except for the second half.

Due to a confluence of events, I've recently found myself in the position of having a copious amount of free time. As such, when I had the initial thought of being petty enough to spend a few days writing a program just to beat a whippersnapper nearly fifteen years my junior, there was nothing in real life stopping me from doing more than just laughing at the idea. So here we are. After a couple of weeks of tinkering I've managed to write a fully automated solver for Ruzzle™.

Ruzzle™ - The Gameplay

For those not familiar, Ruzzle™ is a simple word game that is a bit of a mashup of Boggle gameplay and Scrabble scoring.

A Ruzzle™ game board consisting of a 4x4 grid of letters with score modifiers

The basic gameplay involves finding words in a 4x4 matrix of random letters. By swiping to neighboring letters you create words. Neighbors are defined in all eight directions (horizontal, vertical, and both diagonals). Each letter can only be used once. Scoring is done by adding the letter values (i.e., each E is worth 1 point, whereas each J is worth 10 points), applying any letter or word specific modifiers (i.e., double letter points or triple word points), and then adding a length bonus (5 * (len(word) - 4)). The sum of all words found in a round is the score for that round. There are three rounds with progressively more letter and word modifiers. The higher sum of scores across all three rounds determines the winner. Players get two minutes to search for words in each round.
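
To make the scoring concrete, here's a rough sketch of a scoring function. The letter values shown are illustrative placeholders; I make no claim that they match the game's actual tables.

# A rough sketch of Ruzzle™-style scoring. The letter values here
# are illustrative placeholders, not the game's actual tables.
LETTER_VALUES = {"E": 1, "A": 1, "S": 1, "C": 4, "J": 10}

def score_word(tiles):
    # tiles is a list of (letter, modifier) pairs where modifier is
    # None, "2L", "3L", "2W", or "3W"
    total = 0
    word_multiplier = 1
    for letter, modifier in tiles:
        value = LETTER_VALUES.get(letter, 1)
        if modifier == "2L":
            value *= 2
        elif modifier == "3L":
            value *= 3
        elif modifier == "2W":
            word_multiplier *= 2
        elif modifier == "3W":
            word_multiplier *= 3
        total += value
    total *= word_multiplier
    # Length bonus: 5 * (len(word) - 4), which is zero or negative
    # for words of four letters or fewer, so only apply it to
    # longer words
    if len(tiles) > 4:
        total += 5 * (len(tiles) - 4)
    return total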

Actually Solving Ruzzle™

At this point I'm sure that most everyone familiar with some basic computer science concepts has already guessed the general outline of how to go about solving such a problem. Obviously we'd create some sort of graph model of the game board and then use a basic search algorithm to find words, collecting a list of words that would then be displayed to the cheater user.

And that's essentially how I ended up solving Ruzzle™ boards, with only two mildly interesting design decisions.
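
In sketch form, the search looks something like the following. The tiny WORDS set here is just a stand-in for the real dictionary described in the next section.

# A minimal sketch of the board search. WORDS is a placeholder for
# a full dictionary; PREFIXES lets us prune dead-end paths early.
WORDS = {"ran", "run", "sail", "slain"}
PREFIXES = {w[:i] for w in WORDS for i in range(1, len(w) + 1)}

def neighbors(x, y):
    # All eight surrounding cells that fall inside the 4x4 grid
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < 4 and 0 <= ny < 4:
                yield nx, ny

def search(board, x, y, used, word, found):
    # Extend the current path with this cell's letter
    word += board[y][x]
    if word not in PREFIXES:
        return  # No dictionary word starts this way; prune
    if word in WORDS:
        found.add(word)
    used = used | {(x, y)}
    for nx, ny in neighbors(x, y):
        if (nx, ny) not in used:
            search(board, nx, ny, used, word, found)

def solve(board):
    # board is a list of four strings, one per row
    found = set()
    for y in range(4):
        for x in range(4):
            search(board, x, y, set(), "", found)
    return found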

What Is a Word, Even?

The first interesting question is obviously, "What is a word?". Or more precisely, "What does Ruzzle™ think is a word?". I'm sure most programmers had the same thought as I did: just use /usr/share/dict/words and call it a day. Unfortunately, the words file is not at all well suited for use as a source of word game words. The main issue I found with words was that plurals and verb tenses are not consistently listed (i.e., run and ran are both present, whereas race does not have the corresponding past tense raced).

The bottom line is that I did not end up with a perfect replica of whatever database of words is used by Ruzzle™, but I ended up close enough that I routinely get the top thirty or so words (ranked by score) listed as possible for a given game board.

My dictionary ended up being a combination of various SCOWL word lists. To be quite honest, I don't remember precisely which word lists I included. Theoretically I could go back and figure out exactly which ones are included, but it's not important enough to warrant the work given that it's not an exact duplicate of the game's dictionary.

Suffice it to say, I added increasingly rare lists until it felt like I was generating too many false positives (i.e., words I thought were words, but that Ruzzle™ did not) without incurring too many false negatives (i.e., words I did not find that Ruzzle™ thought were, in fact, words).

How Do I Actually Get a Board Into the Program?

The second interesting design aspect was how to communicate a game board on my phone into my solver script. At this point I had probably spent about three hours creating a word dictionary and writing up the obvious depth-first search, looking for words on games that I had already played. The next step was figuring out how to use the solver in real time.

My first approach here was to just type out the sixteen letters of the game board row by row and then parse this into the internal 4x4 grid. Given this board:

A Ruzzle™ game board consisting of a 4x4 grid of letters with score modifiers

The resulting description would require typing SLANUEOSCPIDFHPE. This takes on the order of two or three seconds of game time (not scientifically measured). However, it quickly became clear that ignoring letter and word scoring modifiers would not result in achieving truly high scores.

After a full thirty seconds of pondering I ended up using a prefix notation for score modifiers, which seemed like it'd be the easiest thing to parse. This gives a board description for the previous example of: 2WSL2WANU3LE3LOS3WC2LPIDFHPE.

This meant that the parser was an extremely simple LL(1) parser, at the expense of a slightly more difficult to type board description. While testing this on some practice boards it vaguely seemed like a postfix modifier approach would be easier to translate, at the cost of a slightly more complex parser. However, I was able to type out a complete game board in roughly four seconds of game time, so I never got around to implementing this approach.
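
For the curious, here's a rough sketch of what that parser looks like. The names are illustrative, not necessarily those in the actual source.

# A rough sketch of the prefix-notation parser. Each tile is a
# single letter, optionally preceded by a modifier such as 2L, 3L,
# 2W, or 3W. Because every modifier starts with a digit, plain L
# and W tiles parse unambiguously.
def parse_board(desc):
    tiles = []
    i = 0
    while i < len(desc):
        modifier = None
        if desc[i] in "23":
            modifier = desc[i:i + 2]
            i += 2
        tiles.append((desc[i], modifier))
        i += 1
    if len(tiles) != 16:
        raise ValueError("expected 16 tiles, got %d" % len(tiles))
    # Reshape the flat tile list into a 4x4 grid, row by row
    return [tiles[row * 4:(row + 1) * 4] for row in range(4)]

board = parse_board("2WSL2WANU3LE3LOS3WC2LPIDFHPE")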

Finally, a Complete Ruzzle™ Solver!

At this point I had a working solver. The source is available from this stage for those curious about the actual implementation details. I played a few test games with this approach and managed to achieve some decent scores. But I had this nagging feeling that I could do better. I mean, this approach worked and all. But it somehow didn't seem to rise to the level of pettiness that I was really aiming for. This just felt like cheating, so I had yet to actually use it against my arch-nemesis of word game players.

The textual input felt clunky. The ASCII art boards that showed how to form words were difficult to read reliably. It just felt like I could do better.

Doing Better and Wasting Even More Time

Implementing the core logic of the Ruzzle™ solver took roughly four hours from opening an editor to having ASCII art game boards spit out answers. This approach worked well enough. But what if, and hear me out on this, I could just show my phone to my webcam and get some graphical boards that showed how to swipe through the game board? That felt sufficiently petty to be generally amusing. And how hard could it be?

I happen to have experience from a previous life working with image analysis. I was vaguely aware that there were some pretty good open source optical-character recognition libraries based on random discussions throughout life. Surely, plugging all the bits together wouldn't take more than another four hours or so. (Narrator voice: He was only off by one order of magnitude).

OpenCV is Awesome and You Should Use It

Oh. My. Deity! OpenCV is one impressive piece of open source software. I don't even qualify that with "for academic open source software", which, as anyone familiar knows, significantly lowers the bar.

Back in my day (yeah, I said it...) "open source image analysis" mostly consisted of various MATLAB scripts shared on blogs of graduate students and professors. However, my particular requirements involved decently large scale analyses (for the time at least). I was working with time series images (i.e., movies) with roughly hundreds to a thousand frames and tens of different fields of view. The professor I worked for had some fairly beefy servers (think multiple first generation multi-core Xeon chips in a single machine the size of a large fridge). All that is to say, if I needed a specific image algorithm, it was on me to implement it. If that sounds impressive, I guarantee you that it is not. It is 100% intensely annoying tedium.

With OpenCV almost all of that tedium disappears. Anything I ended up wanting to do was already baked into OpenCV. There are only two reasons I hedge and say "almost". The first is that I did not have previous experience using NumPy, so I would occasionally get tripped up looking for something in OpenCV when I should have been looking in NumPy. The second is that, using the Python bindings for OpenCV, my default approach was to implement various parts by hand, and iterating over individual pixels in Python is slow. Often I would write something quickly by hand to verify it worked and then spend just as long figuring out how to describe the same operation natively in OpenCV. In the end, the machine vision aspect of the solver works in nearly real time at around twenty frames per second.
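
As a toy illustration of the trap: here's thresholding an image pixel by pixel in Python versus the single native OpenCV call that replaces it. This particular operation is just an example, not necessarily one from the solver.

import cv2
import numpy as np

# A random gray scale test image to threshold
gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# Slow: visiting every pixel from Python
slow = np.zeros_like(gray)
for y in range(gray.shape[0]):
    for x in range(gray.shape[1]):
        slow[y][x] = 255 if gray[y][x] > 127 else 0

# Fast: the same operation as a single native call
_, fast = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Both approaches produce identical results
assert (slow == fast).all()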

Okay, Let's Get Started

Obviously, the first step in machine vision is getting a source of images to analyze. With OpenCV this ends up being laughably trivial:

#!/usr/bin/env python3

# Import OpenCV
import cv2

def main():
    # Connect to the default video capture device
    cap = cv2.VideoCapture(0)

    while True:
        # Capture a frame from the video capture device
        ret, frame = cap.read()
        if not ret:
            # Bail out if the camera stops producing frames
            break

        # Display the frame as an image on screen
        cv2.imshow('Camera', frame)

        # Exit the program on any user input
        c = cv2.waitKey(1)
        if c >= 0:
            return

if __name__ == "__main__":
    main()

This short script connects to the default camera device and streams images to a live window. Pressing any key exits the program (you may have to focus the window displaying video first). Neato!

A Short Primer On Image Data

This section is a short aside on the basics of working with image data. For anyone that has no background in image processing, the hope is that this is enough information to make sense of the rest of this document. If you, dear reader, are already familiar with terms like color space, thresholding, or Gaussian blurs, feel free to skip this laughably short, overly simplified introductory material. For everyone else, here's a quick and incomplete primer.

An image in its most basic form is a two dimensional array of pixels. A pixel is represented as one or more integer values that describe some aspect of the pixel. One of the most basic (and common) concrete implementations of an image would be an N x M array of unsigned 8-bit values (i.e., values from 0 to 255). This is enough information to encode a gray scale (i.e., black and white) image.

Let's see what creating a basic image looks like:

#!/usr/bin/env python3

import cv2
import numpy as np

def main():
    # Create a 256x256 gray scale image so that pixel values can
    # span the full 0-255 range
    img = np.zeros((256, 256), dtype="uint8")

    # Iterate over every pixel in the image. If it
    # feels odd that we iterate over y first, this
    # will be explained shortly
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            # Set the "color" of the pixel to its horizontal
            # position. This will produce an image with all
            # black (value = 0) pixels in the left most column
            # progressing through dark gray, light gray, and
            # then white (value = 255) in the right most column.
            img[y][x] = x

    # Display the image
    cv2.imshow('Gray', img)

    # And wait for user input to exit the program
    while True:
        c = cv2.waitKey(1)
        if c >= 0:
            return

if __name__ == "__main__":
    main()

A horizontal gradient fading from black to white

Gray images are cool and all, but what if we want to work with color images? Well, we need some method to include color. The way we do this is by assigning three values to a pixel. One value represents the intensity of green light coming from the pixel, one value represents the intensity of red light, and a third value represents the intensity of blue light. For instance, here is a picture of my dog, Max, whom I am currently attempting not to run over with my office chair.

A picture of my dog, Max, a black Labrador Retriever

To illustrate working with color images we'll write a short program that draws a color square on this image.

#!/usr/bin/env python3

import cv2

def main():
    # Read a color image from disk
    img = cv2.imread("img-my-dog-max.png")

    # I know that img-my-dog-max.png is 806x605 pixels. This draws
    # a 150x150 green square towards the lower right of the image.
    for y in range(450, 600):
        for x in range(600, 750):
            # OpenCV uses a BGR color space which means this tuple
            # represents 0 blue, 255 green, and 0 red light
            # intensities.
            img[y][x] = (0, 255, 0)

    cv2.imshow('Max', img)

    # And wait for user input to exit the program
    while True:
        c = cv2.waitKey(1)
        if c >= 0:
            return

if __name__ == "__main__":
    main()

A picture of my dog, Max, a black Labrador Retriever with a green square added

For those following along, the comment about which tuple index corresponds to which color may be a bit confusing. Why is it (Blue, Green, Red) instead of (Green, Red, Blue) or the more familiar (Red, Green, Blue) (i.e., RGB)? It turns out this is a "Just Because" question. I'll admit I was slightly surprised that OpenCV uses BGR. But only slightly. Ages ago I attempted to support cross-platform image display software that ran on both x86 machines as well as whatever chip was in the Power Mac G5. It turns out that porting image display algorithms to the Mac required swapping the blue and red channels due to the Mac using a big-endian chip instead of the more common little-endian x86 chips. So accepting that OpenCV uses BGR was more of a shrug from my end rather than a source of major confusion. Just goes to show that our lived experiences inform even the most mundane details in software engineering.
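
For what it's worth, if you ever need the more familiar RGB ordering (to hand a frame to another library, say), OpenCV will swap the channels for you:

import cv2

# OpenCV reads images in BGR order; cvtColor produces a copy with
# the channels reordered to RGB
img = cv2.imread("img-my-dog-max.png")
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)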

A few last details on handling image data before we move on to the more interesting content.

First, the values associated with a pixel are commonly referred to as channels. The gray scale image example may also be referred to as a single channel image. The BGR color image example can be described as a three channel color image. The third most common variant is a four channel RGBA image, where the A stands for alpha, which determines the opacity of a given pixel.

Secondly, the coordinate axes for image data most commonly place the origin (0, 0) in the top left corner. The x-axis is the horizontal direction and the y-axis is the vertical direction. Without getting too detailed, this has to do with how image data is commonly laid out in RAM. While we think of images as two dimensional arrays, underneath they are actually a single dimensional array that can be efficiently passed between multiple layers of abstraction until they end up as physical light emitted from your monitor. An NxM image is not N arrays of length M, but a single array of length N * M. To access a specific pixel (in a single channel image) at coordinates (x, y) we index into the underlying array as img[y * width + x]. This means (0, 0) is the first pixel in the image, and the pixel at (N - 1, M - 1) is the last. If we're working with a color (i.e., three channel) image it's only slightly more complicated in that we have to include the channel factor, so the index becomes img[y * width * num_channels + x * num_channels]. OpenCV and similar libraries hide this fact behind syntactic sugar; however, we still tend to use y as the major axis because the iteration then becomes a single linear pass through the array. If we were to iterate over the x axis first, as most people would by default, we would go from a linear pass to roughly random jumping around the image, since each successive access is roughly width bytes apart.
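
Here's a quick way to convince yourself of the flat layout (a toy demonstration, not solver code):

import numpy as np

# A 4x6 image filled with 0..23 so every pixel value equals its
# flat index
height, width = 4, 6
img = np.arange(height * width, dtype=np.uint8).reshape(height, width)

# ravel() exposes the underlying one dimensional buffer
flat = img.ravel()

# The 2D syntax and the manual index arithmetic land on the same pixel
x, y = 2, 3
assert img[y][x] == flat[y * width + x]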

That's probably enough primer material on handling image data. I'll probably tear out and reword this section, and add more later when I realize I forgot something important.
