@palimondo
Last active January 5, 2024 12:46
ChatGPT can analyze the secret message from the GEMINI demo with a few hints from the user (https://developers.googleblog.com/2023/12/how-its-made-gemini-multimodal-prompting.html)

ChatGPT-4 decoding the secret message from the Gemini demo

Below is the transcript of my conversation with ChatGPT-4 (4.5 Turbo). I had to copy the responses into this gist manually, because Share link to Chat gave me the error: "Sharing conversations with images is not yet supported".

I copied the image with the secret message from Google's write-up How it’s Made: Interacting with Gemini through multimodal prompting.

User

[Uploaded image] This is a secret message. What does it say?

ChatGPT

The image you've provided appears to be a series of hand signs, which could be a form of sign language or a coded message. To decipher the message, one would typically match each hand gesture to its corresponding letter or concept, depending on the specific sign language or code being used.

For instance, if this is American Sign Language (ASL), each hand position corresponds to a letter in the ASL alphabet. However, without specific knowledge of the rules governing the sequence of signs or the context in which they're used, it can be challenging to interpret the exact message.

If you are looking for an interpretation based on ASL or another form of sign language, or if this is a specific code or cipher you need assistance with, please provide more details or context.

User

When I look at the image, I see an 8x6 grid of photos showing 2 pictures of a hand over a wooden table. The first is a fist; the second is an open hand from the top with all five fingers spread out. This does not look like sign language to me.

ChatGPT

The pattern you're describing—an alternating sequence of a closed fist and an open hand with fingers spread—does not correspond to any conventional sign language or common means of communication. It could represent a binary code where the fist stands for 0 and the open hand stands for 1, or it may be a visual pattern intended to convey a message through its arrangement.

To interpret it as binary, we could assign values to the fist and open hand and convert the sequence into letters or numbers. Since there are 48 images (8 rows of 6), this could potentially represent six 8-bit binary bytes, which can then be translated into characters if we assume it's ASCII code. Let’s try to decode it under that assumption.

Considering the top left as the start, a fist is '0' and an open hand is '1', let's convert each row into a binary number and then translate that into a character. I'll start by extracting the pattern from the image and then proceed with the decoding.

The binary pattern extracted from the image is as follows:

000000
111111
011111
000000
011101
111111
011111
000000

Each row of six bits can be translated into a character using ASCII encoding. The first and fourth rows are all zeros, which would translate to the null character in ASCII. Similarly, the second and sixth rows are all ones, which would translate to the character with ASCII code 63, the '?' character. However, since ASCII characters are 7 or 8 bits and we have only 6 here, we might be missing some bits, or this might not be ASCII-encoded after all.

Let's attempt to translate the remaining rows into ASCII characters by adding leading zeros to make them 8 bits long (which is standard for ASCII characters) and see if that yields any meaningful output.

The binary pattern from the image, when padded with leading zeros to form 8-bit ASCII codes, translates to a series of question marks (????????). This suggests that the pattern does not translate into readable ASCII characters, which means the pattern may not be intended to be read as binary-to-ASCII.
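The zero-padding step described above is easy to reproduce locally. In this sketch, using the bit rows as transcribed earlier in the conversation, the all-ones rows do pad to ASCII 63 ('?'), while the other non-zero rows land on non-printable control characters:

```python
# Bit rows as transcribed in the conversation above (8 rows of 6 bits)
rows = ["000000", "111111", "011111", "000000",
        "011101", "111111", "011111", "000000"]

# Pad each 6-bit row with leading zeros to a full 8-bit byte
padded = [r.zfill(8) for r in rows]

# Convert each byte to its ASCII code point
chars = [chr(int(b, 2)) for b in padded]
print(chars)  # '111111' -> 63 -> '?'; '011111' -> 31, a control character
```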

It could be that this pattern represents something else, or the interpretation approach might need to be different. If this pattern is meant to convey a message, it might be using a different coding system or it could be symbolic rather than literal.

If there is additional context or another method of interpretation you suspect this pattern may follow, please provide further details.

User

The image is 8x6, meaning there are 8 columns and 6 rows. Try again.

ChatGPT

I apologize for the oversight. Let's analyze the image again, this time considering 8 columns and 6 rows. This means we have 8 bits in each row which can represent a byte, potentially translating to an ASCII character. We'll correct the interpretation and proceed with the conversion.

Error analyzing (Copied from the expanded explanation block)

# Correct the interpretation: 8 columns (8 bits per row) and 6 rows
# We need to invert the rows and columns because we are interpreting each row as a byte (8 bits)

# Transpose the binary pattern to correct the orientation
transposed_binary_pattern = [''.join(row[col] for row in binary_pattern) for col in range(8)]

# Convert the transposed binary pattern to ASCII
ascii_message_corrected = binary_to_ascii(transposed_binary_pattern)
ascii_message_corrected

It seems there was an error in the transposition process due to an incorrect range. I'll correct this and reinterpret the binary pattern, treating each of the 6 rows as a string of 8 binary digits to convert to ASCII characters. Let's try that again.

The corrected binary pattern from the image translates to "Dnnnfn" in ASCII characters. This does not form a readable word or phrase in English, which suggests that the pattern might not represent ASCII-encoded text, or it may require additional context to understand its meaning. If this pattern is meant to encode a message, it might be using a different system or require a key to decipher.

(Copied from the expanded explanation block)

# Since we have 6 rows and 8 columns, let's reformat the binary pattern list to represent each row as a binary byte
binary_pattern_rows = [''.join(bits) for bits in zip(*binary_pattern)]

# Now convert the rows of binary bytes to ASCII
ascii_message_rows = binary_to_ascii(binary_pattern_rows)
ascii_message_rows

(Result 'Dnnnfn')
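The 'Dnnnfn' result is reproducible from the mis-transcribed 8x6 pattern earlier in the conversation: zipping the eight 6-bit rows reads the grid column-wise, yielding six 8-bit bytes that decode to exactly that string:

```python
# The (mis-transcribed) 8 rows of 6 bits from earlier in the conversation
pattern = ["000000", "111111", "011111", "000000",
           "011101", "111111", "011111", "000000"]

# zip(*pattern) reads the grid column-wise: each column becomes one 8-bit byte
columns = [''.join(bits) for bits in zip(*pattern)]

decoded = ''.join(chr(int(b, 2)) for b in columns)
print(decoded)  # -> 'Dnnnfn'
```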

User

Try again directly from the image, because the bits transcription was also incorrect previously.

ChatGPT

The correct interpretation of the binary pattern from the image translates to the word "GEMINI". This suggests that the message hidden in the pattern of fists and open hands is representing the word "GEMINI", which could refer to the astrological sign, a constellation, a project name, or any other association with the term.

(Copied from the expanded explanation block):

# We will now analyze the grid considering it as 6 rows and 8 columns (not 8x6 as before).
# Each row will represent a byte and we will convert this binary data to characters.

# Define a function to convert the array into a binary string based on brightness
def get_binary_from_image(image_array, threshold, cols, rows):
    # Determine the size of each grid cell
    cell_width = image_array.shape[1] // cols
    cell_height = image_array.shape[0] // rows

    # Initialize the binary string
    binary_string = ''

    # Iterate over each row and column to get the binary string
    for row in range(rows):
        for col in range(cols):
            # Calculate the cell's bounding box
            left = col * cell_width
            top = row * cell_height
            right = (col + 1) * cell_width
            bottom = (row + 1) * cell_height

            # Crop the cell from the image
            cell = image_array[top:bottom, left:right]

            # Determine if the cell is a fist ('0') or an open hand ('1')
            if cell.mean() < threshold:
                binary_string += '0'
            else:
                binary_string += '1'

    return binary_string

# Get the binary string from the image
binary_string = get_binary_from_image(img_array, threshold, 8, 6)

# Split the binary string into a list of 8-bit (1 byte) binary numbers
binary_bytes = [binary_string[i:i+8] for i in range(0, len(binary_string), 8)]

# Convert binary bytes to ASCII characters
decoded_message = ''.join([chr(int(b, 2)) for b in binary_bytes if all(c in '01' for c in b)])
decoded_message

User

Show your work

ChatGPT

Certainly. The process to decode the message involved the following steps:

  1. Image Analysis:

    • The uploaded image was processed to determine the average brightness in order to set a threshold that differentiates between a closed fist and an open hand. The threshold was set to the mean value of the grayscale representation of the image.
  2. Grid Division:

    • The image was divided into a grid of 6 rows and 8 columns, as per your correction. Each cell in this grid corresponds to a single bit.
  3. Binary Pattern Extraction:

    • Each cell was analyzed to determine if it was a fist or an open hand. If the cell's average brightness was below the threshold, it was considered a '0' (fist), and if it was above, it was considered a '1' (open hand).
  4. Binary String Creation:

    • A binary string was constructed by concatenating the 0s and 1s for each cell in the order they appear in the grid from left to right, top to bottom.
  5. Conversion to ASCII:

    • The binary string was split into 8-bit segments, each corresponding to one byte. Each byte was then converted to its equivalent ASCII character.
  6. Decoding the Message:

    • The sequence of ASCII characters formed the word "GEMINI".

Here is the binary pattern as it was determined from the image, with each 8-bit segment corresponding to a row:

01000111  // G
01000101  // E
01001101  // M
01001001  // I
01001110  // N
01001001  // I

When each binary byte is converted to ASCII:

  • 01000111 translates to G
  • 01000101 translates to E
  • 01001101 translates to M
  • 01001001 translates to I
  • 01001110 translates to N
  • 01001001 translates to I

Putting them together forms the word "GEMINI".
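The final byte-to-character step listed above can be checked in a couple of lines of Python:

```python
# The six 8-bit rows read from the image, top to bottom
rows = ["01000111", "01000101", "01001101",
        "01001001", "01001110", "01001001"]

# Each row is one ASCII byte; joining the characters yields the message
message = ''.join(chr(int(r, 2)) for r in rows)
print(message)  # -> 'GEMINI'
```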

User

What library was used to open the Image?

ChatGPT

The library used to open and manipulate the image is the Python Imaging Library (PIL), specifically its Image module. This library is often installed via the Pillow package, which is an updated and user-friendly fork of PIL that supports opening, manipulating, and saving many different image file formats. In the code, the Image module is used to open the image and convert it to grayscale for analysis.

User

Can you give me the full source code so that I can try running the image analysis myself locally ?

ChatGPT

Certainly! Here's the full source code for the image analysis. You'll need to have the Pillow package installed (pip install Pillow) to run this code:

from PIL import Image
import numpy as np

# Load the image from file
image_path = 'path_to_your_image.png'  # Replace with your image file path
img = Image.open(image_path)

# Convert image to grayscale and then to numpy array
img_gray = img.convert('L')
img_array = np.array(img_gray)

# Define the threshold to distinguish between a closed fist (darker) and an open hand (lighter)
# Assuming the fist will have less white (lower pixel values) than the open hand
threshold = img_array.mean()

# Function to convert the array into a binary string based on brightness
def get_binary_from_image(image_array, threshold, cols, rows):
    # Determine the size of each grid cell
    cell_width = image_array.shape[1] // cols
    cell_height = image_array.shape[0] // rows

    # Initialize the binary string
    binary_string = ''

    # Iterate over each row and column to get the binary string
    for row in range(rows):
        for col in range(cols):
            # Calculate the cell's bounding box
            left = col * cell_width
            top = row * cell_height
            right = (col + 1) * cell_width
            bottom = (row + 1) * cell_height

            # Crop the cell from the image
            cell = image_array[top:bottom, left:right]

            # Determine if the cell is a fist ('0') or an open hand ('1')
            if cell.mean() < threshold:
                binary_string += '0'
            else:
                binary_string += '1'

    return binary_string

# Get the binary string from the image
binary_string = get_binary_from_image(img_array, threshold, 8, 6)

# Split the binary string into a list of 8-bit (1 byte) binary numbers
binary_bytes = [binary_string[i:i+8] for i in range(0, len(binary_string), 8)]

# Convert binary bytes to ASCII characters
decoded_message = ''.join([chr(int(b, 2)) for b in binary_bytes if all(c in '01' for c in b)])
print(decoded_message)

Make sure you replace 'path_to_your_image.png' with the actual path to the image file you want to analyze. This script will print the decoded message to the console after running it.

User

how should I name this script?

ChatGPT

You can name the script something descriptive, such as image_to_text_decoder.py. This name indicates that the script is intended to decode text from an image, which is exactly what the code is designed to do. Naming scripts in a way that describes their function can help you and others understand at a glance what the script is for, especially when you come back to it after some time or when the script is part of a larger project.
