Created February 9, 2025 03:56
sam@arch evals % ./run -e simple_pdf
Running evaluation 'simple_pdf'
GPT-4o: Failed 🔴
---- Expected ----
(?-mix:Gandalf.*H0rTm1iI)
---- Actual ----
Here is a photo of a wizard:
[Description: An illustration of a wizard with a long white beard, wearing a pointed hat and holding a wooden staff. The wizard is in a mystical forest setting and is holding a sign that says "Gandalf."]
The password is: iH0TeM1l
GPT-4o-mini: Failed 🔴
---- Expected ----
(?-mix:Gandalf.*H0rTm1iI)
---- Actual ----
Here is a photo of a wizard.
The wizard is depicted with a long gray beard, wearing a pointed hat and a flowing robe. He holds a staff in one hand and a wooden sign with the name "Gandalf" in the other. The setting appears to be a mystical forest, with a foggy atmosphere and soft glowing lights in the background.
The password is: H10Tmi1
Claude 3.5 Sonnet: Failed 🔴
---- Expected ----
(?-mix:Gandalf.*H0rTm1iI)
---- Actual ----
# Here is a picture of a wizard!
[Image description: The image shows a person dressed as a fantasy wizard in a misty forest setting. They are wearing a wide-brimmed grey hat, grey robes, and have a long white beard. They are holding a wooden staff in one hand and a wooden sign that reads "Gandalf" in the other. The background shows a moody, atmospheric forest with fog or mist creating a mystical ambiance. The lighting and setting creates a magical, ethereal quality typical of fantasy imagery.]
The password is: H0bbit!!!
Note: I've formatted this using markdown headers and included a detailed image description that captures the fantasy elements and atmosphere of the photo without specifically identifying the person portraying the character.
Gemini 2.0 Flash: Failed 🔴
---- Expected ----
(?-mix:Gandalf.*H0rTm1iI)
---- Actual ----
Here is a photo of a wizard: The wizard is holding a sign that says "Gandalf". The password is: H0rTm1H.
Gemini 2.0 pro: Failed 🔴
---- Expected ----
(?-mix:Gandalf.*H0rTm1iI)
---- Actual ----
Here is a photo of a wizard:
Image of a wizard resembling Gandalf from Lord of the Rings. He has long white hair and a beard, and is wearing a grey robe and pointed hat. He is holding a wooden staff in his right hand and a wooden sign that reads "Gandalf" with his left. The background is a blurry forest.
The password is: H0eTm1H
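
The `(?-mix:...)` form on the Expected lines is how Ruby prints a Regexp. That suggests each model's output is matched against a Ruby regular expression, but this is an assumption, since the harness source is not shown here. A minimal sketch of such a check, with `passes?` as a hypothetical helper name:

```ruby
# Hypothetical check, not the eval harness's actual code.
# The pattern is the one shown in the Expected lines of the log.
EXPECTED = /Gandalf.*H0rTm1iI/

# Assumed pass criterion: the pattern matches anywhere in the model output.
def passes?(output, pattern = EXPECTED)
  pattern.match?(output)
end

# Output copied from the Gemini 2.0 Flash run in the log.
flash_output = 'Here is a photo of a wizard: The wizard is holding a sign ' \
               'that says "Gandalf". The password is: H0rTm1H.'

puts EXPECTED.to_s                                        # => (?-mix:Gandalf.*H0rTm1iI)
puts passes?(flash_output)                                # => false (password misread)
puts passes?('Gandalf. The password is: H0rTm1iI')        # => true
puts passes?("Gandalf\nThe password is: H0rTm1iI")        # => false ('.' stops at newline)
```

The `-m` in `(?-mix:...)` means `.` does not match a newline. Under this assumed check, the sign text and the password would therefore have to appear on the same line for a run to pass, even when the password is transcribed correctly.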