"""
This file contains code that, when run on Python 2.7.5 or earlier, creates
a string that should not exist: u'\Udeadbeef'. That's a single "character"
that's illegal in Python because it's outside the valid Unicode range.
It then uses it to crash various things in the Python standard library and
corrupt a database.

On Python 3... well, this file is full of syntax errors on Python 3. But
if you were to change the print statements and byte literals and stuff:
* You'd probably see the same bug on Python 3.2.
* On Python 3.3, you'd just get an error making the string on the first line.
* On Python 3.3.3, the error even makes sense.

On narrow builds of Python, u'\Udeadbeef' gets immediately truncated to
u'\ubeef', a totally safe character. (It's a nonsense syllable in
Korean.) For once, narrow Python's half-assed Unicode support has saved you.

The relevant bug is:
"""

# Use a bug in the UTF-7 decoder to create a string containing codepoint
# U+DEADBEEF. (Keep in mind that Unicode ends at U+10FFFF.)
deadbeef = '+d,+6t,+vu8-'.decode('utf-7', 'replace')[-1]
print repr(deadbeef)
# outputs u'\Udeadbeef'. That's not a valid string literal.

import codecs
with codecs.open('deadbeef.txt', 'w', encoding='utf-8') as outfile:
    print >> outfile, deadbeef
    # writes a non-UTF-8 file

try:
    with codecs.open('deadbeef.txt', encoding='utf-8') as infile:
        # (reconstructed line) reading the file back is what fails to decode
        infile.read()
except UnicodeDecodeError:
    print "Boom! Broke your text file."

import re
try:
    re.match(u'[A-%s]' % deadbeef, u'test')
except MemoryError:
    print "Boom! Broke your regular expression."

import sqlite3
db = sqlite3.connect('deadbeef.db')
db.execute(u'CREATE TABLE deadbeef (id integer primary key, value text)')
# Parameters are passed as one-element tuples so each string is bound as a
# single value.
db.execute(u'INSERT INTO deadbeef (value) VALUES (?)', (u'\U0001f602',))
db.execute(u'SELECT * FROM deadbeef').fetchall()
# This works fine. I'm just convincing you that SQLite has no problem with
# Unicode itself.

try:
    db.execute(u'INSERT INTO deadbeef (value) VALUES (?)', (deadbeef,))
    db.execute(u'SELECT * FROM deadbeef').fetchall()
except sqlite3.OperationalError:
    print "Boom! Corrupted your database."

# As a bonus, if you run that SQLite query at the IPython prompt, it gets
# a second error trying to print out the error message.
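
For comparison, here is a small sketch of the range check the script's comments allude to: Unicode ends at U+10FFFF, so 0xDEADBEEF can never be a legal code point. It is written in Python 3 syntax and is not part of the original script. The names MAX_CODE_POINT, is_valid_code_point and try_chr are hypothetical helpers introduced only for illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point in the Unicode code space


def is_valid_code_point(value):
    """Return True if value lies inside the Unicode code space."""
    return 0 <= value <= MAX_CODE_POINT


def try_chr(value):
    """Return chr(value), or None if Python refuses to build the character."""
    try:
        return chr(value)
    except (ValueError, OverflowError):
        # Which exception is raised for very large values depends on the
        # interpreter version, so both are caught here.
        return None


print(is_valid_code_point(0x1F602))     # an emoji outside the BMP
print(is_valid_code_point(0xDEADBEEF))  # far beyond U+10FFFF
print(try_chr(0xDEADBEEF))
```

On a modern Python 3 the built-in chr() already rejects the out-of-range value, which matches the docstring's note that newer versions fail when the string is created.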

mdesantis commented Nov 21, 2013

I love it. So simple yet so effective.


mcormier commented Nov 21, 2013

Feature request. Print statement messages should be in lolcatz form.

Boom! I haz broke ur regular expressions!


peterbe commented Nov 21, 2013

What's a "narrow build"?


theonewolf commented Nov 21, 2013

Python 2.7.3 getting this:

: File name too long
./ line 25: syntax error near unexpected token `('
./ line 25: `deadbeef = '+d,+6t,+vu8-'.decode('utf-7', 'replace')[-1]'

jandk commented Nov 21, 2013

On Gentoo with Python 2.7.5 it doesn't work (I get u'\U1eadbeef'). Maybe they patched it already?


ye commented Nov 21, 2013

Has anyone tried MySQL or PostgreSQL? This example only crashed SQLite, which is super lightweight and may not handle Unicode errors robustly.


leepa commented Nov 21, 2013

OSX - works fine
Ubuntu 12.04.2 LTS - Boom boom boom
FreeBSD 9.2 - yeah... locks up until the process is killed.


deedeethepinhead commented Nov 21, 2013

Linux Mint 16 64-bit: starting the script from Sublime or Spyder crashes the system, or at least freezes it (I was too impatient to wait much longer).


acdha commented Nov 21, 2013

@peterbe: it's a Python compile flag which controls whether Unicode support includes only the Basic Multilingual Plane or the full range of Unicode characters (i.e. does it end at 0xFFFF or 0x10FFFF). See

This used to only be of interest to those of us working with relatively obscure multilingual content but has become a lot more important for most people now that things outside the BMP like Emoji have become very common. It means that len() won't work as expected on those characters in most Python 2.x builds. Try running under both Python 2 and 3 if you're severely bored.
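
A minimal sketch for checking which kind of build you have, assuming sys.maxunicode is the thing to look at (it is 0xFFFF on narrow builds and 0x10FFFF otherwise). The helper name is_narrow_build is made up for illustration.

```python
import sys


def is_narrow_build():
    """True when this interpreter tops out at U+FFFF per string element."""
    return sys.maxunicode == 0xFFFF


emoji = u'\U0001f602'  # one character outside the BMP

print(is_narrow_build())
# 1 on wide builds and on Python 3.3+, 2 on narrow builds (surrogate pair)
print(len(emoji))
```

The same lines should run on Python 2 and on Python 3.3 or later, so the output can be compared across interpreters.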


gsakkis commented Nov 21, 2013

Slightly simpler way to get a hold of deadbeef: '+d,+6t,+vu8-'.decode('utf-7', 'ignore')


rspeer commented Nov 23, 2013

It works fine on OSX only because OSX's default Python is a narrow build. (Kind of disappointing for an OS with otherwise good support for lots of characters, including emoji.) The character just ends up being '\ubeef'.
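
The arithmetic behind that, as a rough sketch in Python 3 syntax. It assumes the narrow-build truncation simply keeps the low 16 bits of the bogus value, which is what the result described above suggests. The variable names are only illustrative.

```python
import unicodedata

bogus = 0xDEADBEEF
low16 = bogus & 0xFFFF  # keep only the low 16 bits (assumed truncation)

print(hex(low16))  # prints 0xbeef

# Look up what U+BEEF is; it falls in the Hangul Syllables block.
name = unicodedata.name(chr(low16))
print(name)
```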


cxcv commented Nov 27, 2013
