Putting up a bounty didn't produce a solution to this, so I've spent a while figuring out the best one I could. I don't have any source confirming that this is the correct way to do it, so use at your own risk.
To convert any string into a SQLite identifier:
- Ensure the string can be encoded as UTF-8.
- Ensure the string does not include any NUL characters.
- Replace all
- Wrap the entire thing in double quotes.
import codecs def quote_identifier(s, errors="strict"): encodable = s.encode("utf-8", errors).decode("utf-8") nul_index = encodable.find("\x00") if nul_index >= 0: error = UnicodeEncodeError("utf-8", encodable, nul_index, nul_index + 1, "NUL not allowed") error_handler = codecs.lookup_error(errors) replacement, _ = error_handler(error) encodable = encodable.replace("\x00", replacement) return "\"" + encodable.replace("\"", "\"\"") + "\""
Given a string single argument, it will escape and quote it correctly or raise an exception. The second argument can be used to specify any error handler registered in the
codecs module. The built-in ones are:
'strict': raise an exception in case of an encoding error
'replace': replace malformed data with a suitable replacement marker, such as
'ignore': ignore malformed data and continue without further notice
'xmlcharrefreplace': replace with the appropriate XML character reference (for encoding only)
'backslashreplace': replace with backslashed escape sequences (for encoding only) This doesn't check for reserved identifiers, so if you try to create a new
SQLITE_MASTERtable it won't stop you.
import sqlite3 def test_identifier(identifier): "Tests an identifier to ensure it's handled properly." with sqlite3.connect(":memory:") as c: c.execute("CREATE TABLE " + quote_identifier(identifier) + " (foo)") assert identifier == c.execute("SELECT name FROM SQLITE_MASTER").fetchone() test_identifier("'Héllo?'\\\n\r\t\"Hello!\" -☃") # works test_identifier("北方话") # works test_identifier(chr(0x20000)) # works print(quote_identifier("Fo\x00o!", "replace")) # prints "Fo?o!" print(quote_identifier("Fo\x00o!", "ignore")) # prints "Foo!" print(quote_identifier("Fo\x00o!")) # raises UnicodeEncodeError print(quote_identifier(chr(0xD800))) # raises UnicodeEncodeError
- SQLite identifiers are
TEXT, not binary.
SQLITE_MASTERschema in the FAQ.
- Reference: Python 2 SQLite API yelling at me when I gave it bytes it couldn't decode as text.
- Reference: Python 3 SQLite API requires queries be
- SQLite identifiers are quoted using double-quotes.
- Reference: SQL as Understood by SQLite.
- Double-quotes in SQLite identifiers are escaped as two double quotes.
- SQLite identifiers preserve case, but they are case-insensitive towards ASCII letters. It is possible to enable unicode-aware case-insensitivity.
- Source: SQLite FAQ Question #18
- SQLite stops reading queries at the NUL character, but does not have any way to escape it.
- Reference: PHP came up with their own binary encoding to handle NUL; presumably if they could just store it directly they would.
sqlite3can handle any other unicode string as long as it can be properly encoded to UTF-8. Invalid strings could cause crashes between Python 3.0 and Python 3.1.2 or thereabouts. Python 2 accepted these invalid strings, but this is considered a bug.