This subdocument specifies an encoding for escaping text. It was designed for the multiple tab separated values format. It uses backslash escaping and tries to stay common to the shell and most scripting languages, which should make the escaped output easy to use in a variety of contexts.
- Escape all tabs and newlines, so that the result may be used as a TSV field.
- Escape all terminal control sequences so that escaped text will never accidentally affect the terminal state.
- Operate correctly on arbitrary text.
- Operate correctly on Unicode UTF-8 text.
The input is any string of bytes. When the input is valid UTF-8 text, the output will also be valid UTF-8. The output is a backslash-escaped string of characters; it will not contain any of the following bytes:
[0x00-0x1F] (this range includes the NUL byte, tab and newline chars as well as terminal control codes)
Most ASCII values are escaped as \xXX. Some special ones have a nicer syntax:
Furthermore the following bytes will not occur alone. They will be escaped and only occur after a backslash:
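The byte-level rules described so far can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the name escape_field is hypothetical, every control byte uses the generic \xXX form rather than any of the nicer short forms, hex digits are written in upper case, and a literal backslash is assumed to be escaped by doubling it.

```python
def escape_field(data: bytes) -> bytes:
    """Backslash-escape bytes so the result holds no byte in 0x00-0x1F."""
    out = bytearray()
    for b in data:
        if b == 0x5C:
            # Assumption: a literal backslash is doubled so it stays unambiguous.
            out += b"\\\\"
        elif b < 0x20:
            # Control bytes (NUL, tab, newline, ESC, ...) use the generic \xXX form.
            out += b"\\x%02X" % b
        else:
            # Printable ASCII and bytes 128-255 (UTF-8 sequences) pass through.
            out.append(b)
    return bytes(out)


# Tab, newline and the ESC byte of a terminal control sequence are all escaped.
sample = b"a\tb\n\x1b[0m"
print(escape_field(sample).decode("ascii"))  # prints: a\x09b\x0A\x1B[0m

# Multi-byte UTF-8 text is left untouched.
assert escape_field("é".encode("utf-8")) == "é".encode("utf-8")
```

Because bytes in the range [128-255] are copied unchanged, valid UTF-8 input yields valid UTF-8 output, and the result can be placed in a TSV field without introducing a tab or newline.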
About escaping unicode codepoints
Any byte whose high bit is 1 (i.e. in the range [128-255]) can be passed through unchanged. This means multiple-byte unicode codepoints are passed through unescaped. An implementation may also choose to escape a set of unicode codepoints with \uXXXX. This can only express 16 bit codepoints, but unicode goes up to 21 bits. So for those cases you can either escape each of the bytes using \xXX or use
About not escaping $
We choose not to escape $ even though it expands to variables inside a double-quoted (") shell string. This means that one must check for and manually escape any $ in the output when copying and pasting TSV text into a shell script string. It would be unreadable to escape it as \x24, so you might prefer to write \$. But while Perl, Ruby and the shell treat \$ as an escaped dollar, Python does not. Also, $ is quite rare in filenames and URLs, so it won't be a problem often.
About escaping ASCII characters that don't need escaped
Other than the special escape codes above, any escaped character just denotes that character. For example