abadger/gist:bab2c5c5ed7f169c433e62389803af01

## gistfile1.txt
Why do we have unadorned string literals (native strings) in our codebase?
Doesn't that put us in danger of UnicodeError exceptions?

(1) Your codebase should be using text by default.  At the borders, you convert
    strings from other APIs into text and then use text throughout, only
    converting to bytes (or native strings) when those types are needed for
    another, outside API.

(2) On Python2, text can be safely combined with (or compared to) text [1]_.  Bytes
    can be combined with bytes.  And ascii-only bytes can be combined with text.

(3) On Python2, native strings are bytes so they follow the same rules as bytes:
    Safe to combine native strings with bytes.  Only safe to combine ascii-only
    native strings with text.

(4) On Python3, text can be safely combined with text.  Bytes can be combined
    with bytes.  Bytes and text can **never** be safely combined without an
    explicit conversion of one value or the other.

(5) On Python3, native strings are text so they follow the same rules as text:
    Only safe to combine native strings with text.
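
Point (1), converting at the borders, could be sketched with a few small
helpers.  This is only an illustration: the names ``to_text``, ``to_bytes``,
and ``to_native`` and the utf-8 default are assumptions made for this example,
not something the list above prescribes.

```python
import sys

PY3 = sys.version_info[0] >= 3
# ``unicode`` only exists on Python2; the conditional keeps Python3 from
# ever evaluating the name.
text_type = str if PY3 else unicode  # noqa: F821


def to_text(value, encoding='utf-8', errors='strict'):
    """Return value as a text string, decoding byte strings (hypothetical helper)."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected a string, got %r' % type(value))


def to_bytes(value, encoding='utf-8', errors='strict'):
    """Return value as a byte string, encoding text strings (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    raise TypeError('expected a string, got %r' % type(value))


def to_native(value, encoding='utf-8', errors='strict'):
    """Return value as whatever an unadorned literal is on this Python."""
    if PY3:
        return to_text(value, encoding, errors)
    return to_bytes(value, encoding, errors)


# Incoming data is converted once, at the border, and is text from then on.
name = to_text(b'caf\xc3\xa9')
```

With helpers like these, the conversion back to bytes or native strings
happens only at the point where an outside API requires it.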

If you understand all of the above, you'll find that the combinations which are
safe on both Python2 and Python3 are: text with text, bytes with bytes, and
**ascii-only** native strings with text.  That last part is because
native strings are text on Python3 and ascii-only byte strings are safe to
combine with text on Python2.
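
A small check of that safe subset, written to behave the same on both Python2
and Python3.  The variable names and the ``mixes_cleanly`` helper are
hypothetical and exist only for this sketch.

```python
text = u'caf\xe9'            # text string
b_data = b'caf\xc3\xa9'      # byte string (utf-8 encoded)
prefix = 'path-'             # ascii-only native string literal

# The three combinations that are safe on both Python2 and Python3.
text_with_text = text + u'!'
bytes_with_bytes = b_data + b'!'
native_with_text = prefix + text


def mixes_cleanly(left, right):
    """Return True if ``left + right`` works without an error."""
    try:
        left + right
    except (TypeError, UnicodeError):
        # Python3 raises TypeError for text + bytes; Python2 raises a
        # UnicodeError when the bytes are not ascii-only.
        return False
    return True


# Text with non-ascii bytes fails on both versions.
text_with_bytes = mixes_cleanly(text, b_data)
```

Text with *ascii-only* bytes is deliberately left out: it works on Python2 but
not on Python3, so it is not part of the common subset.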

.. [1] "Combined with" includes ``str.join()``, %-formatted strings, and
         concatenation with ``+``.  Those methods will always convert the byte
         string to a text string using the ascii encoding.  ``str.format()``
         needs to be understood to be used safely, though: it will convert its
         arguments to the type of string that it's a method of.

         .. seealso:: https://anonbadger.wordpress.com/2016/01/05/python2-string-format-and-unicode/
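
The footnote's caution about ``str.format()`` can be seen in a short
experiment.  This is a sketch with a made-up value (``b_name``); the Python3
branch describes Python3's own behaviour, which the footnote does not cover.

```python
import sys

b_name = b'abc'

# Formatting a byte string into a text template without converting it first.
formatted = u'{0}'.format(b_name)

if sys.version_info[0] >= 3:
    # Python3 never decodes implicitly, so the repr of the bytes object
    # ends up in the result.
    expected = u"b'abc'"
else:
    # Python2 converts the ascii-only byte string to the template's type.
    expected = u'abc'

# Converting explicitly first gives the same result on both versions.
explicit = u'{0}'.format(b_name.decode('ascii'))
```

Neither implicit result is something to rely on, which is one more reason to
convert to text at the border before formatting.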

So, some examples:

This is safe to do::

    pathname = u'/path/one'  # illustrative value; text, per guideline (1)
    filenames = ('/path/one', '/path/two')
    if pathname in filenames:
        print('We are inside a recognized directory')

Following our coding guidelines (bullet point 1 in our list above), ``pathname``
contains a text string.  On Python2, the values in ``filenames`` are byte
strings, but because they only contain ascii characters they are safely
converted to text strings for the comparison.  On Python3, the values in
``filenames`` are already text strings, so the comparison needs no conversion
and is safe.

This is unsafe to do::

    import os

    filenames = os.listdir('.')

    if u'one' in filenames:
        print('Directory contains a recognized file')

In this example, ``filenames`` is getting native strings from an outside API.
We can't control whether there are non-ascii characters in the filenames there.
On Python2 those filenames are byte strings, so when we check to see if
``u'one'`` is one of them, Python2 will attempt to convert each filename into
a text string to match ``u'one'``.  In doing so, it will use the ascii encoding,
which fails for a non-ascii filename.  For a comparison like this one, Python2
reports the failure as a ``UnicodeWarning`` and treats the two strings as
unequal; combining the same filename with text in other ways (concatenation,
``str.join()``, %-formatting) will traceback with a UnicodeError.
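
One way to make the check safe is to convert the filenames to text at the
border, before comparing.  The helper name ``names_to_text`` and the choice of
the ``replace`` error handler are assumptions made for this sketch; a codebase
may well prefer a different error policy.

```python
import os
import sys


def names_to_text(names, encoding=None):
    """Return a list in which every byte-string name is decoded to text."""
    if encoding is None:
        encoding = sys.getfilesystemencoding() or 'utf-8'
    text_names = []
    for name in names:
        if isinstance(name, bytes):
            # 'replace' avoids a traceback on undecodable names; this is an
            # assumption of the sketch, not a recommendation from the text.
            name = name.decode(encoding, 'replace')
        text_names.append(name)
    return text_names


# On Python2 os.listdir('.') returns byte strings; on Python3 it already
# returns text, so the helper passes those values through untouched.
filenames = names_to_text(os.listdir('.'))

if u'one' in filenames:
    print('Directory contains a recognized file')
```

After the conversion, both sides of the comparison are text on either Python
version, which is the first of the always-safe combinations.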

So, similar to how we use a ``b_`` prefix when we want a variable to hold a byte
string, a variable which holds native strings needs to be prefixed with ``n_``
when we can't rule out that it holds non-ascii characters.  In practice,
the easiest rule to follow is: if you're setting the variable to a string
literal which only contains ascii characters, you are safe.  If you set the
variable to a string literal with non-ascii characters *or* you set the
variable to a native string from a function call, then the variable should be
prefixed with ``n_`` to warn that you have to think about the corner cases
when combining it with other non-native variables.
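
A short illustration of the naming convention.  The specific variables
(``b_config``, ``suffix``, ``n_cwd``) and the utf-8 fallback are made up for
this sketch.

```python
import os
import sys

b_config = b'caf\xc3\xa9.conf'   # byte string: b_ prefix
suffix = '.bak'                  # ascii-only native literal: no prefix needed
n_cwd = os.getcwd()              # native string from a function call: n_ prefix

# The n_ prefix is the reminder to convert before mixing with text.
if isinstance(n_cwd, bytes):
    # Only true on Python2, where native strings are byte strings.
    cwd = n_cwd.decode(sys.getfilesystemencoding() or 'utf-8')
else:
    cwd = n_cwd

# Text combined with an ascii-only native literal is safe on both versions.
backup_name = b_config.decode('utf-8') + suffix
```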