mahmoud/unicode_encoding_kwarg_issue

## unicode_encoding_kwarg_issue
The encoding keyword argument of the Python 3 str() and Python 2 unicode() constructors needlessly constrains the practical use of these core types.

In common usage, the primary role of both constructors is converting arbitrary objects into text:

```python
>>> str(2)
'2'
```

But adding an encoding yields:

```python
>>> str(2, encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: coercing to str: need bytes, bytearray or buffer-like object, int found
```

While the error message is clear enough to an experienced developer, I would like to raise the question: is this error necessary at all? Even harmlessly getting a str from a str is punished, yet leaving off the encoding works fine again:

```python
>>> str('hi', encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
>>> str('hi')
'hi'
```

Merging and simplifying the two modes of these constructors would yield much more predictable results for experienced and beginning Pythonistas alike. In short, the encoding argument should be ignored if the argument is already a unicode/str instance or a non-string object, and consulted only if the primary argument is a bytestring. Bytestrings already have a .decode() method; another, more obscure version of it is not necessary.
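A minimal sketch of the proposed rule, written against Python 3's str(). The name `proposed_str` is hypothetical, and treating `bytes`, `bytearray`, and `memoryview` as the "bytestring" case is an assumption meant to mirror the bytes-like objects str() already decodes today:

```python
def proposed_str(obj, encoding=None, errors=None):
    """Hypothetical sketch of the proposed str() semantics (not a real builtin)."""
    # Collect only the decoding arguments the caller actually supplied.
    decode_kwargs = {}
    if encoding is not None:
        decode_kwargs['encoding'] = encoding
    if errors is not None:
        decode_kwargs['errors'] = errors

    # Assumption: bytes, bytearray and memoryview stand in for "bytestring".
    if decode_kwargs and isinstance(obj, (bytes, bytearray, memoryview)):
        # Bytes-like input plus an encoding: decode exactly as str() does today.
        return str(obj, **decode_kwargs)

    # Everything else (str instances, ints, arbitrary objects) ignores the
    # decoding arguments and is converted the usual way.
    return str(obj)
```

With this sketch, both failing calls above return text (`'2'` and `'hi'`), while `proposed_str(b'hi', encoding='utf8')` decodes exactly as `str()` does today.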

Furthermore, despite the core nature and widespread use of these types, this change should break very little existing code or understanding: it only affects calls that currently raise TypeError. unicode() and str() will simply behave as expected more often, returning text versions of the arguments passed to them.

Appendix: To demonstrate the expected behavior of the proposed unicode/str, here is a code snippet we've employed to sanely and safely get a text version of an arbitrary object:

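```python
# Python 2
def to_unicode(obj, encoding='utf8', errors='strict'):
    # the encoding default should look at sys's value
    try:
        # Handles unicode objects, non-string objects, and bytestrings
        # that the default codec (normally ASCII) can decode.
        return unicode(obj)
    except UnicodeDecodeError:
        # Typically reached for a bytestring the default codec cannot decode.
        return unicode(obj, encoding=encoding, errors=errors)
```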
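The snippet above relies on Python 2 semantics, where unicode() on a non-ASCII bytestring raises UnicodeDecodeError. In Python 3, str() on a bytes object without an encoding returns its repr (for example `"b'hi'"`) instead of raising, so the same try/except order would silently produce the wrong text. A hypothetical Python 3 port (the name `to_text` and the UTF-8 default are assumptions) can invert the order: try the decode first, and rely on the TypeError shown earlier to detect non-bytes arguments.

```python
def to_text(obj, encoding='utf-8', errors='strict'):
    """Hypothetical Python 3 port of to_unicode()."""
    try:
        # With an encoding given, str() accepts only bytes-like objects and
        # decodes them; UnicodeDecodeError propagates according to `errors`.
        return str(obj, encoding, errors)
    except TypeError:
        # str instances and other objects are rejected with TypeError,
        # so fall back to the plain conversion.
        return str(obj)
```

One trade-off of this shape: a TypeError caused by a badly typed `encoding` or `errors` argument is also swallowed by the fallback, so an explicit `isinstance` check may be preferable where that matters.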