@qstokkink
Last active March 8, 2019 09:04
No-character-mapping-nonsense Python2/3 String class
import six
from six.moves import xrange


class String(six.text_type):

    @staticmethod
    def as_raw_unicode(value=u""):
        # Text passes through unchanged; each byte maps to the code point of the same value.
        if isinstance(value, six.text_type):
            return value
        elif isinstance(value, six.binary_type):
            return u"".join(six.unichr(c) for c in six.iterbytes(value))
        else:
            return six.text_type(value)

    def raw_encode(self):
        # Two bytes per character: high byte first, then low byte.
        return b''.join(six.int2byte((ord(uc) & 0xFF00) >> 8) + six.int2byte(ord(uc) & 0xFF)
                        for uc in self)

    @classmethod
    def raw_decode(cls, encoded):
        # six.indexbytes returns an int on both Python 2 and Python 3.
        return cls(u"".join(six.unichr((six.indexbytes(encoded, i) << 8) | six.indexbytes(encoded, i + 1))
                            for i in xrange(0, len(encoded), 2)))

    # __new__ is implicitly a static method, so no decorator is needed.
    def __new__(cls, *more):
        return six.text_type.__new__(cls, cls.as_raw_unicode(more[0]) if more else u"")
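
A few test strings with expected results may help illustrate the mapping. The sketch below is a Python 3-only approximation that avoids six so it runs on the standard library alone. The module-level functions mirror the methods above, but their free-function form is an assumption made for illustration and is not part of the gist.

```python
# Python 3-only sketch of the gist's byte <-> code point mapping (no six).
# These free functions are illustrative stand-ins for the String methods.

def as_raw_unicode(value=""):
    """Map each byte to the code point with the same number (0-255)."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return "".join(chr(b) for b in value)
    return str(value)


def raw_encode(text):
    """Encode each character as two bytes: high byte, then low byte."""
    return b"".join(bytes(((ord(ch) & 0xFF00) >> 8, ord(ch) & 0xFF)) for ch in text)


def raw_decode(encoded):
    """Rebuild the text from consecutive (high, low) byte pairs."""
    return "".join(chr((encoded[i] << 8) | encoded[i + 1])
                   for i in range(0, len(encoded), 2))


# Test strings with the results expected from the mapping above.
assert as_raw_unicode(b"\x00\xff") == "\x00\xff"   # bytes become code points 0 and 255
assert raw_encode("ab") == b"\x00a\x00b"           # two bytes per character
assert raw_decode(b"\x00a\x00b") == "ab"           # inverse of raw_encode
assert raw_encode("\u0100") == b"\x01\x00"         # high byte comes first
assert raw_decode(raw_encode("h\u00e9llo")) == "h\u00e9llo"  # round trip
```

Note that only the low 16 bits of each code point are kept, so characters above U+FFFF would not survive the round trip in this sketch.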
@cclauss

cclauss commented Mar 6, 2019

This does look cool. It would be great to have some test strings with expected output for each.

import codecs
import six

def as_raw_unicode(value=u""):
    if isinstance(value, bytes):
        return codecs.decode(value, 'raw_unicode_escape')
    else:
        return six.text_type(value)

Also, in raw_decode(), couldn't all of the square brackets ([ and ]) be removed, so that the expressions stay generators and avoid creating a list of lists?
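
One thing that may be worth checking before swapping in `raw_unicode_escape`: that codec also interprets `\uXXXX` escape sequences found in the input, so it is not a strict byte-for-byte mapping. The sketch below (Python 3, standard library only) compares it against a byte-wise mapping; the helper name `bytewise` is made up for this illustration.

```python
import codecs


def bytewise(value):
    """Hypothetical helper: map each byte to the code point of the same value."""
    return "".join(chr(b) for b in value)


plain = b"caf\xe9"        # contains a raw high byte
escaped = b"caf\\u00e9"   # contains a literal backslash-u escape sequence

# For plain bytes both approaches agree.
same = codecs.decode(plain, "raw_unicode_escape") == bytewise(plain)

# For input containing an escape sequence, the codec collapses it to one character.
differs = codecs.decode(escaped, "raw_unicode_escape") != bytewise(escaped)

print(same, differs)
```

Whether that difference matters depends on whether the input bytes can ever contain a literal backslash followed by `u` or `U`.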

@cclauss

cclauss commented Mar 6, 2019

Also, on the last line unicode should be replaced with six.text_type.

@qstokkink
Author

@cclauss nice catches, thanks! Updated the code.
