@mypetyak
Created November 1, 2015 21:19
Unicode Sandwich, demonstrating multiple encodings.
# -*- encoding: utf-8 -*-
import re
# Byte string containing an Icelandic pangram encoded in mac_iceland.
# Named raw_bytes rather than input to avoid shadowing the built-in input().
raw_bytes = 'Svo h\x9alt, yxna k\xe0r \xdfeg\xddi j\x9c um d\x97p \x92 f\x8e \x87 b\xbe.'
# Decode the bytes into a Unicode object as soon as they enter the program,
# using the mac_iceland codec
u_string = raw_bytes.decode('mac_iceland')
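# Replace every run of four word characters with 'xxxx'. re.UNICODE makes \w
# match non-ASCII letters. re.sub returns a new string, so keep the result.
u_string = re.sub(r'\w{4}', u'xxxx', u_string, flags=re.UNICODE)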
# Encode to UTF-8 for printing, which your terminal probably understands
print u_string.encode('UTF-8')
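# Write the Unicode string to a file, encoding it as UTF-8 on the way out.
# The handle is named out_file to avoid shadowing the Python 2 built-in file.
with open('output.txt', 'wb') as out_file:
    out_file.write(u_string.encode('UTF-8'))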
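
The script above is written for Python 2. For comparison, here is a minimal sketch of the same decode, process, encode flow in Python 3, where byte strings need a `b` prefix and `\w` already matches Unicode letters in `str` patterns. The names `raw`, `mask_words`, `masked`, and `encoded` are illustrative and not part of the original gist.

```python
import re

# The same Icelandic pangram, as bytes in the mac_iceland encoding.
raw = b'Svo h\x9alt, yxna k\xe0r \xdfeg\xddi j\x9c um d\x97p \x92 f\x8e \x87 b\xbe.'


def mask_words(data, source_encoding='mac_iceland'):
    """Decode bytes at the boundary, work on text, and return text."""
    text = data.decode(source_encoding)
    # For str patterns in Python 3, \w is Unicode-aware by default,
    # so no re.UNICODE flag is needed.
    return re.sub(r'\w{4}', 'xxxx', text)


masked = mask_words(raw)

# Encode back to bytes only at the output boundary.
encoded = masked.encode('utf-8')
```

To write the result to disk in Python 3, one option is `open('output.txt', 'w', encoding='utf-8')` followed by `write(masked)`, which lets the file object do the encoding.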