@haggaie
Created April 22, 2012 06:30
Convert \uDDDD sequences to unicode
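#!/usr/bin/env python
# Python 2 script: reads text from stdin and replaces literal \uDDDD escapes
# with the corresponding Unicode characters, writing UTF-8 to stdout.
import sys, re

# Assumes UTF-8 input; without decoding, non-ASCII bytes would raise
# UnicodeDecodeError when concatenated onto the unicode result below.
text = sys.stdin.read().decode('utf-8')
result = unicode()
# Matches a backslash, 'u', and exactly four hex digits.
unicode_char_re = re.compile(r'\\u[0-9a-fA-F]{4}')
prev_pos = 0
for m in unicode_char_re.finditer(text):
    span = m.span()
    # Copy the text between the previous match and this one unchanged.
    result += text[prev_pos:span[0]]
    char = text[span[0]:span[1]]
    # Drop the leading "\u" and convert the hex digits to a character.
    char = unichr(int(char[2:], 16))
    result += char
    prev_pos = span[1]
# Append whatever follows the last match.
result += text[prev_pos:]
print result.encode('utf-8')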
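
The script above targets Python 2 (`unicode`, `unichr`, the `print` statement). A minimal sketch of the same substitution for Python 3, assuming the input is already a `str`, might look like the following. The names `UNICODE_ESCAPE_RE` and `decode_u_escapes` are hypothetical and not part of the original gist, and `re.sub` with a callback is swapped in for the manual `finditer` loop.

```python
import re

# Same pattern as the script: a backslash, 'u', and exactly four hex digits.
# The hex digits are captured so the callback can convert them directly.
UNICODE_ESCAPE_RE = re.compile(r'\\u([0-9a-fA-F]{4})')


def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uDDDD sequence in text with its character."""
    # re.sub copies the unmatched text through unchanged, which is what the
    # prev_pos bookkeeping in the original loop does by hand.
    return UNICODE_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Example: decode_u_escapes(r'caf\u00e9') returns 'café'
```

Each escape is converted on its own, so a surrogate pair such as `\ud83d\ude00` comes out as two separate surrogate code points rather than one combined character; handling that case would need an extra step.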