@unhammer
Created April 20, 2012 10:30
turn an unholy jumble of iso-SOMETHING and UTF-8 into UTF-8
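#!/bin/bash
# Reads mixed-encoding text on stdin and writes clean UTF-8 to stdout.

# Python 2 helper: turn the numeric entities emitted by uconv back into characters.
decodehtmlentities='#!/usr/bin/env python2
import sys, codecs
sys.stdin = codecs.getreader("utf-8")(sys.stdin)
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
sys.stderr = codecs.getwriter("utf-8")(sys.stderr)
import HTMLParser
h = HTMLParser.HTMLParser()
for line in sys.stdin:
    # The trailing comma keeps print from adding a second newline.
    print h.unescape(line),'

# Bytes that are not valid UTF-8 become decimal XML entities (&#NNN;);
# the helper then decodes every entity in the stream.
uconv -f utf-8 -t utf-8 --callback escape-xml-dec | python2 -c "$decodehtmlentities"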
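
The script above depends on `uconv` and Python 2. For comparison, here is a minimal Python 3 sketch of the same idea, assuming the stray bytes are ISO-8859-1 (the original only says "iso-SOMETHING"). It is not the gist's method: it skips the entity round trip and uses a custom decode error handler instead. The names `latin1_fallback` and `fix_mixed` are made up for illustration.

```python
import codecs


def latin1_fallback(err):
    """Decode error handler: reinterpret undecodable bytes as ISO-8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Every byte maps to the code point with the same value in ISO-8859-1.
    return bad.decode("latin-1"), err.end


# Register under a hypothetical name so bytes.decode() can refer to it.
codecs.register_error("latin1_fallback", latin1_fallback)


def fix_mixed(data: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for invalid bytes."""
    return data.decode("utf-8", errors="latin1_fallback")


# "café" with a bare Latin-1 byte, "naïve" with proper UTF-8.
mixed = b"caf\xe9 na\xc3\xafve"
print(fix_mixed(mixed).encode("utf-8"))  # b'caf\xc3\xa9 na\xc3\xafve'
```

One difference worth knowing: since this sketch never goes through XML entities, entities already present in the input (such as `&amp;`) are left untouched, whereas the pipeline above unescapes them along with the ones `uconv` generated.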