@hktechn0
Created November 6, 2009 15:11
Replace HTML character entity reference
import re
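import htmlentitydefs  # Python 2 module (html.entities in Python 3)

def replace_htmlentity(string):
    # Fast path: without '&' there is nothing to decode.
    amp = string.find('&')
    if amp == -1:
        return string

    # Collect every named entity reference such as &lt; or &amp;.
    entity = re.compile("&([A-Za-z]+);")
    entity_match = entity.findall(string)

    for name in entity_match:
        try:
            c = htmlentitydefs.name2codepoint[name]
        except KeyError:
            # Unknown entity name: leave it in the text unchanged.
            continue
        string = string.replace("&%s;" % name, unichr(c))

    return string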
# Known names are decoded; an unknown name such as &hogehoge; is left as-is.
print replace_htmlentity("&lt; &gt;&amp;&hogehoge;")
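
The snippet above targets Python 2 (htmlentitydefs, unichr, the print statement). What follows is a minimal Python 3 sketch of the same idea, not the author's code. The name replace_htmlentity_py3 and the widened pattern are assumptions made for illustration. It substitutes in a single pass with re.sub, because the replace-in-a-loop approach decodes an input that contains both `&amp;lt;` and `&lt;` twice, and it allows digits in names so that entities such as `&frac12;` are matched.

```python
import re
from html.entities import name2codepoint

# Entity names start with a letter and may contain digits (e.g. "frac12").
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_htmlentity_py3(text):
    """Decode named entities in one pass; unknown names are left unchanged."""
    def decode(match):
        codepoint = name2codepoint.get(match.group(1))
        # Return the original "&name;" text when the name is not known.
        return match.group(0) if codepoint is None else chr(codepoint)

    return _ENTITY.sub(decode, text)


print(replace_htmlentity_py3("&lt; &gt;&amp;&hogehoge;"))
```

If custom behaviour is not needed, html.unescape from the Python 3 standard library (3.4 and later) covers this case and also handles numeric character references.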