@simonw
Created June 14, 2021 16:00
alnum encoding scheme

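The goal is to take any Python string and reversibly convert it into a string that consists only of the characters a-zA-Z0-9_. ASCII letters and digits pass through unchanged; every other character, including a literal underscore, is replaced by its Unicode code point in lowercase hexadecimal, wrapped in underscores.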

>>> alnum_encode("hello.csv")
'hello_2e_csv'
>>> alnum_encode("this é has ü accents")
'this_20__e9__20_has_20__fc__20_accents'

import re

# Characters that pass through unchanged; everything else, including "_", is escaped
ALLOWED = "abcdefghijklmnopqrstuvwxyz" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789"
# Matches one escaped character: "_", lowercase hex digits, "_"
split_re = re.compile("(_[0-9a-f]+_)")


def alnum_encode(s):
    encoded = []
    for char in s:
        if char in ALLOWED:
            encoded.append(char)
        else:
            # Escape as _<hex code point>_, e.g. "." becomes "_2e_"
            encoded.append("_" + hex(ord(char))[2:] + "_")
    return "".join(encoded)


def alnum_decode(s):
    decoded = []
    # The capturing group makes split() keep the escaped pieces in its output
    for bit in split_re.split(s):
        if bit.startswith("_"):
            hexbit = bit[1:-1]
            decoded.append(chr(int(hexbit, 16)))
        else:
            decoded.append(bit)
    return "".join(decoded)
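
One way to check the reversibility claim is a round-trip test. The sketch below restates both functions in compact form so that it runs on its own. The `samples` list and the `SAFE` pattern are illustrative assumptions, not part of the original gist. The samples are chosen to cover the awkward cases: the empty string, a literal underscore, and input that already looks like an escape sequence.

```python
import re

ALLOWED = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)
split_re = re.compile("(_[0-9a-f]+_)")


def alnum_encode(s):
    # Same logic as the gist, written as a generator expression
    return "".join(
        char if char in ALLOWED else "_" + hex(ord(char))[2:] + "_"
        for char in s
    )


def alnum_decode(s):
    # Pieces starting with "_" are escapes; everything else is literal text
    return "".join(
        chr(int(bit[1:-1], 16)) if bit.startswith("_") else bit
        for bit in split_re.split(s)
    )


# Hypothetical sample inputs (assumption: chosen here for illustration)
samples = ["", "hello.csv", "a_b", "_5f_", "this é has ü accents", "snow ☃ 🎉"]

# Assumed target alphabet, taken from the stated goal
SAFE = re.compile(r"[a-zA-Z0-9_]*")

for s in samples:
    encoded = alnum_encode(s)
    assert SAFE.fullmatch(encoded), encoded
    assert alnum_decode(encoded) == s, (s, encoded)
```

The round trip works because a literal underscore is never emitted as-is: it is escaped as `_5f_`, so every underscore in the encoded output is guaranteed to be an escape delimiter.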