Skip to content

Instantly share code, notes, and snippets.

@divadsn
Created March 2, 2019 02:39
Show Gist options
  • Save divadsn/626ca90541b176a2d866eaa32732d6e7 to your computer and use it in GitHub Desktop.
Save divadsn/626ca90541b176a2d866eaa32732d6e7 to your computer and use it in GitHub Desktop.
Normalize and remove illegal characters in filenames in Python.
import string
import unicodedata
INVALID_FILE_CHARS = '/\\?%*:|"<>' # https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
def secure_filename(filename):
# keep only valid ascii chars
output = list(unicodedata.normalize("NFKD", filename))
# special case characters that don't get stripped by the above technique
for pos, char in enumerate(output):
if char == '\u0141':
output[pos] = 'L'
elif char == '\u0142':
output[pos] = 'l'
# remove unallowed characters
output = [c if c not in INVALID_FILE_CHARS else '_' for c in output]
return "".join(output).encode("ASCII", "ignore").decode()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment