@yammesicka
Last active November 16, 2019 05:42
Delete unwanted characters from a txt file, leaving a valid plain-text file
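import re
PATH = r"C:\Projects\Notebooks\week4\resources\alice.txt"

# Read raw bytes so that undecodable characters cannot raise an error.
with open(PATH, 'rb') as f:
    text = f.read()

# Remove every byte that is not in the ASCII whitelist below.
new_text = re.sub(br'[^a-zA-Z0-9_ ?!@#$%^&*\(\)+.,></\\[\]\n\r-]', b'', text)

# Write bytes back: text mode on Windows would turn the kept \r\n into \r\r\n.
with open(PATH, 'wb') as f:
    f.write(new_text)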
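
To see what the whitelist keeps and drops without touching a file, the same pattern can be tried on an in-memory byte string. This is only a sketch: the helper name clean_bytes and the sample text are made up for illustration and are not part of the gist.

```python
import re

# Same whitelist as in the script above: every byte NOT listed here is removed.
NOT_ALLOWED = re.compile(br'[^a-zA-Z0-9_ ?!@#$%^&*\(\)+.,></\\[\]\n\r-]')


def clean_bytes(data: bytes) -> bytes:
    """Hypothetical helper: return data with all non-whitelisted bytes removed."""
    return NOT_ALLOWED.sub(b'', data)


# Sample with curly quotes, an accented letter, and a colon (illustrative only).
sample = "Alice\u2019s \u201ccaf\u00e9\u201d: 100%\r\n".encode("utf-8")
print(clean_bytes(sample))  # b'Alices caf 100%\r\n'
```

Note that the whitelist is narrower than "all ASCII": the plain apostrophe, double quote, colon, semicolon, and tab are not listed, so they are removed along with the non-ASCII bytes. If those characters matter for your text, they would need to be added to the character class.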