@zhangzhhz
Last active October 8, 2020 07:52
Keep only printable characters in ASCII character set

https://en.wikipedia.org/wiki/ASCII#Printable_characters

Codes 20 to 7E (hexadecimal), known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.

Code 20, the "space" character, denotes the space between words, as produced by the space bar of a keyboard. Since the space character is considered an invisible graphic (rather than a control character), it counts as a printable character.

Code 7F corresponds to the non-printable "delete" (DEL) control character and is therefore excluded from the printable range. Earlier versions of ASCII used the up arrow instead of the caret (5E) and the left arrow instead of the underscore (5F).
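
As a quick check of the range described above, the following minimal sketch builds the printable set from the code points 0x20 through 0x7E and confirms the count of 95. The names `PRINTABLE_ASCII` and `is_printable_ascii` are illustrative only and do not come from the original notes.

```python
# Printable ASCII runs from 0x20 (space) through 0x7E (tilde), inclusive.
# range() excludes its upper bound, so 0x7F (DEL) is left out.
PRINTABLE_ASCII = "".join(chr(code) for code in range(0x20, 0x7F))


def is_printable_ascii(ch: str) -> bool:
    """Return True if ch is a single character in the range 0x20..0x7E."""
    return len(ch) == 1 and 0x20 <= ord(ch) <= 0x7E


print(len(PRINTABLE_ASCII))        # 95
print(PRINTABLE_ASCII[0] == " ")   # True: the range starts at space
print(PRINTABLE_ASCII[-1])         # ~
print(is_printable_ascii("\x7f"))  # False: DEL is a control character
```

This is also why the character class `[ -~]` used in the snippets works: space and tilde are the first and last characters of the printable range.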

JavaScript:

> nonPrintable = /[^ -~]/g;
/[^ -~]/g
> 'hello world 你好 ! '.replace(nonPrintable, ' ')
'hello world    ! '
> nonPrintable2 = /[^\x20-\x7E]/g;
/[^\x20-\x7E]/g
> 'hello world 你好 ! '.replace(nonPrintable2, ' ')
'hello world    ! '

Python:

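import re

# Replace every character outside the printable ASCII range (space to tilde) with a space.
re.sub(r'[^ -~]', ' ', 'hello world 你好 ! ')
# 'hello world    ! '
# The same character class written with hexadecimal escapes.
re.sub(r'[^\x20-\x7E]', ' ', 'hello world 你好 ! ')
# 'hello world    ! '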
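
The snippets above replace each non-printable character with a space. If the goal is to drop those characters entirely, one possible approach is to substitute an empty string instead. The sketch below wraps both behaviours in a single helper. The names `NON_PRINTABLE` and `keep_printable_ascii` and the `replacement` parameter are hypothetical, chosen for illustration.

```python
import re

# Matches any single character outside the printable ASCII range 0x20..0x7E.
NON_PRINTABLE = re.compile(r"[^\x20-\x7E]")


def keep_printable_ascii(text: str, replacement: str = "") -> str:
    """Substitute every non-printable-ASCII character with `replacement`.

    With the default empty replacement the characters are removed;
    pass " " to reproduce the space-substitution shown above.
    """
    return NON_PRINTABLE.sub(replacement, text)


print(repr(keep_printable_ascii("hello world 你好 ! ")))       # 'hello world  ! '
print(repr(keep_printable_ascii("hello world 你好 ! ", " ")))  # 'hello world    ! '
print(repr(keep_printable_ascii("tab\there\n", " ")))          # 'tab here '
```

Note that tabs and newlines fall outside 0x20..0x7E, so they are treated as non-printable here as well.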