A simple Tinyscript-based tool that recursively replaces unwanted text in the documents of a given folder, based on a JSON dictionary that defines regular expressions and the replacements to apply.
This can be installed using:
$ pip install tinyscript
$ tsm install doc-text-masker
$ wget https://gist.githubusercontent.com/dhondta/5cae9533240471eac155bd51593af2e0/raw/replacements.json
- Recursive folder parsing
- No filtering regarding the file format
- Ask for confirmation before replacing
- Execute without applying changes (test mode)
This tool is useful for replacing particular strings, e.g. in a documentation folder. It lets you first test and then apply the replacements, which are defined as (regex, replacement) pairs in a JSON dictionary.
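The test-then-apply workflow can be pictured with the minimal sketch below. It is not the tool's actual code: the names `mask_text` and `mask_folder` are hypothetical, and it assumes that each regex contains one capture group and that only this group is replaced by the formatted replacement string.

```python
import os
import re


def mask_text(text, replacements, char="#"):
    """Apply every (regex, template) pair to `text`.

    Assumption: capture group 1 of each match is the part that gets masked.
    """
    for regex, template in replacements.values():
        masked = template.format(char)

        def _sub(m, masked=masked):
            # Rebuild the match with group 1 swapped for the masked string.
            s, (a, b) = m.start(), m.span(1)
            return m.group(0)[:a - s] + masked + m.group(0)[b - s:]

        text = re.sub(regex, _sub, text)
    return text


def mask_folder(folder, replacements, char="#",
                extensions=("md", "mdtxt", "txt"), test=True):
    """Walk `folder` recursively and return the paths whose content changes.

    With test=True nothing is written, mirroring the idea of the -t option.
    """
    changed = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            if os.path.splitext(name)[1].lstrip(".") not in extensions:
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                old = f.read()
            new = mask_text(old, replacements, char)
            if new != old:
                changed.append(path)
                if not test:
                    with open(path, "w", encoding="utf-8") as f:
                        f.write(new)
    return changed
```

Calling `mask_folder("docs", replacements)` first and inspecting the returned paths, then calling it again with `test=False`, reproduces the "test, then run" idea in two steps.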
$ ./doc-text-masker.py -h
usage: ./doc-text-masker.py [-a] [-b] [-c {*,#,@,+,-,%,$}] [-e EXT [EXT ...]]
[-r REPLACEMENTS] [-t] [-h] [-v]
folder
DocTextMasker v3.0
Author : Alexandre D'Hondt
This tool parses all Markdown files in the specified folder and replaces
multiple metadata by a hidding character. The purpose is to mask metadata in
the tool outputs and sessions shown in the Markdown files.
positional arguments:
folder target folder
optional arguments:
-a ask for confirmation (default: False)
-b take a backup copy (default: False)
-c {*,#,@,+,-,%,$} hiding char (default: #)
-e EXT [EXT ...] extensions to be handled (default: ['md', 'mdtxt', 'txt'])
-r REPLACEMENTS replacements JSON file (default: replacements.json)
-t display modifications but do not apply them (default: False)
NB: this ignores -a and -b
extra arguments:
-h, --help show this help message and exit
-v, --verbose verbose mode (default: False)
Usage examples:
./doc-text-masker.py
./doc-text-masker.py -t
./doc-text-masker.py -r my-own-replacements.json
./doc-text-masker.py -f docs -c $
- Testing
$ ./doc-text-masker -t
12:34:56 [WARNING] Changes in 'src/trace.txt':
0: 12:34:56 [INFO] [0a:1b:2c:3d:4e:5f]127.0.0.1:12345 -> [1b:2c:3d:4e:5f:0a]127.0.0.1:8000
12:34:56 [INFO] [0a:1b:2c:##:##:##]127.0.0.1:12345 -> [1b:2c:3d:##:##:##]127.0.0.1:8000
- Replacement
$ ./doc-text-masker -v
12:34:56 [DEBUG] Entering 'src'...
12:34:56 [DEBUG] Parsing 'src/trace.txt'...
12:34:56 [DEBUG] > Saving new file...
Use case: We want to show a session that illustrates the execution of a CLI tool, but without revealing the dates and times of execution that appear in the tool's logging trace.
Example: Telnet trace
$ telnet 192.168.1.2
Trying 192.168.1.2...
Connected to 192.168.1.2.
Escape character is '^]'.
[...]
Last login: Thu Dec 29 23:58:00 UTC 2016 on tty1
[...]
We want to hide "Thu Dec 29 23:58:00 UTC 2016". The (Python-style) regular expression that matches such a line is:
r'Last\slogin\:\s([A-Z][a-z]{1,2}\s[A-Z][a-z]{1,2}\s\d{2}\s\d{2}:\d{2}:\d{2}\s[A-Z]{3}\s\d{4})'
The JSON item that can be added to the dictionary is thus:
"telnet-datetime": [
"Last\\slogin\\:\\s([A-Z][a-z]{1,2}\\s[A-Z][a-z]{1,2}\\s\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\s[A-Z]{3}\\s\\d{4})",
"{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0}{0}"
]
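To check how such an item behaves on the trace above, the following minimal sketch loads it with `json` and applies it with `re`. The helper name `mask` is hypothetical, and the sketch assumes that only capture group 1 is replaced, with every `{0}` placeholder filled by the hiding character through `str.format()`; the tool itself may proceed differently.

```python
import json
import re

# The JSON item from above, parsed as it would be from replacements.json.
ITEM = json.loads(r'''
{
  "telnet-datetime": [
    "Last\\slogin\\:\\s([A-Z][a-z]{1,2}\\s[A-Z][a-z]{1,2}\\s\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\s[A-Z]{3}\\s\\d{4})",
    "{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0}{0}"
  ]
}
''')


def mask(text, pattern, template, char="#"):
    """Replace capture group 1 of each match (assumption) by the formatted template."""
    masked = template.format(char)

    def _sub(m):
        # Keep the text around group 1 and swap only the captured part.
        s, (a, b) = m.start(), m.span(1)
        return m.group(0)[:a - s] + masked + m.group(0)[b - s:]

    return re.sub(pattern, _sub, text)


line = "Last login: Thu Dec 29 23:58:00 UTC 2016 on tty1"
print(mask(line, *ITEM["telnet-datetime"]))
# prints: Last login: ### ### ## ##:##:## #### on tty1
```

Note that the masked text is whatever the replacement string spells out, so its length does not have to equal the length of the matched date.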
Note that "{0}" is the format string that designates the first input argument in str.format(), that is, the selected hiding char (by default, "#").