@dhondta

dhondta/README.md

Last active Jan 23, 2020
Tinyscript tool for replacing text in the files of a target folder, based on a JSON dictionary of replacement patterns

DocTextMasker

A simple tool for recursively replacing unwanted text (e.g. metadata such as timestamps or MAC addresses) in the documents of a given folder, based on a JSON dictionary of regular expressions and the replacements to apply.

Setup

This can be installed using:

$ pip install tinyscript
$ wget https://gist.githubusercontent.com/dhondta/5cae9533240471eac155bd51593af2e0/raw/doc-text-masker.py && chmod +x doc-text-masker.py && sudo mv doc-text-masker.py /usr/bin/doc-text-masker
$ wget https://gist.githubusercontent.com/dhondta/5cae9533240471eac155bd51593af2e0/raw/replacements.json

Features

  • Recursive folder parsing
  • No filtering on file format other than the list of extensions (-e)
  • Ask for confirmation before replacing
  • Execute without applying changes (test mode)
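
The recursive parsing and the extension filter can be approximated with the standard library alone. The sketch below is only an illustration, not the tool's code (which relies on Tinyscript's Path.walk); the helper name iter_target_files is hypothetical, and the default extensions are the ones listed in the tool's help message.

```python
from pathlib import Path


def iter_target_files(folder, extensions=("md", "mdtxt", "txt")):
    """Yield every file under `folder` whose name ends with one of `extensions`."""
    # Prefix each extension with a dot so that "md" does not match "file.cmd".
    suffixes = tuple("." + e.lstrip(".") for e in extensions)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.name.endswith(suffixes):
            yield path
```

Calling list(iter_target_files("docs")) would return the candidate files in a stable, sorted order, which makes a test run easier to compare with the real run that follows it.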

Usage

This tool is useful for replacing particular strings, e.g. in a documentation folder. It lets you first test and then apply the replacements defined in a JSON dictionary of (regex, replacement) pairs.

$ ./doc-text-masker.py -h
usage: ./doc-text-masker.py [-a] [-b] [-c {*,#,@,+,-,%,$}] [-e EXT [EXT ...]]
                            [-r REPLACEMENTS] [-t] [-h] [-v]
                            folder

DocTextMasker v3.0
Author   : Alexandre D'Hondt

This tool parses all Markdown files in the specified folder and replaces
 multiple metadata by a hiding character. The purpose is to mask metadata in
 the tool outputs and sessions shown in the Markdown files.

positional arguments:
  folder              target folder

optional arguments:
  -a                  ask for confirmation (default: False)
  -b                  take a backup copy (default: False)
  -c {*,#,@,+,-,%,$}  hiding char (default: #)
  -e EXT [EXT ...]    extensions to be handled (default: ['md', 'mdtxt', 'txt'])
  -r REPLACEMENTS     replacements JSON file (default: replacements.json)
  -t                  display modifications but do not apply them (default: False)
                       NB: this ignores -a and -b

extra arguments:
  -h, --help          show this help message and exit
  -v, --verbose       verbose mode (default: False)

Usage examples:
  ./doc-text-masker.py 
  ./doc-text-masker.py -t
  ./doc-text-masker.py -r my-own-replacements.json
  ./doc-text-masker.py -f docs -c $

Example

  1. Testing
$ ./doc-text-masker.py -t src
12:34:56 [WARNING] Changes in 'src/trace.txt':
0: 12:34:56 [INFO] [0a:1b:2c:3d:4e:5f]127.0.0.1:12345 -> [1b:2c:3d:4e:5f:0a]127.0.0.1:8000
   12:34:56 [INFO] [0a:1b:2c:##:##:##]127.0.0.1:12345 -> [1b:2c:3d:##:##:##]127.0.0.1:8000
  2. Replacement
$ ./doc-text-masker.py -v src
12:34:56 [DEBUG] Entering 'src'...
12:34:56 [DEBUG] Parsing 'src/trace.txt'...
12:34:56 [DEBUG] > Saving new file...
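
The masking shown in the test output can be reproduced in a few lines of plain Python. This is a minimal sketch, assuming the "mac" pattern from replacements.json and the default hiding character; mask and diff_lines are illustrative helper names, not functions of the tool.

```python
import re

# "mac" entry: only the last three octets (the capturing group) are hidden.
MAC = re.compile(
    r"[a-fA-F0-9]{2}:[a-fA-F0-9]{2}:[a-fA-F0-9]{2}:"
    r"([a-fA-F0-9]{2}:[a-fA-F0-9]{2}:[a-fA-F0-9]{2})"
)
TEMPLATE = "{0}{0}:{0}{0}:{0}{0}"


def mask(text, char="#"):
    """Replace the captured group of each match with the filled-in template."""
    hidden = TEMPLATE.format(char)
    return MAC.sub(lambda m: m.group(0).replace(m.group(1), hidden), text)


def diff_lines(before, after):
    """Return (line index, old line, new line) for every line that changed."""
    pairs = zip(before.split("\n"), after.split("\n"))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


line = "[0a:1b:2c:3d:4e:5f]127.0.0.1:12345 -> [1b:2c:3d:4e:5f:0a]127.0.0.1:8000"
for i, old, new in diff_lines(line, mask(line)):
    print("{}: {}\n{}  {}".format(i, old, " " * len(str(i)), new))
```

Note that the captured text is replaced wherever it occurs inside the match, so an address whose two halves are identical, such as aa:bb:cc:aa:bb:cc, ends up fully masked.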

Replacement creation

Use case: We want to show a session that illustrates the execution of a CLI tool, but without revealing the dates and times of execution that appear in the tool's logging trace.

Example: Telnet trace

$ telnet 192.168.1.2

Trying 192.168.1.2...
Connected to 192.168.1.2.
Escape character is '^]'.
[...]
Last login: Thu Dec 29 23:58:00 UTC 2016 on tty1
[...]

We want to hide "Thu Dec 29 23:58:00 UTC 2016". The (Python-style) regular expression that matches such a line is:

r'Last\slogin\:\s([A-Z][a-z]{1,2}\s[A-Z][a-z]{1,2}\s\d{2}\s\d{2}:\d{2}:\d{2}\s[A-Z]{3}\s\d{4})'

The JSON item that can be added to the dictionary is thus:

    "telnet-datetime": [
        "Last\\slogin\\:\\s([A-Z][a-z]{1,2}\\s[A-Z][a-z]{1,2}\\s\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\s[A-Z]{3}\\s\\d{4})",
        "{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0} {0}{0}{0}{0}"
    ]

Note that "{0}" is the format field that designates the first argument passed to str.format(), that is, the selected hiding char (by default, "#").
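
Before adding a new entry to the dictionary, the pattern and its template can be tried directly in Python. The sketch below is an illustration under stated assumptions: mask is a hypothetical helper that mirrors the substitution the script performs, and the template used here spells out one "{0}" per hidden character, including the three-letter time zone, so the masked text keeps the length of the original. json.dumps is used to produce the doubled backslashes that JSON requires.

```python
import json
import re

# Raw Python regex from the text; the single capturing group is the part to hide.
PATTERN = (r"Last\slogin\:\s([A-Z][a-z]{1,2}\s[A-Z][a-z]{1,2}\s\d{2}\s"
           r"\d{2}:\d{2}:\d{2}\s[A-Z]{3}\s\d{4})")
# One "{0}" per hidden character, here including the three-letter time zone.
TEMPLATE = ("{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} "
            "{0}{0}{0} {0}{0}{0}{0}")


def mask(text, char="#"):
    """Hide the captured group of every match with the filled-in template."""
    hidden = TEMPLATE.format(char)
    return re.sub(PATTERN, lambda m: m.group(0).replace(m.group(1), hidden), text)


print(mask("Last login: Thu Dec 29 23:58:00 UTC 2016 on tty1"))
# json.dumps doubles the backslashes, giving text ready to paste into the file.
print(json.dumps({"telnet-datetime": [PATTERN, TEMPLATE]}, indent=4))
```

Writing the regex once as a raw string and letting json.dumps escape it avoids hand-doubling every backslash, which is where most mistakes in such entries come from.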

doc-text-masker.py

#!/usr/bin/python3
# -*- coding: UTF-8 -*-
import json

# the star import provides logger, parser, initialize, ts, re, Path, ...
from tinyscript import *


__author__ = "Alexandre D'Hondt"
__version__ = "3.1"
__doc__ = """
This tool parses all Markdown files in the specified folder and replaces
multiple metadata by a hiding character. The purpose is to mask metadata in
the tool outputs and sessions shown in the Markdown files.
"""
__examples__ = ["", "-t", "-r my-own-replacements.json", "-f docs -c $"]


def apply_replacements(fp):
    """
    This function handles a text file for replacements according to a
    user-provided list of replacements formatted as pairs (regexp, replacement).

    :param fp: Path instance of file to be handled
    """
    logger.debug("Parsing '{}'...".format(fp))
    # retrieve file content
    content = contentm = fp.read_text()
    # apply replacements to 'contentm' buffer
    for cat, (regex, repl) in args.replacements.items():
        try:
            # within each match, only the first capturing group is masked
            contentm = regex.sub(lambda m: m.group(0).replace(m.groups()[0],
                                 repl.format(args.mchar)), contentm)
        except UnicodeDecodeError:
            pass
    if content != contentm:
        # if testing mode, just display the line (if any replacement)
        if args.test:
            diff = []
            lines = zip(content.split('\n'), contentm.split('\n'))
            for i, (l1, l2) in enumerate(lines):
                if l1 != l2:
                    diff.append("{}: {}\n{}  {}"
                                .format(i, l1, len(str(i)) * ' ', l2))
            logger.warning("Changes in '{}':\n{}".format(fp, '\n'.join(diff)))
        # if replacements were done
        elif not args.ask or ts.confirm():
            # backup original file if required
            if args.backup:
                bf = fp.parent.joinpath("." + args.folder.basename,
                                        create=True)
                bfp = bf.joinpath(fp.basename + ".bak")
                if not bfp.exists():
                    logger.debug("> Saving backup copy...")
                    bfp.write_text(content)
            # overwrite original file with the replaced content
            logger.debug("> Saving new file...")
            fp.write_text(contentm)
    else:
        logger.debug("> No change")


if __name__ == '__main__':
    parser.add_argument("folder", type=ts.folder_exists_or_create,
                        help="target folder")
    parser.add_argument("-a", dest="ask", action="store_true",
                        help="ask for confirmation")
    parser.add_argument("-b", dest="backup", action="store_true",
                        help="take a backup copy")
    parser.add_argument("-c", dest="mchar", default='#', choices="*#@+-%$",
                        help="hiding char")
    parser.add_argument("-e", dest="ext", nargs="+",
                        default=["md", "mdtxt", "txt"],
                        help="extensions to be handled")
    parser.add_argument("-r", dest="replacements", default="replacements.json",
                        type=ts.file_exists, help="replacements JSON file")
    parser.add_argument("-t", dest="test", action="store_true",
                        help="display modifications but do not apply them",
                        note="this ignores -a and -b")
    initialize()
    # compile each regex once: {name: (compiled regex, replacement template)}
    with open(args.replacements) as f:
        args.replacements = {k: (re.compile(v[0]), v[1])
                             for k, v in json.load(f).items()}
    args.folder = Path(args.folder)
    # running the main stuff
    ffunc = lambda x: any(str(x).endswith(e) for e in args.ext)
    for p in args.folder.walk(filter_func=ffunc):
        apply_replacements(p)

replacements.json

{
    "whois-datetime": [
        "Last\\supdate\\sof\\swhois\\sdatabase\\:\\s([A-Z][a-z]{2}\\,\\s\\d{2}\\s[A-Z][a-z]{2}\\s\\d{4}\\s\\d{2}\\:\\d{2}:\\d{2}\\s[A-Z]{3})",
        "{0}{0}{0}, {0}{0} {0}{0}{0} {0}{0}{0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0}"
    ],
    "mac": [
        "[a-fA-F0-9]{2}:[a-fA-F0-9]{2}:[a-fA-F0-9]{2}:([a-fA-F0-9]{2}:[a-fA-F0-9]{2}:[a-fA-F0-9]{2})",
        "{0}{0}:{0}{0}:{0}{0}"
    ],
    "patator-datetime": [
        "\\(http\\:\\/\\/code\\.google\\.com\\/p\\/patator\\/\\)\\sat\\s(\\d{4}\\-\\d{2}\\-\\d{2}\\s\\d{2}\\:\\d{2}\\s[A-Z]{3})",
        "{0}{0}{0}{0}-{0}{0}-{0}{0} {0}{0}:{0}{0} {0}{0}{0}"
    ],
    "ssh-datetime": [
        "Last\\slogin\\:\\s([A-Z][a-z]{1,2}\\s[A-Z][a-z]{1,2}\\s\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\s\\d{4})",
        "{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0}{0}"
    ],
    "ncrack-datetime": [
        "\\(\\shttp\\:\\/\\/ncrack\\.org\\s\\)\\sat\\s(\\d{4}\\-\\d{2}\\-\\d{2}\\s\\d{2}\\:\\d{2}\\s[A-Z]{3})",
        "{0}{0}{0}{0}-{0}{0}-{0}{0} {0}{0}:{0}{0} {0}{0}{0}"
    ],
    "nmap-datetime": [
        "Starting\\sNmap\\s\\d+\\.\\d+\\s\\(\\shttps\\:\\/\\/nmap\\.org\\s\\)\\sat\\s+(\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2})",
        "{0}{0}{0}{0}-{0}{0}-{0}{0} {0}{0}:{0}{0}"
    ],
    "telnet-datetime": [
        "Last\\slogin\\:\\s([A-Z][a-z]{1,2}\\s[A-Z][a-z]{1,2}\\s\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\s[A-Z]{3}\\s\\d{4})",
        "{0}{0}{0} {0}{0}{0} {0}{0} {0}{0}:{0}{0}:{0}{0} {0}{0}{0} {0}{0}{0}{0}"
    ],
    "hydra-info": [
        "\\(http\\:\\/\\/www\\.thc\\.org\\/thc\\-hydra\\)\\s(?:starting|finished)\\sat\\s(\\d{4}\\-\\d{2}\\-\\d{2}\\s\\d{2}\\:\\d{2}\\:\\d{2})",
        "{0}{0}{0}{0}-{0}{0}-{0}{0} {0}{0}:{0}{0}:{0}{0}"
    ],
    "patator-logging": [
        "(\\d{2}\\:\\d{2}\\:\\d{2})\\spatator",
        "{0}{0}:{0}{0}:{0}{0}"
    ]
}
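
Since each entry pairs a regular expression containing a capturing group with a str.format template, a quick sanity check can catch malformed entries before a run. The following is a minimal sketch, assuming the file layout shown above; check_replacements is a hypothetical helper and not part of the tool, and the "broken" entry is made up for the demonstration.

```python
import json
import re


def check_replacements(text, char="#"):
    """Return a list of problems found in a replacements JSON document."""
    problems = []
    for name, entry in json.loads(text).items():
        if not (isinstance(entry, list) and len(entry) == 2):
            problems.append("{}: expected [regex, replacement]".format(name))
            continue
        regex, template = entry
        try:
            compiled = re.compile(regex)
        except re.error as exc:
            problems.append("{}: invalid regex ({})".format(name, exc))
            continue
        # The script masks the first capturing group, so one is required.
        if compiled.groups < 1:
            problems.append("{}: no capturing group".format(name))
        # The template may only reference "{0}", the hiding character.
        try:
            template.format(char)
        except (IndexError, KeyError, ValueError) as exc:
            problems.append("{}: bad template ({})".format(name, exc))
    return problems


sample = r'''{
    "patator-logging": ["(\\d{2}\\:\\d{2}\\:\\d{2})\\spatator", "{0}{0}:{0}{0}:{0}{0}"],
    "broken": ["\\d{2}:\\d{2}", "{0}{1}"]
}'''
print(check_replacements(sample))
```

Here the "broken" entry is reported twice: once for the missing capturing group and once for the template field "{1}", which has no matching argument.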