You may need a clearer picture of the traffic Browsertrix generates and of where it is being blocked.
You can use the following Docker Compose stack to run:
- a mitmweb proxy that intercepts all Browsertrix traffic and dumps it to files
- a Browsertrix crawler with screencasting enabled, so you can observe the crawler's behavior
If you open http://localhost:9037, you will see the Browsertrix screencast of the browser.
The mitmproxy dump files are written to the output/mitmdump folder, with one file per minute.
A sample script, extract_retry_after.py, processes a dump file, extracts the responses with HTTP status code 429, and reports their Retry-After header.
It can be launched with mitmdump -n -r output/mitmdump/dump-xxxx-xx-xx-xx-xx -s extract_retry_after.py --flow-detail 0
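
To illustrate how such a script can work, here is a minimal sketch of a mitmproxy addon that looks for 429 responses. It is not the content of extract_retry_after.py, only an assumption about one way to write it: the helper name find_retry_after and the output format are hypothetical. It relies on the mitmproxy convention that a module-level response(flow) function in a script passed with -s is called for each flow with a response, including flows replayed from a dump with -r.

```python
"""Hypothetical sketch of a mitmproxy addon script that reports HTTP 429 responses.

This is NOT the repository's extract_retry_after.py, only an illustration.
"""


def find_retry_after(flow):
    """Return (url, retry_after) for a 429 response, otherwise None.

    retry_after is the raw Retry-After header value (a number of seconds
    or an HTTP date), or None when the server did not send the header.
    """
    resp = flow.response
    if resp is None or resp.status_code != 429:
        return None
    # Header lookup is case-insensitive on mitmproxy's Headers object.
    return flow.request.pretty_url, resp.headers.get("Retry-After")


def response(flow):
    """mitmproxy event hook, invoked for each flow that has a response."""
    found = find_retry_after(flow)
    if found is not None:
        url, retry_after = found
        print(f"429 {url} Retry-After={retry_after}")
```

Saved under a file name of your choice, the sketch can be passed to mitmdump with -s in the same way as the sample script. Keeping the detection logic in a separate function makes it possible to exercise it without a running proxy.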