@readyready15728
Last active January 3, 2024 08:51
Quick and dirty search for 404s among Markdown links
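# Run like so at repository root directory:
#
# python3 search_for_404s.py > output &
# tail -f output
import glob
import re
import sys

import requests

regex_voodoo = '((http|https)://)[-a-zA-Z0-9@:%._\\+~#?&//=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9@:%._\\+~#?&//=]*)'
unique_pages = set()

# Collect every distinct URL found in the repository's Markdown files
for md in glob.glob('**/*.md', recursive=True):
    with open(md, encoding='utf-8', errors='replace') as f:
        for line in f:
            # finditer picks up every link on a line, not just the first one
            for m in re.finditer(regex_voodoo, line):
                unique_pages.add(m.group())

for page in unique_pages:
    print(page)
    try:
        # A timeout keeps one unresponsive host from stalling the whole run
        if requests.get(page, timeout=10).status_code == 404:
            print('MISSING')
    except requests.RequestException as e:
        # Dead hosts, timeouts, and TLS failures should not abort the scan
        print('ERROR:', e)
    sys.stdout.flush()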
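
The link-extraction step can be checked without touching the network. The following is a minimal sketch, not part of the original script: the helper name `extract_urls` is hypothetical, and the pattern is the script's own regex rewritten as a raw string with the capture groups dropped. It uses `re.finditer` so that every link on a line is collected.

```python
import re

# Same pattern as the script's regex_voodoo, written as a raw string.
# Assumption: the capture groups are dropped because only the full match is used.
URL_PATTERN = re.compile(
    r'https?://[-a-zA-Z0-9@:%._+~#?&/=]{2,256}\.[a-z]{2,6}\b[-a-zA-Z0-9@:%._+~#?&/=]*'
)


def extract_urls(text):
    """Return the set of http(s) URLs found in text (hypothetical helper)."""
    return {m.group() for m in URL_PATTERN.finditer(text)}


if __name__ == '__main__':
    sample = 'See [a](https://example.com/a) and [b](http://example.org/b?x=1) for details'
    for url in sorted(extract_urls(sample)):
        print(url)
```

Because `)` is not in the character class, a match stops at the closing parenthesis of a Markdown link. A bare URL followed directly by sentence punctuation such as `.` would still pick up that trailing character, so results from prose-heavy files may need a manual look.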