@ericherman
Last active September 18, 2019 12:15
Proof of concept for archiving our links at deployment time
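#!/bin/bash
# TODO FIXXXME: use something better than bash
#
# Collect every http(s) link from the files committed on the current branch,
# then ask the Wayback Machine's "Save Page Now" endpoint to archive each one.
set -u

: > urls.txt  # start with an empty URL list
# Alternative: scan only the Markdown files in the working tree:
# find . -type f -name '*.md' -print0 |
# while IFS= read -r -d '' FILE; do
# HEAD is the tip of the current branch; unlike parsing `git branch`, this
# also works with a detached HEAD. -z keeps file names with spaces intact.
git ls-tree -r -z --name-only HEAD |
while IFS= read -r -d '' FILE; do
	[ -f "$FILE" ] || continue  # skip submodules and files deleted since HEAD
	# -o prints every match, not just the last one on each line;
	# -I skips binary files; sed strips the opening delimiter.
	grep -oI '(https\?:[^)]*' "$FILE" | sed -e 's/^(//' >> urls.txt
	grep -oI '<https\?:[^>]*' "$FILE" | sed -e 's/^<//' >> urls.txt
	grep -oI '"https\?:[^"]*' "$FILE" | sed -e 's/^"//' >> urls.txt
done
sort -u urls.txt > urls-sorted.txt

PAUSE_TIME=1.5  # seconds between requests, to avoid hammering archive.org
while IFS= read -r URL; do
	sleep "$PAUSE_TIME"
	ARCHIVE_URL="https://web.archive.org/save/$URL"
	echo "$ARCHIVE_URL"
	# Print only the HTTP status code instead of the returned HTML page.
	curl -sS -o /dev/null -w '%{http_code}\n' "$ARCHIVE_URL"
done < urls-sorted.txt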
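The three grep/sed passes in the extraction loop can be folded into one filter. This is a sketch of a variation, not the gist's own code: `extract_urls` is a hypothetical helper name, and it ends a URL at the first `)`, `>`, double quote, or space, whichever comes first, rather than only at the delimiter that opened it. Reading from stdin makes it easy to try without a git repository.

```shell
#!/bin/bash
# Hypothetical helper: print every http(s) URL that follows "(", "<" or a
# double quote, one URL per line. Reads stdin, writes stdout.
extract_urls() {
	# -o emits each match separately, so several links on one line all survive;
	# -I skips binary input. The URL stops at ")", ">", a double quote or a space
	# (a simplification: any of these ends it, not just the opening delimiter's pair).
	grep -oIE '[(<"]https?:[^)>" ]*' |
	# Drop the leading delimiter that the pattern also captured.
	sed -e 's/^[(<"]//'
}

printf '%s\n' 'See [a](https://a.example/x) and <http://b.example/> or "https://c.example"' |
	extract_urls
# prints:
# https://a.example/x
# http://b.example/
# https://c.example
```

Inside the gist's loop it would replace the three `grep` lines with something like `extract_urls < "$FILE" >> urls.txt`; the later `sort -u` still removes duplicates.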