@roblogic
Last active October 11, 2021 13:42
Experimental messing about with the NZ Herald (nzh) page source ;)
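#!/bin/zsh
[ -n "$1" ] || { echo "Usage: $0 <nzherald-url>" && exit 1; }
page=$(mktemp herald.htm.XXXXX)   # raw HTML is saved in the current directory
curl -s "$1" > "$page"
echo "Extracting to: $page.txt"
# Pull out every <p> element, turn closing tags into line breaks, drop leftover
# tags, entities and non-breaking spaces (UTF-8 C2 A0), then reflow the text.
# The fmt flags follow BSD/macOS fmt (-p: allow indented paragraphs, -s: squeeze whitespace).
xmllint --html --format "$page" --nowarning --xpath "//p" 2>/dev/null \
| perl -pe 's|<p.*?>||g;s|<span.*?>||g;s|</.*?>|\n|g;s|\n\n ||g' \
| fmt -p -s | perl -pe 's|<strong>||ig;s|&amp\;|&|g;s|\xc2\xa0||g' \
| fmt -s | tee "$page.txt" | less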
#