@lidopaglia
Created May 13, 2023 22:27
#!/bin/sh
# Example: convert an HTML table to JSON using pup and jq.
# Caveat: pup does not appear to decode HTML entities in its json{} output;
# the --plain flag made no difference.
curl -s https://news.ycombinator.com/ \
| pup 'table table tr:nth-last-of-type(n+2) td.title a json{}' \
| jq '[.[] | {href, text} | select(.text != null)]'
# Example: use htmlq and hxclean to extract a simple table from existing HTML.
curl -s https://news.ycombinator.com/ \
| htmlq --pretty 'tr[class="athing"]' -b https://news.ycombinator.com \
| hxclean > hn.html
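
The first example notes that pup seems to leave HTML entities undecoded. One possible workaround, sketched here as an assumption rather than anything pup provides, is to decode the most common entities with sed once the titles are plain text. The function name `decode_entities` is hypothetical, and the entity list is deliberately small (no general numeric or named entity support).

```shell
#!/bin/sh
# decode_entities: hypothetical helper, not part of pup, jq or htmlq.
# Reads text on stdin and decodes a handful of common HTML entities.
# &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
decode_entities() {
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#39;/'/g" \
      -e "s/&#x27;/'/g" \
      -e 's/&amp;/\&/g'
}

# Possible usage with the pup example (assumes pup and jq are installed):
#   curl -s https://news.ycombinator.com/ \
#   | pup 'table table tr:nth-last-of-type(n+2) td.title a json{}' \
#   | jq -r '.[] | select(.text != null) | .text' \
#   | decode_entities
```

This is meant for plain-text output only: running it over raw JSON would turn `&quot;` into an unescaped `"` inside a string and break the document.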