
@kiyoto
Created June 2, 2012 22:21
Just a random script to scrape info from HN's top page.
#!/bin/sh
# Scrape story URLs, domains, points, ages, and comment counts from the
# Hacker News front page.
HN_FILE='hn.html'
HN="http://news.ycombinator.com"

rm -f "$HN_FILE"

# Check wget's exit status rather than testing for the file afterwards:
# with -O, wget creates (an empty) $HN_FILE even when the download fails.
if wget "$HN" -O "$HN_FILE"
then
    # Match each story row in the 2012-era HN markup and print one line per story:
    #   <url> <domain> [<points> <age> <minute|hour|day> <comments>]
    # (the bracketed fields are absent for rows without a subtext line, e.g. job posts)
    perl -ne 'while (m!<td class="title"><a href="(http[^"]+)">(?:.*?)</a><span class="comhead"> \(([^)]+)\) </span></td></tr><tr><td colspan=2></td>(?:<td class="subtext"><span id=score_\d+>(\d+) points</span> by <a href="user\?id=[^"]+">(?:.*?)</a> (\d+) (minute|hour|day)s? ago \| <a href="item\?id=\d+">(\d+) comments?</a>)?!g) { print "$1 $2"; if (defined $3) { print " $3 $4 $5 $6"; } print "\n"; }' < "$HN_FILE"
else
    echo "Failed to download!" >&2
fi