@oskopek
Created March 6, 2016 00:04
A simple web-scraping Bash script for the 2016 elections in Slovakia
#!/bin/bash
# Working files: the downloaded page, the page being built, and the published result.
file=vysledky.html
out=vysledky.tmp
res=vysledky
while true; do
    # Start a fresh output page with the HTML header, and log the poll time to the console.
    printf '<html><head>\n<meta charset="UTF-8">\n</head>\n<body>\n' > "$out"
    date
    # Download the live results page; on failure keep the last published page and retry.
    if ! curl -sf http://www.vysledkyvolieb.sk/parlamentne-volby/2016/priebezne-vysledky > "$file"; then
        sleep 10
        continue
    fi
    # Timestamp plus the "...% hlasov" ("% of votes") figure found on the page.
    echo "$(date) $(grep -o '[0-9].*% hlasov' "$file") <br><br>" >> "$out"
    # "Predbezne vysledky" = "Preliminary results".
    echo -e "\nPredbezne vysledky:<br><br><ol>" >> "$out"
    #cat $file | grep -E '/strana/' | sed -E 's/<[^>]*>//g' | sed -E 's/<\/a>//g' >> $out
    #cat $file | grep -E '(strana)|([0-9]*,[0-9]*)' | sed -E 'N;s/\n([0-9]*,[0-9]*)/ \1/g' | sed -E 's/<[^>]*>//g' | sed -E 's/<\/a>//g' | tail -n 49 | head -n -3 | sed -E 's/$/<\/li><br>/g' | sed -E 's/^/<li>/g' >> $out
    # Keep party links and percentage lines, take the last 46 matches, join each
    # link/percentage pair onto one line, strip HTML tags, and wrap each line in <li>.
    grep -E '(href="/parlamentne-volby/strana/)|([0-9]+,[0-9]+)' "$file" | tail -n 46 |
        sed -E 'N;s/\n([0-9]*,[0-9]*)/ \1/' |
        sed -E 's/<[^>]*>//g; s/^/<li>/; s/$/<\/li><br>/' >> "$out"
    echo '</ol></body></html>' >> "$out"
    cp "$out" "$res"
    sleep 10
done
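
The extraction step is the least obvious part of the script, so here is a minimal sketch of it run on its own against a made-up input. The function name extract_results and the sample markup are assumptions for illustration only; the real results page may be structured differently, and the tail -n 46 trimming is left out because the sample contains nothing else to discard.

```shell
#!/bin/bash
# Hypothetical helper (the name is not from the original script).
# $1: path to an HTML file; prints one <li> entry per party.
extract_results() {
    # 1. Keep only party-link lines and lines containing a decimal-comma number.
    # 2. Join each link line with the percentage line that follows it.
    # 3. Strip HTML tags and wrap what is left in <li>...</li><br>.
    grep -E '(href="/parlamentne-volby/strana/)|([0-9]+,[0-9]+)' "$1" |
        sed -E 'N;s/\n([0-9]*,[0-9]*)/ \1/' |
        sed -E 's/<[^>]*>//g; s/^/<li>/; s/$/<\/li><br>/'
}

# Made-up sample input: one link line and one percentage line per party.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>Unrelated line that grep should drop</p>
<a href="/parlamentne-volby/strana/1">Party A</a>
28,28
<a href="/parlamentne-volby/strana/2">Party B</a>
12,10
EOF

extract_results "$sample"
rm -f "$sample"
```

The pairing relies on sed's N command, which appends the next input line to the current one, so it only works when link lines and percentage lines strictly alternate. If the page layout changes, the pairs drift out of step, which is presumably why the script trims the match list with tail first.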