

@bag-man
Last active May 23, 2021 17:22
mrkoll.se scraper
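#!/usr/bin/env bash
# Usage: bash <script> <mrkoll.se person URL>; writes one CSV row per linked person to output.csv
SEEDPERSON=$1
[ -n "$SEEDPERSON" ] || { echo "usage: $0 <mrkoll.se person URL>" >&2; exit 1; }
rm -f output.csv
# Absolute URLs of the /person/ links on grannHeader lines (presumably the neighbor list; "granne" is Swedish for neighbor)
urls=$(curl -s "$SEEDPERSON" | grep /person/ | grep -v Sammanf | grep grannHeader | sed -E 's/.*href=(.*)>/\1/g' | sed 's/>.*$//' | sed 's/^\//https:\/\/mrkoll.se\//')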
#echo "Url, First Name, Last Name, Address, Appt. No., Postcode, DoB, Gender, Res. Since, Phone" >> output.csv
for url in $urls
do
printf '%s,' "$url" >> output.csv
# Profile fields: strip tags, split out "lgh NNNN", drop "-XXXX", map gender words to M/F, drop lines 6-7, join with commas
curl -s "$url" | grep -E -A 47 '(tel:|block_col1)' | grep span | sed '/tel:/q' |
  sed -E 's/<div class=history_span/\n<div class=history_span/' | grep -vE '(history_span|f_head1|förnamn|f_line1|mellannamn)' |
  sed 's/<[^>]*>//g' | sed -E 's/ lgh ([0-9]{4})/\nlgh \1\n/' | sed 's/-XXXX//' | sed -E '/^\s*$/d' |
  sed -E 's/^herre/M/; s/^(tjej|dam|kvinna)/F/; s/^man/M/' | sed '6,7d' | tr -d '\r' | paste -s -d ',' >> output.csv
done
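The link-extraction step (urls=$(...)) can be tried offline by feeding the same grep/sed chain a hand-written HTML snippet instead of a live curl response. This is a minimal sketch, assuming GNU grep and sed; the sample markup is invented and not copied from mrkoll.se, and the function name extract_person_urls is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the link-extraction filters from the scraper, reading HTML on stdin.
# Assumes GNU grep/sed. The sample lines below are made up for illustration.

# Hypothetical helper: same grep filters and sed steps as the urls=$(...) pipeline.
extract_person_urls() {
  grep /person/ | grep -v Sammanf | grep grannHeader |
    sed -E 's/.*href=(.*)>/\1/g' |          # keep everything after the last href= up to the last >
    sed 's/>.*$//' |                         # cut at the first remaining >, leaving the path
    sed 's/^\//https:\/\/mrkoll.se\//'       # turn the leading / into an absolute URL
}

# Invented input: one matching line, one containing "Sammanf", one without /person/.
sample='<div class=grannHeader><a href=/person/example/123>Example</a></div>
<a href=/person/example/456/Sammanfattning class=grannHeader>Summary</a>
<a href=/other/page>Other</a>'

printf '%s\n' "$sample" | extract_person_urls
```

Only the first sample line survives: the second is dropped by grep -v Sammanf and the third has no /person/ path. Because both .* in the first sed are greedy, a line carrying several href= attributes would keep only the last one.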
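The per-profile cleanup inside the loop is easier to follow on a fixed input. This sketch, again assuming GNU sed and using invented field values and markup, runs only the stages after the profile block has been selected; the layout-specific sed '6,7d' is left out because which two lines it removes depends on the live page. The function name clean_fields is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: per-profile cleanup stages applied to made-up <span> lines.
# Assumes GNU sed (for \n in the replacement and \s in the pattern).
# The layout-specific '6,7d' step from the scraper is deliberately omitted.

# Hypothetical helper: tag stripping, field splitting, gender mapping, CSV join.
clean_fields() {
  sed 's/<[^>]*>//g' |                                          # strip HTML tags
    sed -E 's/ lgh ([0-9]{4})/\nlgh \1\n/' |                    # move "lgh NNNN" (apartment no.) to its own line
    sed 's/-XXXX//' |                                           # drop the masked suffix after the birth date
    sed -E '/^\s*$/d' |                                         # delete blank lines
    sed -E 's/^herre/M/; s/^(tjej|dam|kvinna)/F/; s/^man/M/' |  # gender words -> M/F
    tr -d '\r' |                                                # remove carriage returns (stands in for dos2unix)
    paste -s -d ','                                             # join all lines into one CSV row
}

# Invented profile fields; none of these values come from the real site.
printf '%s\n' '<span>Anna</span>' '<span>Exempelsson</span>' \
  '<span>Exempelgatan 1 lgh 1102</span>' '<span>123 45 Exempelstad</span>' \
  '<span>19800101-XXXX</span>' '<span>kvinna</span>' | clean_fields
```

The single sed -E with three s commands behaves like a chain of separate one-substitution sed calls: every pattern is anchored at ^, and a line that has already become M or F cannot match a later pattern. The empty line produced by the lgh split is removed by the blank-line filter before the join.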