
@tott
Created October 13, 2016 01:59
Prime caches
#!/bin/bash
# Prime caches by crawling the links found on a page.
# Usage:   ./curl-urls.sh numprocesses http://urltograb urlpattern
# Example: ./curl-urls.sh 3 http://www.domain.com domain.com
numprocesses=$1
baseurl=$2
urlpattern=$3

# Block until fewer than $1 (default 3) background jobs are running.
function forky() {
    local num_par_procs
    if [[ -z $1 ]]; then
        num_par_procs=3
    else
        num_par_procs=$1
    fi
    while [[ $(jobs -r | wc -l) -ge $num_par_procs ]]; do
        sleep 1
    done
}

# Fetch the base page. Note: -o writes to the given file; -O (as in the
# original) takes no argument and would treat the path as a second URL.
curl -s "$baseurl" -o /tmp/baseurl.html
# Extract href targets matching the pattern, deduplicated, one per line.
sed -n 's/.*href="\([^"]*\).*/\1/p' /tmp/baseurl.html | grep -E "$urlpattern" | sort -u > /tmp/urls.txt

# Below is an example - make sure to change this to your needs.
while read -r url; do
    echo "$(jobs -r | wc -l) jobs in spool"
    echo "Grabbing $url"
    curl -s "$url" > /dev/null &
    forky "$numprocesses"
done < /tmp/urls.txt
wait
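The core of the script is the `forky` throttling pattern: fire off background jobs, then block whenever the number of running jobs reaches the limit. A minimal, self-contained sketch of that pattern (using `sleep` as a stand-in for the `curl` requests, so it runs anywhere):

```shell
#!/bin/bash
# Block until fewer than $1 (default 3) background jobs are running.
forky() {
    local num_par_procs=${1:-3}
    while [[ $(jobs -r | wc -l) -ge $num_par_procs ]]; do
        sleep 1
    done
}

# Launch six fake "downloads", never more than three at a time.
for i in 1 2 3 4 5 6; do
    sleep 2 &      # stand-in for: curl -s "$url" > /dev/null &
    forky 3        # wait here until a slot is free
done
wait               # reap all remaining jobs before exiting
```

`jobs -r` counts only *running* jobs, so finished-but-unreaped jobs do not inflate the count and stall the loop.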