@stefanschmidt
Last active August 29, 2015 14:12
Scrape the images from a WordPress blog's posts, one category at a time
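#!/bin/bash
# Depends on cli-scrape (https://github.com/pthrasher/cli-scrape),
# GNU Parallel (available via Homebrew), and wget.
# Walk pages 1 to 100 of the category archive.
for i in {1..100}; do
  # Extract every <img> src, keep only jpg/png/gif URLs (dropping anything
  # after the extension, such as a query string), then download 10 at a time.
  scrape "http://foo.com/category/bar/page/$i/" '//img/@src' |
    sed -En 's/(.*\.)(jpg|png|gif).*/\1\2/p' |
    parallel --jobs 10 wget {}
done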
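
The sed step is the least obvious part of the pipeline, so here is a minimal sketch that runs the same expression on a few made-up URLs without touching the network. The function name `clean_image_urls` and the sample URLs are hypothetical and only serve the illustration. The expression is copied unchanged from the script.

```shell
#!/bin/bash
# Hypothetical helper wrapping the sed expression used in the script.
# It reads one URL per line on stdin, prints only lines that contain a
# .jpg, .png, or .gif extension, and cuts off everything after it.
clean_image_urls() {
  sed -En 's/(.*\.)(jpg|png|gif).*/\1\2/p'
}

# Made-up sample input: a URL with a query string, a plain image URL,
# and a non-image URL that the filter should drop.
printf '%s\n' \
  'http://foo.com/wp-content/uploads/a.jpg?w=300' \
  'http://foo.com/wp-content/uploads/b.png' \
  'http://foo.com/pixel.php?id=1' |
  clean_image_urls
```

With this input the filter prints the first URL without its `?w=300` suffix, prints the second unchanged, and drops the third. The match is case-sensitive, so a URL ending in `.JPG` would also be dropped.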