Created August 6, 2014 16:37
n8henrie/a9d1b119515586981e8b
Wget script to download a bunch of pics from subpages all linked from a single page
wget -qO- http://site_in_question.html | ack -o "pattern_of_subpages\.html" | xargs -I{} bash -c 'wget -qO- "http://subpage_prefix/"{} | ack -o "(?<=src\=\").*?image_direct_link_pattern\.jpg" | (read img_link; wget "image_direct_link_subpattern/$img_link")'
# Broken down:
# wget -qO- http://site_in_question.html # fetches the site in question quietly and writes the page to stdout
# ack -o "pattern_of_subpages\.html" # searches stdin and prints only the matching text, i.e. the list of links to subpages
# xargs -I{} bash -c # runs the following command once for each of those subpages
# wget -qO- "http://subpage_prefix/"{} # fetches each subpage, with {} replaced by the link identified by ack
# ack -o "(?<=src\=\").*?image_direct_link_pattern\.jpg" # searches the subpage for the direct link to the image; the lookbehind keeps src=" out of the match
# (read img_link; wget "image_direct_link_subpattern/$img_link") # reads the first matching link into a variable and downloads it with wget
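# The same pipeline as a multi-line sketch, for readability. This is not the
# original one-liner: it swaps ack for grep -oE plus sed (no lookbehind needed),
# and it downloads every matching image on a subpage rather than only the first.
# The function names, the regexes, and the example URLs are hypothetical
# placeholders; adjust them to the site in question.

```shell
#!/usr/bin/env bash
# Sketch only: patterns and URLs are assumptions, not taken from a real site.

# Print each distinct subpage link (anything ending in .html) found on stdin.
extract_subpages() {
    grep -oE '[A-Za-z0-9_/-]+\.html' | sort -u
}

# Print the value of every src="..." attribute on stdin that ends in .jpg,
# without the surrounding src=" and closing quote.
extract_image_links() {
    grep -oE 'src="[^"]*\.jpg"' | sed -E 's/^src="//; s/"$//'
}

# Fetch the index page, then each subpage, then every image linked on it.
download_all() {
    local index_url=$1 subpage_prefix=$2 image_prefix=$3
    wget -qO- "$index_url" | extract_subpages | while IFS= read -r subpage; do
        wget -qO- "${subpage_prefix}/${subpage}" | extract_image_links |
            while IFS= read -r img_link; do
                wget "${image_prefix}/${img_link}"
            done
    done
}

# Usage (hypothetical URLs):
# download_all "http://example.com/index.html" "http://example.com" "http://example.com/images"
```

# Keeping the two extraction steps in their own functions makes it possible to
# check the patterns against a saved copy of a page before downloading anything.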