@troy
Created September 10, 2009 13:22
Given a redfin.com house listing URL, save all full-size images
# usage: redfin-images "http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567"
function redfin-images() {
wget -O - "$1" | grep "full:" | awk -F \" '{print $4}' | xargs wget -
}
wget -O - http://www.redfin.com/WA/Seattle/123-Home-Row-12345/home/1234567 | grep "full:" | awk -F \" '{print $4}' | xargs wget -
@thn929

thn929 commented Jul 15, 2019

I've updated the script to match current Redfin, as of 7/15/2019 (requires bash 4.2+ for echo -e a la https://stackoverflow.com/a/8795949):

wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[0-9_]*.jpg") | xargs wget --user-agent="Mozilla"

@thn929

thn929 commented Jul 16, 2019

Got a version working with bash older than 4.2 (e.g., macOS Mojave ships with bash 3.2.57(1)-release):

  1. brew install uni2ascii (the Homebrew formula that provides the ascii2uni command)

  2. wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"
    (Unicode code point escapes are tricky to work with in older versions of bash)

But really, just upgrade your Mac's bash.
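
If neither a newer bash nor ascii2uni is at hand, a plain sed substitution may be enough, on the assumption that \u002F (the escaped forward slash) is the only escape sequence in the matched URLs. The function name decode_u002f and the sample URL are made up for illustration; this is a sketch, not something tested against a live listing.

```shell
# Hypothetical helper: turn every literal \u002F into "/".
# Assumes \u002F is the only escape present, which should hold for the
# URLs matched by the egrep pattern used in this thread.
decode_u002f() {
  sed 's|\\u002F|/|g'
}

# Made-up sample in the escaped form found in the page source.
printf '%s\n' 'https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F1\u002Fbigphoto\u002F234\u002F567_0.jpg' | decode_u002f
# prints: https://ssl.cdn-redfin.com/photo/1/bigphoto/234/567_0.jpg
```

In the pipeline it would take the place of the ascii2uni -Z '\u%04X' stage.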

@andrewkincaid

I just used the most recent command and it did not work. For the listing I'm using, all the images start with PW so I had to change the regex to [0-9_PW]*.jpg and that worked. Thank you!!!

@Andrew-J-Marsh

I found that the latest JPEGs were prefixed with "VALO". My regex is rusty: when I tried wildcarding A-Z, it wouldn't pull them. See where I hardcoded VALO below, and be aware that you may need to tweak it should the prefix for the JPEGs change.

bash-5.0$ wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[VALO]*[0-9_]*.jpg") | xargs wget --user-agent="Mozilla"

@stiwari3

stiwari3 commented Jul 4, 2020

The following worked for me
wget --user-agent="Mozilla" -O - https://www.redfin.com/IL/Chicago/123-S-Someplace-St-60605/unit-123/home/12345678 | echo -e $(egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[A-Z0-9_]*.jpg") | xargs wget --user-agent="Chrome"
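
On the file-name prefixes mentioned in the comments above (PW, VALO): a bracket expression such as [A-Z0-9_] matches one character from the set, and the trailing * repeats it, so [A-Z0-9_]* covers any mix of capital letters, digits, and underscores. A small check against made-up file names (not taken from a real listing) illustrates this; the function name is hypothetical.

```shell
# Hypothetical helper: print only the names accepted by the file-name
# part of the pattern. The dot is escaped here so it matches a literal ".".
# LC_ALL=C keeps the A-Z range limited to ASCII capitals in every locale.
matching_names() {
  LC_ALL=C grep -E '^[A-Z0-9_]*\.jpg$'
}

# Made-up names: plain digits, a PW prefix, a VALO prefix, and a
# lowercase prefix that the class does not cover.
printf '%s\n' 123_0.jpg PW456_1.jpg VALO789_2.jpg abc_3.jpg | matching_names
```

The first three names are printed and abc_3.jpg is not, so a listing whose files used lowercase prefixes would need a-z added to the class.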

@inboundnebula

This solution does not work for me, though at first it looked as if it would: "Connecting to www.redfin.com..." -> "HTTP request sent, awaiting response... 200 OK" -> "Length: unspecified [text/html]" -> "Saving to: 'STDOUT'" -> "written to STDOUT"... but then I get a "wget: missing URL" message.

Any thoughts on how to solve?

Not a dev, just a curiosity-driven individual trying to learn. In retrospect, it would have been faster (for me) to right click each photo and save, but there would not have been any fun!


@Andrew-J-Marsh

wget: missing URL = you didn't provide a valid URL. Navigate to the listing on Redfin, copy the entire URL, and use it in place of the dummy URL in the example. Enjoy!

@inboundnebula

Hi Andrew, yes, I originally used a valid Redfin URL with all components, i.e., https://www.redfin.com/State/City/StreetAddress-Zip/home/ID. Yet it still gave me a "wget: missing URL" message.

I will try a different listing and see if it replicates.

I even changed the user agent to Mozilla or Chrome, as well as the "-" in "-O - https:...", to no avail.
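
One possible explanation, which nobody in this thread confirmed: "wget: missing URL" is what wget prints when it is started with no URL argument at all. GNU xargs runs its command once even when its input is empty, so if egrep finds no matching image URLs, the final wget is launched with nothing to download. A rough sketch of a guard follows; the function name and the pass-through downloader argument are inventions for illustration.

```shell
# Hypothetical guard: read URLs on stdin and only call the downloader
# when at least one URL arrived. "$@" is the download command, e.g.
#   download_if_any wget --user-agent="Mozilla"
download_if_any() {
  urls=$(cat)
  if [ -z "$urls" ]; then
    echo "no image URLs matched; inspect the saved page and the regex" >&2
    return 1
  fi
  # Let xargs split the list and append the URLs to the given command.
  printf '%s\n' "$urls" | xargs "$@"
}

# Demo with echo standing in for wget, so nothing is downloaded.
printf '%s\n' 'https://example.com/a.jpg' 'https://example.com/b.jpg' | download_if_any echo would-fetch
# prints: would-fetch https://example.com/a.jpg https://example.com/b.jpg
```

With GNU xargs, the -r option (do not run the command on empty input) has a similar effect, just without the message.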


@mals14

mals14 commented Sep 1, 2020

does this still work?

@troy
Author

troy commented Sep 1, 2020

@mals14: I haven't used it in over 10 years, so I have no idea. Based on the fact that people have commented in the last few months that it works, I'm guessing that it does. Try it and see :)

@mals14

mals14 commented Sep 1, 2020

@troy - thank you for replying. It does not work anymore, I guess because the website can sense that it is a wget request and does not respond well.

I am not sure if all the others get the notification or not.

For now, I copied the displayed page (on realtor.com, I believe), pasted it as markdown, and then used a Python script from a GitHub contributor to download the image files. Quite a roundabout solution, but it worked.

@punjabdhaputar

The following worked for me on my Mac:
1) brew install uni2ascii
2) wget --user-agent="Mozilla" -O - <RedFinURL> | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[A-Z0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"

@mals14

mals14 commented Oct 22, 2020

@punjabdhaputar Thank you for sharing. It works!

Was able to understand how it works by first saving wget output, then found the format that egrep is looking for in that document, and ascii2uni is changing the format back to something that wget can use. Good stuff and thanks again for sharing!
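
The inspection approach described above can be sketched as follows: save the page once, then list the escaped URLs that the pattern finds in the saved copy. The file name page.html, the function name, and the sample content are assumptions for illustration; [0-9] is used for digits because \d is not part of POSIX extended regular expressions.

```shell
# Hypothetical helper: print each distinct escaped image URL found in a
# saved copy of the listing page ($1), one per line.
list_escaped_urls() {
  grep -Eo 'https:\\u002F\\u002Fssl\.cdn-redfin\.com\\u002Fphoto\\u002F[0-9]*\\u002Fbigphoto\\u002F[0-9]*\\u002F[A-Z0-9_]*\.jpg' "$1" | sort -u
}

# Save the page first (commented out so the sketch runs offline):
#   wget --user-agent="Mozilla" -O page.html "<RedFinURL>"
# Stand-in content with made-up photo paths, one of them repeated:
tmp=$(mktemp)
printf '%s\n' \
  '"a":"https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F1\u002Fbigphoto\u002F234\u002F567_0.jpg"' \
  '"b":"https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F1\u002Fbigphoto\u002F234\u002F567_1.jpg"' \
  '"c":"https:\u002F\u002Fssl.cdn-redfin.com\u002Fphoto\u002F1\u002Fbigphoto\u002F234\u002F567_0.jpg"' > "$tmp"
list_escaped_urls "$tmp"
list_escaped_urls "$tmp" | wc -l   # number of distinct matches
rm -f "$tmp"
```

If the count comes back as zero or one for a real listing, the page probably no longer carries the URLs in this form, which would point at the pattern rather than at wget.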

@msridhar

@punjabdhaputar worked for me too, thanks!!

@gauravchak

Any idea how to download the photos that are visible after signing in?

@troy
Author

troy commented May 13, 2021

@gauravchak: After using a browser to log in, you might be able to change wget to present session cookies from the browser. For that, look into the --load-cookies option. You'd need to manually create the cookie file.

Assuming you're just saving images from a handful of listings for personal use (which is what this script was intended for), one of these methods might be easier than adding cookie support:

  1. In Firefox, choose Tools -> Page Info, select the Media tab, highlight multiple image URLs in the listing, and click "Save as." This probably won't show high-res images that are only shown in an interactive gallery (lightbox), but it will at least show the average-size images. If you need high-res images, you can probably find a different real estate site that does show all of the high-res images in one page and use the same technique.
  2. Use a "Save all images" browser extension (example: https://github.com/belaviyo/save-images - I haven't personally used it). Browser extensions are risky, so look for a trusted one with lots of users and comments (and ideally, public source code), and uninstall it as soon as you're done.
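
For the --load-cookies route mentioned above, wget reads the Netscape cookies.txt format: one cookie per line, seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry as Unix time, name, value). A minimal sketch of creating such a file by hand follows; the function name, cookie name, and value are placeholders, and which cookies Redfin actually requires is not something this thread establishes.

```shell
# Hypothetical helper: write a one-cookie file in Netscape cookies.txt format.
#   $1 = output file, $2 = cookie name, $3 = cookie value
write_cookie_file() {
  {
    printf '# Netscape HTTP Cookie File\n'
    # domain, include subdomains, path, secure only, expiry, name, value
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
      '.redfin.com' 'TRUE' '/' 'TRUE' '2147483647' "$2" "$3"
  } > "$1"
}

# Placeholder name and value; copy the real ones from the browser.
write_cookie_file cookies.txt EXAMPLE_COOKIE_NAME example-value
cat cookies.txt

# Then, with a real listing URL in place of <RedFinURL>:
#   wget --load-cookies cookies.txt --user-agent="Mozilla" -O - "<RedFinURL>"
```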

@VYCMa

VYCMa commented Dec 13, 2021

It was working well earlier this year, but now it just downloads a single image. Does anyone know how to adjust it to download all images? Thanks

@PratapNaik

@punjabdhaputar ...this still works! thanks!

@reaudiotra

@punjabdhaputar just tried this and saved myself the trouble of several right-click saves...thanks!

@Aleyasen

It still works.

@maximusdecimus12

maximusdecimus12 commented Mar 13, 2022

@gauravchak I managed to make it work for listings that require signing in, using the method outlined in the Q&A post "How do I use wget/curl to download from a site I am logged into?".

  1. Logged into Redfin in Firefox.
  2. Opened the "Network" tab of the Web Developer tools: Ctrl-Shift-E
  3. Refreshed the page and copied the very first request that was sent.
  4. Pasted it into Sublime and saw a large number of cookie values in there. To figure out where the cookies started and stopped, I searched for "-H" in the file and took only the cookie header, i.e. everything in 'Cookie: key1=value1; key2=value2; [....]; keyn=valuen'
  5. Recreated the wget command as such:
    wget --no-cookies --header "Cookie: key1=value1; key2=value2; [....]; keyn=valuen" --user-agent="Mozilla" -O - <RedFinURL> | egrep -o "https:\\\\u002F\\\\u002Fssl.cdn-redfin.com\\\\u002Fphoto\\\\u002F[0-9]*\\\\u002Fbigphoto\\\\u002F[0-9]*\\\\u002F[A-Z0-9_]*.jpg" | ascii2uni -Z '\u%04X' | xargs wget --user-agent="Mozilla"

And that did the trick. Hope that helps.
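
Step 4 above (finding the cookies inside the copied request) can also be done mechanically. The sketch below assumes the request was copied with the browser's "Copy as cURL" option and that headers appear as -H 'Name: value' in single quotes; the function name and the sample command are made up.

```shell
# Hypothetical helper: read a copied curl command on stdin and print the
# Cookie header without the surrounding -H '...'.
extract_cookie_header() {
  grep -Eo -e "-H '[Cc]ookie: [^']*'" | sed -e "s/^-H '//" -e "s/'\$//"
}

# Made-up sample; real cookie strings are much longer.
sample="curl 'https://www.redfin.com/' -H 'User-Agent: Mozilla' -H 'Cookie: key1=value1; key2=value2' -H 'Accept: */*'"
cookie_header=$(printf '%s\n' "$sample" | extract_cookie_header)
printf '%s\n' "$cookie_header"
# prints: Cookie: key1=value1; key2=value2

# The result can then be passed on as in step 5:
#   wget --no-cookies --header "$cookie_header" --user-agent="Mozilla" -O - "<RedFinURL>"
```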

@DarkAlexWang

@punjabdhaputar Thanks, it works for me.

@polygonsheep

Redfin seems to be blocking this now; I'm getting 403 Forbidden.

@timendez

timendez commented Jan 26, 2024

I created a lil Go program to do this https://github.com/timendez/go-redfin-archiver

Clone repo, and just run e.g. go run archive.go https://www.redfin.com/CA/San-Jose/206-Grayson-Ter-95126/home/2122534

@troy
Author

troy commented Jan 26, 2024

@timendez I just tried your Go program and it worked great. Nice work!

For anyone else who encounters this gist: Strongly consider using @timendez's program instead: https://github.com/timendez/go-redfin-archiver
