Python 3 script to find real estate listings of properties up for sale on zillow.com
from lxml import html, etree
import requests
import csv
import argparse


def parse(zipcode, sort_order=None):
    # Pick the search URL that matches the requested sort order
    if sort_order == "newest":
        url = "https://www.zillow.com/homes/for_sale/{0}/0_singlestory/days_sort".format(zipcode)
    elif sort_order == "cheapest":
        url = "https://www.zillow.com/homes/for_sale/{0}/0_singlestory/pricea_sort/".format(zipcode)
    else:
        url = "https://www.zillow.com/homes/for_sale/{0}_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy".format(zipcode)

    headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        # Only advertise encodings requests can always decode ('br' needs the optional brotli package)
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
        'cache-control': 'max-age=0',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }

    # Try the request up to 5 times before giving up
    for attempt in range(5):
        try:
            response = requests.get(url, headers=headers, timeout=30)
            print(response.status_code)
            if response.status_code != 200:
                continue
            parser = html.fromstring(response.text)
        except (requests.exceptions.RequestException, etree.ParserError):
            print("Failed to process the page", url)
            continue

        search_results = parser.xpath("//div[@id='search-results']//article")
        properties_list = []
        for listing in search_results:
            raw_address = listing.xpath(".//span[@itemprop='address']//span[@itemprop='streetAddress']//text()")
            raw_city = listing.xpath(".//span[@itemprop='address']//span[@itemprop='addressLocality']//text()")
            raw_state = listing.xpath(".//span[@itemprop='address']//span[@itemprop='addressRegion']//text()")
            raw_postal_code = listing.xpath(".//span[@itemprop='address']//span[@itemprop='postalCode']//text()")
            raw_price = listing.xpath(".//span[@class='zsg-photo-card-price']//text()")
            raw_info = listing.xpath(".//span[@class='zsg-photo-card-info']//text()")
            raw_broker_name = listing.xpath(".//span[@class='zsg-photo-card-broker-name']//text()")
            # Separate variable so the search URL is not overwritten between retries
            raw_url = listing.xpath(".//a[contains(@class,'overlay-link')]/@href")
            raw_title = listing.xpath(".//h4//text()")

            # Collapse whitespace; missing fields become None
            address = ' '.join(' '.join(raw_address).split()) if raw_address else None
            city = ''.join(raw_city).strip() if raw_city else None
            state = ''.join(raw_state).strip() if raw_state else None
            postal_code = ''.join(raw_postal_code).strip() if raw_postal_code else None
            price = ''.join(raw_price).strip() if raw_price else None
            # Replace the middle dot separator with a comma
            info = ' '.join(' '.join(raw_info).split()).replace("\xb7", ',')
            broker = ''.join(raw_broker_name).strip() if raw_broker_name else None
            title = ''.join(raw_title) if raw_title else None
            property_url = "https://www.zillow.com" + raw_url[0] if raw_url else None
            is_forsale = listing.xpath('.//span[@class="zsg-icon-for-sale"]')

            property_data = {
                'address': address,
                'city': city,
                'state': state,
                'postal_code': postal_code,
                'price': price,
                'facts and features': info,
                'real estate provider': broker,
                'url': property_url,
                'title': title
            }
            # Keep only cards flagged as for sale
            if is_forsale:
                properties_list.append(property_data)
        return properties_list

    # Every attempt failed
    return []


if __name__ == "__main__":
    argparser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)
    argparser.add_argument('zipcode', help='ZIP code to search, e.g. 10118')
    sortorder_help = """
    available sort orders are :
    newest : Latest property details,
    cheapest : Properties with cheapest price
    """
    argparser.add_argument('sort', nargs='?', help=sortorder_help, default='Homes For You')
    args = argparser.parse_args()
    zipcode = args.zipcode
    sort = args.sort
    print("Fetching data for %s" % zipcode)
    scraped_data = parse(zipcode, sort)
    print("Writing data to output file")
    # Python 3's csv module expects a text-mode file opened with newline=''
    with open("properties-%s.csv" % zipcode, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['title', 'address', 'city', 'state', 'postal_code', 'price', 'facts and features', 'real estate provider', 'url']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in scraped_data:
            writer.writerow(row)
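To see how the XPath extraction in parse() turns a listing card into a row without hitting zillow.com, here is a minimal sketch that runs a subset of the same selectors against a hand-written HTML fragment. The fragment (SAMPLE) and the helper name extract_cards are hypothetical: the markup is shaped to match the selectors above, not copied from Zillow, whose pages have very likely changed since this gist was written.

```python
from lxml import html

# Hypothetical HTML fragment shaped to match the script's XPath selectors.
# It is NOT real Zillow markup.
SAMPLE = """
<div id="search-results">
  <article>
    <span itemprop="address">
      <span itemprop="streetAddress">123  Main
        St</span>
      <span itemprop="addressLocality">Springfield</span>
    </span>
    <span class="zsg-photo-card-price">$250,000</span>
    <span class="zsg-photo-card-info">3 bds \u00b7 2 ba \u00b7 1,500 sqft</span>
    <span class="zsg-icon-for-sale"></span>
    <a class="zsg-photo-card-overlay-link" href="/homedetails/123-main-st/1_zpid/"></a>
  </article>
</div>
"""


def extract_cards(page_text):
    """Apply a subset of the script's selectors and return one dict per card."""
    tree = html.fromstring(page_text)
    results = []
    for card in tree.xpath("//div[@id='search-results']//article"):
        raw_address = card.xpath(".//span[@itemprop='address']//span[@itemprop='streetAddress']//text()")
        raw_price = card.xpath(".//span[@class='zsg-photo-card-price']//text()")
        raw_info = card.xpath(".//span[@class='zsg-photo-card-info']//text()")
        raw_url = card.xpath(".//a[contains(@class,'overlay-link')]/@href")
        # Same cleanup as the script: collapse whitespace, swap the middle dot for a comma
        results.append({
            'address': ' '.join(' '.join(raw_address).split()) or None,
            'price': ''.join(raw_price).strip() or None,
            'facts and features': ' '.join(' '.join(raw_info).split()).replace("\xb7", ','),
            'url': "https://www.zillow.com" + raw_url[0] if raw_url else None,
            'for_sale': bool(card.xpath('.//span[@class="zsg-icon-for-sale"]')),
        })
    return results


print(extract_cards(SAMPLE))
```

If extract_cards returns rows for a saved copy of a real results page but parse() returns nothing, the problem is the response (for example, a bot-check page), not the selectors.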

edwardrusch3 commented Jun 28, 2018

I keep getting blank data in the CSV even though the script returns a 200 status. Any idea why?

salimoha commented Aug 11, 2018

Thanks for your code. I have two questions:
1) How can I get the Zestimate price?
2) What is this line doing: url = "https://www.zillow.com/homes/for_sale/{0}_rb/?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy".format(zipcode)
I want to get the list of houses from LoopNet. What should I put instead of that URL?
Thanks
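On the second question: the {0} in that string is a str.format placeholder that the script fills with the zip code; the rest is a Zillow search path plus three query-string flags. A minimal way to see the pieces is to split the URL with the standard library. What the flags mean on Zillow's side is not documented in this thread, so treat any interpretation of them as a guess. A different site such as LoopNet would need its own URL and its own XPath selectors.

```python
from urllib.parse import urlsplit, parse_qs

zipcode = "10118"  # example zip code
# Adjacent string literals are joined first, then .format() fills {0}
url = ("https://www.zillow.com/homes/for_sale/{0}_rb/"
       "?fromHomePage=true&shouldFireSellPageImplicitClaimGA=false&fromHomePageTab=buy").format(zipcode)

parts = urlsplit(url)
print(parts.netloc)           # www.zillow.com
print(parts.path)             # /homes/for_sale/10118_rb/
print(parse_qs(parts.query))  # {'fromHomePage': ['true'], 'shouldFireSellPageImplicitClaimGA': ['false'], 'fromHomePageTab': ['buy']}
```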

NovTangoPapa commented Sep 27, 2018

> I keep getting blank data in the CSV even though the script returns a 200 status. Any idea why?

I am having this same issue.

Edit:

It looks like it happens when you pass a sort argument (newest/cheapest). My guess is that the associated links are bad.

Another edit:

It doesn't look like it's the links after all.

gumdropsteve commented Oct 2, 2018

Hi, I'm new to Python and an active Realtor® in the SF Bay Area. My end goal is to have this running on my site, but I'm starting with it as-is to understand it better.

What would I need to change to have this execute in PyCharm (Windows 10)? Thanks.

andresmillang commented Nov 15, 2018

Hi, the reason for the blank data is reCAPTCHA.
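A 200 status only means the server answered, not that it answered with listings. Below is a minimal sketch of a check that flags a bot-check page before parsing, assuming (unverified) that the page Zillow serves in that case contains the word "captcha" somewhere in its HTML. The function name looks_like_captcha is hypothetical; confirm the marker against a page you actually receive.

```python
def looks_like_captcha(page_text):
    """Rough heuristic: True if the page mentions a CAPTCHA.

    Assumption: the bot-check page contains the word "captcha"
    (case-insensitive) somewhere in its HTML.
    """
    return "captcha" in page_text.lower()


# Hypothetical use inside parse(), right after the request:
# if looks_like_captcha(response.text):
#     print("Got a CAPTCHA page instead of search results for", url)

print(looks_like_captcha("<html><div class='g-recaptcha'></div></html>"))  # True
print(looks_like_captcha("<div id='search-results'></div>"))                # False
```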

keddisa commented Nov 16, 2018

I keep getting this error message. Can someone help?

usage: ipykernel_launcher.py [-h] zipcode [sort]
ipykernel_launcher.py: error: unrecognized arguments: -f

An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

C:\Users\keddi\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2969: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

guyamir commented Dec 23, 2018

> I keep getting this error message, can someone help?
>
> usage: ipykernel_launcher.py [-h] zipcode [sort]
> ipykernel_launcher.py: error: unrecognized arguments: -f

Save the file and run it from the terminal with a zip code. For instance:

~$ python3 zillow.py 10118
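The -f in that error comes from Jupyter: the kernel is started with its own command-line flags (including -f and a connection file), and argparse's parse_args() reads them from sys.argv. Besides running from a terminal, a minimal workaround in a notebook is to give parse_args an explicit argument list, or to call parse(zipcode, sort) directly. The sketch rebuilds only the script's two arguments; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    # Same two arguments as the script: a required zipcode and an optional sort order
    argparser = argparse.ArgumentParser()
    argparser.add_argument('zipcode')
    argparser.add_argument('sort', nargs='?', default='Homes For You')
    return argparser


# In a notebook, pass the arguments explicitly instead of letting
# argparse read the kernel's own flags from sys.argv.
args = build_parser().parse_args(['10118', 'newest'])
print(args.zipcode, args.sort)  # 10118 newest

args = build_parser().parse_args(['10118'])
print(args.sort)  # Homes For You
```

From there, parse(args.zipcode, args.sort) can be called as in the script.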

wwolfgang11 commented Jan 2, 2019

I get the error "ZillowError: HTTPSConnectionPool(host='www.zillow.com', port=443): Max retries exceeded with url: /webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gtmiat11xn_7ew1d&address=3400+Pacific+Ave.%2C+Marina+Del+Rey%2C+CA&citystatezip=90292 (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))"
Any idea how to fix this?

philxiao commented Jan 4, 2019

As-is, this script generates empty results because of Zillow's reCAPTCHA. If you'd like to use it, I suggest using ProxyCrawl (https://proxycrawl.com/) to get past the reCAPTCHA. Once you have an account, simply prefix the Zillow URL with the ProxyCrawl endpoint provided in your dashboard.
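A minimal sketch of the "prefix the URL" step, assuming the endpoint copied from the dashboard ends in a url= query parameter that expects the target URL percent-encoded. PROXY_ENDPOINT and proxied are hypothetical names, and the endpoint value is a placeholder, not the provider's real API. Encoding matters here because the Zillow search URL carries its own ? and & characters, which would otherwise be read as parameters of the proxy request.

```python
from urllib.parse import quote

# Placeholder endpoint: copy the real one (with your token) from the
# provider's dashboard. This value is hypothetical.
PROXY_ENDPOINT = "https://proxy.example.com/?token=YOUR_TOKEN&url="


def proxied(target_url, endpoint=PROXY_ENDPOINT):
    """Prefix target_url with the proxy endpoint.

    safe='' makes quote() encode '/', ':', '?', '&' and '=' too, so the
    whole target travels as a single query-parameter value.
    """
    return endpoint + quote(target_url, safe='')


zillow_url = "https://www.zillow.com/homes/for_sale/10118_rb/?fromHomePage=true&fromHomePageTab=buy"
print(proxied(zillow_url))
```

In the script, the sketch would amount to requesting proxied(url) instead of url inside parse().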

corpulent commented Jan 14, 2019

> Hi, I'm new to Python and an active Realtor® in the SF Bay Area. My end goal is to have this running on my site, but I'm starting with it as-is to understand it better.
>
> What would I need to change to have this execute in PyCharm (Windows 10)? Thanks.

@gumdropsteve, are you a Realtor? I might be able to help.

sarojrout commented Feb 12, 2019

> I keep getting blank data in the CSV even though the script returns a 200 status. Any idea why?

I am also getting blank results. Did you find a solution?

sebasslash commented Mar 14, 2019

It works for me if you don't pass the second argument. So:

python main.py zipcode

It seems like the second (sort) argument is what causes the scraper to return no results.
