Extract property URLs and data from Zillow listing pages using ZenRows’ Universal Scraper API. This tutorial covers setting up your scraper to handle Zillow’s anti-bot protection, extracting property links from search results, and processing dynamic content.

What you’ll learn

  • Set up scraping requests with anti-bot bypass and JavaScript rendering
  • Extract property URLs from Zillow listing pages using CSS selectors
  • Handle dynamic content loading and page scrolling
  • Configure custom JavaScript instructions for reliable data extraction

Why Scrape Real Estate Data

Real estate professionals need timely, comprehensive market data to make informed decisions. Web scraping enables automated data collection that provides several advantages:

Market intelligence

  • Monitor new listings as they appear on property websites
  • Track pricing trends across different neighborhoods and property types
  • Identify properties that match specific investment criteria

Investment analysis

  • Access comprehensive property data for market research
  • Compare property specifications across multiple listings
  • Analyze market conditions in target areas

Lead generation

  • Identify potential investment opportunities from listing data
  • Build databases of properties that meet client requirements
  • Monitor competitor listings and market activity

Step 1: Test With a Basic Scraping Setup

Start by creating a scraping function that handles Zillow’s anti-bot measures. The site requires JavaScript Rendering and Premium Proxies for reliable access.
Python
# pip install requests
import requests

def scraper(url):
    apikey = "YOUR_ZENROWS_API_KEY"

    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.text
This function returns the website’s HTML content. The js_render parameter enables JavaScript processing, while premium_proxy provides residential IP addresses to avoid blocking. Setting proxy_country to “us” ensures requests originate from US-based IP addresses.
The proxy_country parameter is optional. If omitted, ZenRows uses a random IP address from anywhere in the world. See the ZenRows geolocation documentation for more details.
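To verify the setup, run a quick test call. Here’s a minimal sketch using the listing URL from later in this tutorial; replace it with any Zillow page you want to scrape:
Python
# quick smoke test: fetch a listing page and inspect the start of the HTML
html = scraper("https://www.zillow.com/districts/8494/california-area-school-district/")
print(html[:500])  # the first 500 characters confirm the page loaded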

Step 2: Handle Dynamic Content

Parameters such as wait, wait_for, and js_instructions let you customize requests to handle dynamic rendering. Start by adding a generic 2-second delay (wait) to the request parameters so elements have time to load.
Python
#...
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
    "proxy_country": "us",
    "wait": "2000",
}
#...
Next, add scrolling logic using the js_instructions parameter. This helps capture property listings that load below the viewport. Define the js_instructions separately as a JSON-stringified value. The instructions include an extra wait command, which adds a delay for elements to finish loading after the scrolling action completes.
Python
# ...
import json


#... scraper function

# custom JS instructions
listing_js_instructions = json.dumps(
    [
        {"evaluate": "window.scrollTo(0, document.body.scrollHeight);"},
        {"wait": 2000},
    ]
)
Use the same scraper function to scrape the listings and individual property pages. To make the function more customizable for each scenario, update it to accept an optional js_instructions parameter. Here’s the updated code:
Python
# pip install requests
import requests
import json

def scraper(
    url,
    js_instructions=None,
):
    apikey = "YOUR_ZENROWS_API_KEY"
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "js_instructions": js_instructions,
        "wait": "2000",
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)

    return response.text

# custom JS instructions
listing_js_instructions = json.dumps(
    [
        {"evaluate": "window.scrollTo(0, document.body.scrollHeight);"},
        {"wait": 2000},
    ]
)

Step 3: Extract Property URLs from Listing Pages

Now you’ll extract individual property page links from the Zillow listing. Use ZenRows’ css_extractor feature to automatically pull these URLs from the page. Here’s a stringified format of the css_extractor:
Python
# listing page CSS extractor
listing_css_extractor = json.dumps(
    { "Links": "a[data-testid*=carousel][href]@href" }
)
CSS selectors can change when websites update their code. To keep your scraper reliable, monitor your selectors regularly and update them as needed; one way to make that easier is shown below. See the ZenRows CSS selector documentation to learn more.
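For instance, isolating selectors in a single structure turns a Zillow markup change into a one-line fix. A minimal sketch using the selector above:
Python
# keep all selectors in one place for easy updates when the site changes
SELECTORS = {
    "property_links": "a[data-testid*=carousel][href]@href",
}

listing_css_extractor = json.dumps({"Links": SELECTORS["property_links"]})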
Add an optional css_extractor parameter to the scraper function and pass it in the ZenRows params. Since we’re now extracting specific content, update the scraper function to return JSON rather than plain text. Call the function with listing_url, listing_css_extractor, and listing_js_instructions as arguments. Here’s the updated code:
Python
# pip install requests
import requests
import json

def scraper(
    url,
    js_instructions=None,
    css_extractor=None,
):
    apikey = "YOUR_ZENROWS_API_KEY"
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "js_instructions": js_instructions,
        "wait": "2000",
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)

    return response.json()

# custom JS instructions
listing_js_instructions = json.dumps(
    [
        {"evaluate": "window.scrollTo(0, document.body.scrollHeight);"},
        {"wait": 2000},
    ]
)

# listing page CSS extractor
listing_css_extractor = json.dumps(
    { "Links": "a[data-testid*=carousel][href]@href" }
)

listing_url = "https://www.zillow.com/districts/8494/california-area-school-district/"

property_urls = scraper(
    listing_url,
    js_instructions=listing_js_instructions,
    css_extractor=listing_css_extractor,
)["Links"]

# use a set to avoid duplicate URLs
property_urls = set(property_urls)
print(property_urls)
The code above prints the URL of each property on the listing page. See a sample below:
Output
{
    "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    "https://www.zillow.com/homedetails/882-Highpoint-Dr-Coal-Center-PA-15423/2087529083_zpid/",
    # ...,
    "https://www.zillow.com/homedetails/508-5th-St-California-PA-15419/49745256_zpid/",
    "https://www.zillow.com/homedetails/721-Spring-St-Roscoe-PA-15477/49794732_zpid/",
}
You now have a complete foundation for scraping Zillow property listings. Your scraper handles anti-bot protection, processes dynamic content, and extracts property URLs efficiently from any Zillow listing page.

Next Steps

Once you have the property URLs, you can use them to extract detailed property data from individual listing pages. Learn how to do this in our Extract Property Data tutorial.

Data management best practices

1. Structure for your use case

Design your data structure to match your specific business needs, rather than using a generic approach. For example, if you need to analyze price trends, structure price history as date/price pairs, not raw HTML. Map scraped fields directly to your business entities (e.g., property, agent, transaction) and use consistent, clear field names that work for your team and future use.
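As a sketch, a property entity following these principles might look like the following; the field names are illustrative assumptions, not Zillow’s actual schema:
Python
from dataclasses import dataclass, field

# illustrative property entity; field names are assumptions, not Zillow's schema
@dataclass
class Property:
    url: str
    address: str
    price: int
    price_history: list = field(default_factory=list)  # (date, price) pairs for trend analysis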
2. Validate before storage

Check for missing or malformed fields, unexpected data types, duplicate entries, or values outside the expected range. Use validation scripts or schema checks (e.g., using Python’s pydantic or JSON Schema). Automate this process to maintain consistency throughout your data extraction workflow.
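For example, here’s a minimal pydantic sketch that rejects malformed records before storage; the fields, bounds, and example values are illustrative assumptions:
Python
# pip install pydantic
from pydantic import BaseModel, HttpUrl, field_validator

class PropertyRecord(BaseModel):
    url: HttpUrl
    address: str
    price: int

    @field_validator("price")
    @classmethod
    def price_in_range(cls, v):
        # bounds are illustrative; tune them to your market
        if not 1_000 <= v <= 100_000_000:
            raise ValueError(f"price out of expected range: {v}")
        return v

# raises a ValidationError instead of silently storing bad data
record = PropertyRecord(
    url="https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    address="130 3rd St, California, PA 15419",  # example value
    price=95000,  # example value
)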
3. Preserve raw data

Store both cleaned and raw data separately. Raw data serves as a backup for debugging and reprocessing when requirements change.
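A minimal sketch of that separation, assuming local files and using the property’s zpid (visible in Zillow URLs) as a file key:
Python
import json
from pathlib import Path

RAW_DIR, CLEAN_DIR = Path("raw"), Path("clean")
RAW_DIR.mkdir(exist_ok=True)
CLEAN_DIR.mkdir(exist_ok=True)

def save_property(zpid, raw_html, cleaned):
    # raw HTML is the backup for debugging and reprocessing
    (RAW_DIR / f"{zpid}.html").write_text(raw_html, encoding="utf-8")
    # cleaned record is what downstream analysis consumes
    (CLEAN_DIR / f"{zpid}.json").write_text(json.dumps(cleaned, indent=2), encoding="utf-8")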
4. Determine the storage format

Choose the storage format that best fits your data and use case. Common options include:
  • JSON: Best for nested, hierarchical, or complex data structures. Human-readable and widely supported across platforms and protocols.
  • CSV: Ideal for flat, tabular data. Easy to use in spreadsheets and many analytics tools.
  • Databases (e.g., MongoDB, PostgreSQL): Suitable for large datasets that require frequent updates and querying.
  • Vector databases: Designed for storing vectorized data, such as embeddings for LLM (Large Language Model) consumption.
Select the format that aligns with your workflow and future data needs.
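For instance, the property URLs from Step 3 can be persisted as JSON in a few lines; this sketch assumes the property_urls set from earlier:
Python
import json

# sets aren't JSON-serializable, so convert to a sorted list first
with open("property_urls.json", "w", encoding="utf-8") as f:
    json.dump(sorted(property_urls), f, indent=2)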

Troubleshooting

Missing data

Solution 1: Employ adequate delay strategies to allow dynamic content to load, such as generic waits, waiting for specific elements, or pausing after scrolling or navigation.
Solution 2: If using css_extractor, confirm your CSS selectors are correct. Test each selector with the ZenRows Request Builder before integrating it into your codebase. Create a DOM monitoring strategy to spot structural changes on the site, and isolate selectors from the rest of your codebase for easier debugging and troubleshooting.
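For example, ZenRows’ wait_for parameter (mentioned in Step 2) holds the request until a specific element appears; the selector below is a placeholder you’d replace with one from your target page:
Python
params = {
    "url": url,
    "apikey": apikey,
    "js_render": "true",
    "premium_proxy": "true",
    # return only after this element is present; placeholder selector
    "wait_for": ".your-target-selector",
}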

CAPTCHA/anti-bot challenges

Solution 1: Ensure you’ve applied anti-bot bypass parameters like js_render and premium_proxy.
Solution 2: If you’re still blocked by in-page CAPTCHAs or those attached to form fields, integrate a CAPTCHA-solving service like 2Captcha from our solver integration options. Check our 2Captcha integration guide for more information.
Solution 3: Use fallbacks or alternative pathways to avoid abrupt scraping failures.

Rate limiting/geo-blocking

Solution 1: Use the premium_proxy parameter so ZenRows rotates IP addresses automatically.
Solution 2: Implement request retry mechanisms, such as exponential backoff between failed requests, as sketched below.
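A minimal retry sketch with exponential backoff, wrapping the scraper function from Step 1 of this tutorial:
Python
import time
import requests

def scrape_with_retries(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            html = scraper(url)  # the scraper function from Step 1
            if html:
                return html
        except requests.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"All {max_retries} attempts failed for {url}")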