What you’ll learn
- Extract specific property data using CSS selectors
- Set up scraping requests with anti-bot bypass and JavaScript rendering
- Clean and structure scraped data for analysis
- Handle complex data fields like price history
- Scale from single property to multiple property extraction
- Store property data in structured formats
Why Extract Property Data
Real estate professionals need detailed property information for market analysis, investment decisions, and lead generation. Extracting structured data from property listings enables:
Market analysis
- Compare property specifications across multiple listings
- Track pricing trends and market conditions
- Identify investment opportunities based on specific criteria
Investment research
- Access comprehensive property details for due diligence
- Analyze property features, pricing history, and market positioning
- Build databases for portfolio management
Automated workflows
- Integrate property data into existing business systems
- Create alerts for properties matching specific criteria
- Generate reports and analytics from structured data
Step 1: Set Up Your Basic Scraping Function
Start by creating a scraping function that handles Zillow’s anti-bot measures. You’ll need JavaScript Rendering to process dynamic content and Premium Proxies to avoid IP blocking.
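A minimal sketch of such a function, using the ZenRows API’s js_render and premium_proxy parameters (the API key is a placeholder, and the timeout is an assumed value):

```python
# pip install requests
import requests

ZENROWS_API_URL = "https://api.zenrows.com/v1/"
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder: use your own key

def scrape_page(url: str) -> str:
    """Fetch a page through ZenRows with JavaScript rendering and premium proxies."""
    params = {
        "apikey": API_KEY,
        "url": url,
        "js_render": "true",      # render JavaScript-driven content
        "premium_proxy": "true",  # route through residential proxies to avoid IP blocks
    }
    response = requests.get(ZENROWS_API_URL, params=params, timeout=60)
    response.raise_for_status()  # surface HTTP errors early
    return response.text
```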
Step 2: Test With a Single Property URL
Before scaling to multiple properties, test your scraper with a single Zillow property page. Use ZenRows’ css_extractor feature to automatically extract specific data fields.
Create CSS selectors to target the property data you need:
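For example, you can pass a css_extractor map of field names to CSS selectors; the selectors below are illustrative and should be checked against Zillow’s current markup before use:

```python
import json
import requests

API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder
property_url = "https://www.zillow.com/homedetails/example-listing"  # placeholder URL

# Field names mapped to CSS selectors (illustrative; Zillow's markup changes often)
selectors = {
    "price": "[data-testid='price']",
    "address": "h1",
    "beds_baths_sqft": "[data-testid='bed-bath-sqft-fact-container']",
}

params = {
    "apikey": API_KEY,
    "url": property_url,
    "js_render": "true",
    "premium_proxy": "true",
    "css_extractor": json.dumps(selectors),  # selectors are sent as a JSON string
}
response = requests.get("https://api.zenrows.com/v1/", params=params, timeout=60)
response.raise_for_status()
print(response.json())  # extracted fields come back as a JSON object
```

The response is a JSON object keyed by your field names, ready to feed into the cleaning and validation steps covered below.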
Step 3: Scale to Multiple Properties
Once your single property extraction works reliably, scale up to process multiple property URLs, as in the batch loop sketched below.
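A minimal batch loop, assuming the single-property logic is wrapped in an extract_property function (shown in full in the next step); the listing URLs are placeholders to replace with real Zillow property pages:

```python
import time

import requests

property_urls = [
    "https://www.zillow.com/homedetails/listing-1",  # placeholder: use real listing URLs
    "https://www.zillow.com/homedetails/listing-2",
]

results = []
for url in property_urls:
    try:
        results.append(extract_property(url))  # single-property function from Step 4
    except requests.RequestException as exc:
        print(f"Failed to scrape {url}: {exc}")  # log and continue with the rest
    time.sleep(1)  # brief pause between requests
```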
Step 4: Complete Implementation
Here’s the complete code that extracts property data from multiple URLs:
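One way the full script can look, combining the pieces above; the API key, selectors, and URLs remain placeholders to adapt:

```python
import json
import time

import requests

ZENROWS_API_URL = "https://api.zenrows.com/v1/"
API_KEY = "YOUR_ZENROWS_API_KEY"  # placeholder

# Illustrative selectors; verify against Zillow's current markup
CSS_EXTRACTOR = {
    "price": "[data-testid='price']",
    "address": "h1",
    "beds_baths_sqft": "[data-testid='bed-bath-sqft-fact-container']",
}

def extract_property(url: str) -> dict:
    """Extract structured fields from one Zillow listing via ZenRows."""
    params = {
        "apikey": API_KEY,
        "url": url,
        "js_render": "true",
        "premium_proxy": "true",
        "css_extractor": json.dumps(CSS_EXTRACTOR),
    }
    response = requests.get(ZENROWS_API_URL, params=params, timeout=60)
    response.raise_for_status()
    record = response.json()
    record["source_url"] = url  # keep provenance for debugging and deduplication
    return record

def main() -> None:
    property_urls = [
        "https://www.zillow.com/homedetails/listing-1",  # placeholder URLs
        "https://www.zillow.com/homedetails/listing-2",
    ]
    results = []
    for url in property_urls:
        try:
            results.append(extract_property(url))
        except requests.RequestException as exc:
            print(f"Failed to scrape {url}: {exc}")
        time.sleep(1)  # brief pause between requests
    with open("zillow_properties.json", "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    print(f"Saved {len(results)} properties")

if __name__ == "__main__":
    main()
```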
Data Management Best Practices
Structure for your use case
Design your data structure to match your specific business needs rather than using a generic approach. For example, if you need to analyze price trends, structure price history as date/price pairs, not raw HTML. Map scraped fields directly to your business entities (property, agent, transaction) and use consistent, clear field names that work for your team.
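As a sketch, assuming price events are scraped as strings such as "$450,000" with US-style dates (both assumptions about the raw format), normalizing price history into date/price pairs might look like:

```python
from datetime import datetime

def normalize_price_history(raw_events: list[dict]) -> list[dict]:
    """Convert raw scraped price events into sorted date/price pairs."""
    history = []
    for event in raw_events:
        history.append({
            # "3/15/2024" -> "2024-03-15"
            "date": datetime.strptime(event["date"], "%m/%d/%Y").date().isoformat(),
            # "$450,000" -> 450000
            "price": int(event["price"].replace("$", "").replace(",", "")),
        })
    return sorted(history, key=lambda e: e["date"])
```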
Validate before storage
Check for missing or malformed fields, unexpected data types, duplicate entries, and values outside the expected range. Use validation scripts or schema checks (such as Python’s pydantic or JSON Schema). Automate this process to maintain consistency throughout your data extraction workflow.
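For instance, a minimal pydantic (v2) model can reject malformed records before they reach storage; the field names and price range here are assumptions to adapt:

```python
from pydantic import BaseModel, ValidationError, field_validator

class PropertyRecord(BaseModel):
    address: str
    price: int
    source_url: str

    @field_validator("price")
    @classmethod
    def price_in_expected_range(cls, value: int) -> int:
        # Reject obviously wrong values (assumed plausible range)
        if not 10_000 <= value <= 100_000_000:
            raise ValueError("price outside expected range")
        return value

try:
    record = PropertyRecord(address="123 Main St", price=450_000,
                            source_url="https://www.zillow.com/homedetails/example")
except ValidationError as exc:
    print(exc)  # missing or malformed fields are reported here
```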
Preserve raw data
Store both cleaned and raw data separately. Raw data serves as a backup for debugging and reprocessing when requirements change.
Choose the right storage format
Select the storage format that best fits your data and use case (a short save sketch follows the list):
- JSON: Best for nested, hierarchical, or complex data structures. Human-readable and widely supported across platforms.
- CSV: Ideal for flat, tabular data. Easy to use in spreadsheets and analytics tools.
- Databases (MongoDB, PostgreSQL): Suitable for large datasets that require frequent updates and querying.
- Vector databases: Designed for storing vectorized data, such as embeddings for Large Language Model consumption.
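As an example of the first two options, a small helper (names are illustrative) can write the same records to JSON for nested fields and to a flattened CSV for spreadsheet work:

```python
import csv
import json

def save_records(records: list[dict], stem: str = "properties") -> None:
    """Write records to JSON (nested fields intact) and to a flattened CSV."""
    with open(f"{stem}.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)  # JSON preserves nested price history
    if not records:
        return
    # CSV needs flat rows, so drop nested fields such as price history
    flat = [{k: v for k, v in r.items() if not isinstance(v, (list, dict))}
            for r in records]
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(flat[0].keys()),
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(flat)
```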