# Extract Property Data

> Scrape detailed property data from individual Zillow listing pages using ZenRows CSS selectors and scale to process multiple properties.

Extract detailed property information from individual Zillow property pages using ZenRows' Universal Scraper API. This tutorial shows you how to scrape specific data fields from single properties and scale up to process multiple listings efficiently.

## What you'll learn

* Extract specific property data using CSS selectors
* Set up scraping requests with anti-bot bypass and JavaScript rendering
* Clean and structure scraped data for analysis
* Handle complex data fields like price history
* Scale from single property to multiple property extraction
* Store property data in structured formats

## Why Extract Property Data

Real estate professionals need detailed property information for market analysis, investment decisions, and lead generation. Extracting structured data from property listings supports the following use cases.

### Market analysis

* Compare property specifications across multiple listings
* Track pricing trends and market conditions
* Identify investment opportunities based on specific criteria

### Investment research

* Access comprehensive property details for due diligence
* Analyze property features, pricing history, and market positioning
* Build databases for portfolio management

### Automated workflows

* Integrate property data into existing business systems
* Create alerts for properties matching specific criteria
* Generate reports and analytics from structured data

## Step 1: Set Up Your Basic Scraping Function

Start by creating a scraping function that handles Zillow's anti-bot measures. You'll need [JavaScript Rendering](/universal-scraper-api/features/js-rendering) to process dynamic content and [Premium Proxies](/universal-scraper-api/features/premium-proxy) to avoid IP blocking.

```python Python theme={null}
# pip install requests
import requests

def scraper(url):
    apikey = "YOUR_ZENROWS_API_KEY"

    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",  # Processes JavaScript-generated content
        "premium_proxy": "true",  # Uses residential IP addresses
        "proxy_country": "us",  # Routes requests through US-based IPs
        "wait": "2000",  # Wait 2 seconds for elements to load
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    response.raise_for_status()  # Raise on HTTP errors instead of returning an error body
    return response.text
```

This function returns the page's HTML content. The `js_render` parameter enables JavaScript processing (essential for dynamic websites like Zillow), while `premium_proxy` provides residential IP addresses that appear as regular user traffic.

<Tip>
  The `proxy_country` parameter is optional. If you don't specify it, ZenRows will use a random IP address from anywhere in the world. Learn more about geolocation [here](/universal-scraper-api/features/proxy-country).
</Tip>

## Step 2: Test With a Single Property URL

Before scaling to multiple properties, test your scraper with a single Zillow property page. Use ZenRows' `css_extractor` feature to automatically extract specific data fields.

Create CSS selectors to target the property data you need:

```python Python theme={null}
import json

# Property page CSS extractor
property_css_extractor = json.dumps(
    {
        "price": "span[data-testid='price']",
        "location": "ul.footer-breadcrumbs a, ul.footer-breadcrumbs strong",
        "dimension": "div[data-testid='bed-bath-sqft-facts']",
        "description": "div[data-testid='description']",
        "listed_by": "div[data-testid='seller-attribution']",
        "price_change_dates": "span[data-testid='date-info']",
        "price_changes": "td[data-testid='price-money-cell']",
    }
)
```

<Warning>CSS selectors can change when websites update their code. To maintain a reliable scraper, monitor your selectors regularly and update them as needed. Learn more about CSS selectors [here](/universal-scraper-api/features/css-extractor).</Warning>

Update your scraper function to handle CSS extraction and return JSON data:

```python Python theme={null}
# pip install requests
import requests
import json

def scraper(url, css_extractor=None):
    apikey = "YOUR_ZENROWS_API_KEY"
    
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "wait": "2000",
        "css_extractor": css_extractor,
    }
    
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    response.raise_for_status()  # Raise on HTTP errors so error bodies aren't parsed as property data
    return response.json()  # Return JSON when using CSS extractor

# Property page CSS extractor
property_css_extractor = json.dumps(
    {
        "price": "span[data-testid='price']",
        "location": "ul.footer-breadcrumbs a, ul.footer-breadcrumbs strong",
        "dimension": "div[data-testid='bed-bath-sqft-facts']",
        "description": "div[data-testid='description']",
        "listed_by": "div[data-testid='seller-attribution']",
        "price_change_dates": "span[data-testid='date-info']",
        "price_changes": "td[data-testid='price-money-cell']",
    }
)

# Test with a single property
property_url = "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/"

property_data = scraper(
    property_url,
    css_extractor=property_css_extractor,
)

print(json.dumps(property_data, indent=2))
```

This returns structured data for the property. Here's what a typical response looks like:

```json JSON Response theme={null}
{
    "price": "$142,500",
    "location": "130 3rd St, California, PA 15419",
    "dimension": "4 beds • 2 baths • 2,764 sqft",
    "description": "Sprawling single family home...only missing you!",
    "listed_by": "Listed by: Thomas Althoff 724-933-6300, RE/MAX SELECT REALTY",
    "price_change_dates": ["7/29/2025", "Listed for sale"],
    "price_changes": ["$142,500+9.7%$52/sqft"]
}
```
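
The `dimension` field arrives as a single display string, so it usually needs splitting before analysis. The following is a minimal sketch, assuming the text resembles the sample response above (`beds`, `baths`, and `sqft` labels, with or without bullet separators). The helper name `parse_dimension` is hypothetical and not part of ZenRows.

```python
import re

# Hypothetical helper: label spellings are assumptions based on the sample response.
_DIMENSION_PATTERNS = {
    "beds": r"(\d[\d,]*(?:\.\d+)?)\s*(?:beds?|bd)",
    "baths": r"(\d[\d,]*(?:\.\d+)?)\s*(?:baths?|ba)",
    "sqft": r"(\d[\d,]*(?:\.\d+)?)\s*sqft",
}


def parse_dimension(dimension):
    """Split a string like '4 beds • 2 baths • 2,764 sqft' into numeric fields."""
    result = {"beds": None, "baths": None, "sqft": None}
    for field, pattern in _DIMENSION_PATTERNS.items():
        match = re.search(pattern, dimension or "", flags=re.IGNORECASE)
        if match:
            # Drop thousands separators before converting to a number
            result[field] = float(match.group(1).replace(",", ""))
    return result


print(parse_dimension("4 beds • 2 baths • 2,764 sqft"))
```

Fields that cannot be found stay `None`, which makes missing values easy to spot later.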

## Step 3: Scale to Multiple Properties

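Once your single property extraction works reliably, scale up to process multiple property URLs. The loop below uses a few example Zillow property URLs for testing. It calls `clean_property_data`, a helper defined in the complete implementation in Step 4, so define that function before running this snippet.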

```python Python theme={null}
# Example property URLs from Zillow
property_urls = [
    "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    "https://www.zillow.com/homedetails/882-Highpoint-Dr-Coal-Center-PA-15423/2087529083_zpid/",
    "https://www.zillow.com/homedetails/508-5th-St-California-PA-15419/49745256_zpid/",
    "https://www.zillow.com/homedetails/721-Spring-St-Roscoe-PA-15477/49794732_zpid/"
]

all_property_data = []

# Extract data from each property
for property_url in property_urls:
    try:
        properties = scraper(
            property_url,
            css_extractor=property_css_extractor,
        )
        
        # Clean and structure the data
        cleaned_properties = clean_property_data(properties)
        
        # Add the URL for reference
        cleaned_properties["source_url"] = property_url
        
        all_property_data.append(cleaned_properties)
        
        print(f"Successfully extracted data from: {property_url}")
        
    except Exception as e:
        print(f"Error extracting data from {property_url}: {str(e)}")
        continue

print(f"Successfully extracted data for {len(all_property_data)} properties")
```

Save the property data in JSON format to preserve the nested structure:

```python Python theme={null}
# Store the data as JSON
with open("properties.json", "w", encoding="utf-8") as f:
    json.dump(all_property_data, f, ensure_ascii=False, indent=2)

print("Data saved to properties.json")
```

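You can also export to CSV. `pd.json_normalize` flattens nested dictionaries into columns, but list fields such as `price_history` are written into a single cell:

```python Python theme={null}
import pandas as pd

# Convert to DataFrame and save as CSV (nested dicts become columns; lists stay in one cell)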
df = pd.json_normalize(all_property_data)
df.to_csv("properties.csv", index=False)

print("Data saved to properties.csv")
```
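
If you want the price history in tabular form as well, one option is to write one CSV row per price event. This is a sketch using only the standard library, assuming each record carries the `price_history` list (with `Date`, `Event`, and `Price` keys) and the `source_url` produced in this tutorial. The function names and the output file name are hypothetical.

```python
import csv

FIELDNAMES = ["source_url", "date", "event", "price"]


def price_history_rows(all_property_data):
    """Build one flat row per price event, keyed by the property's source URL."""
    rows = []
    for prop in all_property_data:
        for entry in prop.get("price_history", []):
            rows.append({
                "source_url": prop.get("source_url", ""),
                "date": entry.get("Date", ""),
                "event": entry.get("Event", ""),
                "price": entry.get("Price", ""),
            })
    return rows


def write_price_history_csv(rows, path):
    """Write the flat rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_price_history_csv(price_history_rows(all_property_data), "price_history.csv")` would produce a second file that can be joined to `properties.csv` on `source_url`.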

## Step 4: Complete Implementation

Here's the complete code that extracts property data from multiple URLs:

```python Python expandable theme={null}
# pip install requests pandas
import requests
import json
import pandas as pd

def scraper(url, css_extractor=None):
    apikey = "YOUR_ZENROWS_API_KEY"
    
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "wait": "2000",
        "css_extractor": css_extractor,
    }
    
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    response.raise_for_status()  # Raise on HTTP errors so the loop's except block can log them
    return response.json()

def clean_property_data(properties):
    """Clean and structure property data"""
    
    # Clean the "listed_by" field
    listed_by = properties.get("listed_by")
    if listed_by:
        properties["listed_by"] = listed_by.replace("Listed by:", "").strip()
    
    # Process and merge price history data
    price_history_dates = properties.get("price_change_dates")
    price_changes = properties.get("price_changes")

    if price_history_dates and price_changes:
        combined_price_history = []
        
        for i in range(0, len(price_history_dates), 2):
            date = price_history_dates[i]
            event = (
                price_history_dates[i + 1] if i + 1 < len(price_history_dates) else ""
            )
            price = price_changes[i // 2] if i // 2 < len(price_changes) else ""
            
            combined_price_history.append({
                "Date": date,
                "Event": event,
                "Price": price
            })
        
        properties["price_history"] = combined_price_history

        # Remove the original fields after combining
        for key in ["price_change_dates", "price_changes"]:
            if key in properties:
                del properties[key]
    
    return properties

# CSS extractor for property data
property_css_extractor = json.dumps(
    {
        "price": "span[data-testid='price']",
        "location": "ul.footer-breadcrumbs a, ul.footer-breadcrumbs strong",
        "dimension": "div[data-testid='bed-bath-sqft-facts']",
        "description": "div[data-testid='description']",
        "listed_by": "div[data-testid='seller-attribution']",
        "price_change_dates": "span[data-testid='date-info']",
        "price_changes": "td[data-testid='price-money-cell']",
    }
)

# Example property URLs
property_urls = [
    "https://www.zillow.com/homedetails/130-3rd-St-California-PA-15419/49745481_zpid/",
    "https://www.zillow.com/homedetails/882-Highpoint-Dr-Coal-Center-PA-15423/2087529083_zpid/",
    "https://www.zillow.com/homedetails/508-5th-St-California-PA-15419/49745256_zpid/",
    "https://www.zillow.com/homedetails/721-Spring-St-Roscoe-PA-15477/49794732_zpid/"
]

all_property_data = []

# Extract data from each property
for property_url in property_urls:
    try:
        properties = scraper(
            property_url,
            css_extractor=property_css_extractor,
        )
        
        # Clean and structure the data
        cleaned_properties = clean_property_data(properties)
        
        # Add the URL for reference
        cleaned_properties["source_url"] = property_url
        
        all_property_data.append(cleaned_properties)
        
        print(f"✓ Successfully extracted data from: {property_url}")
        
    except Exception as e:
        print(f"✗ Error extracting data from {property_url}: {str(e)}")
        continue

# Save the data
with open("properties.json", "w", encoding="utf-8") as f:
    json.dump(all_property_data, f, ensure_ascii=False, indent=2)

# Also save as CSV for easy analysis
df = pd.json_normalize(all_property_data)
df.to_csv("properties.csv", index=False)

print(f"\n🎉 Successfully extracted data for {len(all_property_data)} properties")
print("Data saved to properties.json and properties.csv")
```

Congratulations! 🎉 You've successfully extracted detailed property data from Zillow using ZenRows' web scraping capabilities.

## Data Management Best Practices

### Structure for your use case

Design your data structure to match your specific business needs rather than using a generic approach. For example, if you need to analyze price trends, structure price history as date/price pairs, not raw HTML. Map scraped fields directly to your business entities (property, agent, transaction) and use consistent, clear field names that work for your team.
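
As an illustration of the date/price idea, the sketch below converts one entry of the `price_history` list built in this tutorial into an ISO date and an integer price. It assumes dates look like `7/29/2025` and that the price cell starts with a dollar amount, as in the sample response. The function name and output keys are hypothetical.

```python
import re
from datetime import datetime


def normalize_price_entry(entry):
    """Turn {'Date', 'Event', 'Price'} strings into typed date/price values."""
    # Keep only the leading dollar amount, e.g. "$142,500+9.7%$52/sqft" -> 142500
    match = re.match(r"\$\s*(\d[\d,]*)", entry.get("Price", ""))
    price = int(match.group(1).replace(",", "")) if match else None

    try:
        # Assumed US-style month/day/year format
        date = datetime.strptime(entry.get("Date", ""), "%m/%d/%Y").date().isoformat()
    except ValueError:
        date = None

    return {"date": date, "event": entry.get("Event", ""), "price_usd": price}


print(normalize_price_entry(
    {"Date": "7/29/2025", "Event": "Listed for sale", "Price": "$142,500+9.7%$52/sqft"}
))
```

Values that cannot be parsed become `None` rather than raising, so one malformed row does not stop a batch.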

### Validate before storage

Check for missing or malformed fields, unexpected data types, duplicate entries, and values outside the expected range. Use validation scripts or schema checks (such as the pydantic library for Python, or JSON Schema). Automate this process to maintain consistency throughout your data extraction workflow.
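
A dependency-free version of these checks might look like the sketch below. The required fields, the price range, and the use of `source_url` as the duplicate key are all assumptions chosen for illustration, so adjust them to your own data.

```python
import re

# Assumed rules for illustration only
REQUIRED_FIELDS = ("price", "location", "source_url")
PRICE_RANGE = (1_000, 500_000_000)


def validate_property(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or non-string field: {field}")

    price = record.get("price")
    if isinstance(price, str) and price.strip():
        match = re.search(r"\d[\d,]*", price)
        if match is None:
            problems.append("price contains no digits")
        else:
            amount = int(match.group(0).replace(",", ""))
            if not PRICE_RANGE[0] <= amount <= PRICE_RANGE[1]:
                problems.append("price outside expected range")
    return problems


def drop_duplicates(records):
    """Keep the first record seen for each source_url."""
    seen = set()
    unique = []
    for record in records:
        key = record.get("source_url")
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```

Running `validate_property` on each record before appending it to `all_property_data` lets you log or quarantine bad records instead of storing them.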

### Preserve raw data

Store both cleaned and raw data separately. Raw data serves as a backup for debugging and reprocessing when requirements change.
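
One way to do this is sketched below. Note that `clean_property_data` from Step 4 modifies the dictionary it receives, so the sketch copies the raw record before cleaning. The function name, folder layout, and `name` argument are hypothetical choices.

```python
import copy
import json
from pathlib import Path


def save_raw_and_cleaned(raw_record, clean_func, out_dir, name):
    """Write <out_dir>/raw/<name>.json and <out_dir>/cleaned/<name>.json."""
    out_path = Path(out_dir)

    # Clean a copy so the raw record is left untouched
    cleaned = clean_func(copy.deepcopy(raw_record))

    for folder, payload in (("raw", raw_record), ("cleaned", cleaned)):
        target_dir = out_path / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        with open(target_dir / f"{name}.json", "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=2)

    return cleaned
```

In the scraping loop, this could replace the direct call to the cleaning function, for example `save_raw_and_cleaned(properties, clean_property_data, "data", "49745481")`.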

### Choose the right storage format

Select the storage format that best fits your data and use case:

* **JSON**: Best for nested, hierarchical, or complex data structures. Human-readable and widely supported across platforms.
* **CSV**: Ideal for flat, tabular data. Easy to use in spreadsheets and analytics tools.
* **Databases (MongoDB, PostgreSQL)**: Suitable for large datasets that require frequent updates and querying.
* **Vector databases**: Designed for storing vectorized data, such as embeddings for Large Language Model consumption.

## Troubleshooting

### Missing data

**Solution 1**: Use adequate delay strategies to allow dynamic content to load. Increase the `wait` parameter value or add specific waits for elements that load asynchronously.

**Solution 2**: Verify you're using the correct CSS selectors. Test each selector using the ZenRows Request Playground before integrating it into your code. Create a monitoring strategy to spot site structural changes and isolate selectors from your codebase for easy debugging.
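
A simple way to isolate selectors and spot structural changes is sketched below. It assumes the selectors live in a separate JSON file (the `selectors.json` name is hypothetical) and treats any field that is absent or empty in the response as a possible sign that a selector has gone stale.

```python
import json


def load_selectors(path):
    """Read selectors from a JSON file; return the dict and the css_extractor string."""
    with open(path, encoding="utf-8") as f:
        selectors = json.load(f)
    return selectors, json.dumps(selectors)


def missing_fields(selectors, response_data):
    """List selector names whose value is absent or empty in the extracted data."""
    return sorted(name for name in selectors if not response_data.get(name))
```

After each request, something like `missing_fields(selectors, property_data)` can be logged. A field that starts appearing in that list across many properties is a good candidate for a selector review.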

### CAPTCHA and anti-bot challenges

**Solution 1**: Ensure you're using anti-bot bypass parameters like `js_render` and `premium_proxy`.

**Solution 2**: If you continue encountering CAPTCHAs, integrate a CAPTCHA-solving service like 2Captcha through our solver integration options. Check our 2Captcha integration guide for implementation details.

**Solution 3**: Use fallback mechanisms so that one blocked page does not stop the whole run, for example by logging the failed URL and retrying it later.

### Rate limiting and geo-blocking

**Solution 1**: Use the `premium_proxy` parameter to automatically rotate IP addresses.

**Solution 2**: Implement request retry mechanisms with exponential backoff delays between failed requests.

**Solution 3**: Add delays between requests to avoid overwhelming the target website.
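
The retry and delay suggestions above can be combined in a small wrapper. This is a generic sketch, not a ZenRows feature: the function name, attempt count, and delay values are assumptions, and it retries on any exception for simplicity.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% random jitter
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the `scraper` function from this tutorial, a call might look like `retry_with_backoff(lambda: scraper(property_url, css_extractor=property_css_extractor))`. Adding a short `time.sleep` between properties in the main loop covers the fixed delay between requests.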
