Replace your current web scraping solution with ZenRows to improve reliability, bypass anti-bot protection, and reduce maintenance overhead. This guide shows you how to integrate ZenRows into existing price monitoring systems.

Who is this for?

This guide is for developers who already have a price monitoring system and want to integrate ZenRows to improve scraping reliability and reduce blocking issues.

What you’ll learn

  • Replace existing HTTP clients with ZenRows requests
  • Migrate from HTML parsing to CSS extraction
  • Optimize performance with concurrency controls
  • Monitor and control scraping costs
  • Scale across multiple regions

Prerequisites

  • An existing price monitoring system
  • A ZenRows API key (sign up here)
  • Basic understanding of web scraping concepts

Integration Approaches

Choose the integration approach that best fits your current system:

Approach 1: Minimal Integration (HTTP Client Replacement)

Replace your current HTTP client with ZenRows while keeping existing parsing logic. Before (typical implementation):
Python
import requests
from bs4 import BeautifulSoup

def get_page_content(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    response = requests.get(url, headers=headers)
    
    if response.status_code != 200:
        raise Exception(f"Request failed: {response.status_code}")
    
    return response.content

def scrape_price_data(url):
    html_content = get_page_content(url)
    soup = BeautifulSoup(html_content, "html.parser")
    
    # Extract data using BeautifulSoup
    name_elem = soup.select_one("#productTitle")
    price_elem = soup.select_one("span.aok-offscreen")
    
    return {
        "name": name_elem.get_text(strip=True) if name_elem else None,
        "price": price_elem.get_text(strip=True) if price_elem else None,
    }
After (ZenRows integration):
Python
import requests
from bs4 import BeautifulSoup

def get_page_content(url):
    params = {
        "url": url,
        "apikey": "YOUR_ZENROWS_API_KEY",
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "wait": 2000,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    
    if response.status_code != 200:
        raise Exception(f"Request failed: {response.status_code}")
    
    return response.text

# Keep existing parsing logic unchanged
def scrape_price_data(url):
    html_content = get_page_content(url)
    soup = BeautifulSoup(html_content, "html.parser")
    
    # Same parsing logic as before
    name_elem = soup.select_one("#productTitle")
    price_elem = soup.select_one("span.aok-offscreen")
    
    return {
        "name": name_elem.get_text(strip=True) if name_elem else None,
        "price": price_elem.get_text(strip=True) if price_elem else None,
    }
Benefits:
  • Minimal code changes required
  • Immediate anti-bot protection
  • Keep existing data processing logic
  • Easy rollback if needed (see the feature-flag sketch below)
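Because the parsing layer is untouched, rollback can be as simple as a feature flag around the fetch function. A minimal sketch, assuming a hypothetical USE_ZENROWS environment variable and the API key stored in ZENROWS_API_KEY:
Python
import os
import requests

def get_page_content(url):
    # Hypothetical feature flag: set USE_ZENROWS=false to fall back to the
    # original direct request while keeping the same function signature.
    if os.environ.get("USE_ZENROWS", "true").lower() == "true":
        params = {
            "url": url,
            "apikey": os.environ["ZENROWS_API_KEY"],
            "js_render": "true",
            "premium_proxy": "true",
        }
        response = requests.get("https://api.zenrows.com/v1/", params=params)
    else:
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
        response = requests.get(url, headers=headers)

    if response.status_code != 200:
        raise Exception(f"Request failed: {response.status_code}")

    return response.text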

Approach 2: Full Integration (CSS Extraction)

Replace both HTTP client and HTML parsing with ZenRows CSS extraction for cleaner, more maintainable code. Before:
Python
import requests
from bs4 import BeautifulSoup

def scrape_price_data(url):
    # HTTP request + HTML parsing
    headers = {"User-Agent": "Mozilla/5.0..."}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    
    # Manual element extraction
    name_elem = soup.select_one("#productTitle")
    price_elem = soup.select_one("span.aok-offscreen")
    rating_elem = soup.select_one("span.a-size-base.a-color-base")
    
    return {
        "name": name_elem.get_text(strip=True) if name_elem else None,
        "price": price_elem.get_text(strip=True) if price_elem else None,
        "rating": rating_elem.get_text(strip=True) if rating_elem else None,
    }
After:
Python
import requests
import json

def scrape_price_data(url):
    # Define CSS selectors
    css_extractor = json.dumps({
        "name": "#productTitle",
        "price": "span.aok-offscreen",
        "rating": "span.a-size-base.a-color-base",
    })
    
    # Single ZenRows request with extraction
    params = {
        "url": url,
        "apikey": "YOUR_ZENROWS_API_KEY",
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": "us",
        "css_extractor": css_extractor,
    }
    
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    
    if response.status_code != 200:
        raise Exception(f"Request failed: {response.status_code}")
    
    return response.json()
Benefits:
  • Eliminates HTML parsing dependencies
  • Cleaner, more maintainable code
  • Built-in data extraction
  • Reduced code complexity
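With the extractor in place, no parsing code remains on your side. A quick usage sketch of the function above (the product URL is a placeholder): the response keys mirror the fields defined in css_extractor, and a selector that matches nothing may be absent, so guard lookups with .get().
Python
# Hypothetical product URL for illustration
data = scrape_price_data("https://www.example.com/product/123")

# Keys mirror the css_extractor fields; use .get() in case a
# selector matched nothing on this page.
print(data.get("name"), data.get("price"), data.get("rating"))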

Step-by-Step Integration

Step 1: Assess Your Current Implementation

Before integrating ZenRows, analyze your current scraping setup to choose the best integration approach. Identify your current components:
  • HTTP client (requests, urllib, etc.)
  • HTML parser (BeautifulSoup, lxml, etc.)
  • Data extraction logic
  • Error handling mechanisms
  • Proxy management (if any)
Common patterns to look for:
Python
# Pattern 1: Simple requests + BeautifulSoup
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

# Pattern 2: Session-based scraping
session = requests.Session()
session.headers.update(headers)
response = session.get(url)

# Pattern 3: Selenium-based scraping
driver = webdriver.Chrome()
driver.get(url)
element = driver.find_element(By.CSS_SELECTOR, selector)

# Pattern 4: Custom proxy rotation
proxies = {"http": proxy_url, "https": proxy_url}
response = requests.get(url, proxies=proxies)
Integration complexity assessment:
  • Low complexity: Simple requests + BeautifulSoup → Use Approach 1
  • Medium complexity: Custom headers/sessions → Use Approach 1 or 2
  • High complexity: Selenium/complex proxy logic → Use Approach 2

Step 2: Replace HTTP Client

Start with the minimal integration approach by replacing your HTTP client with ZenRows. For requests-based systems:
Python
# Original function
def fetch_page(url, headers=None, proxies=None):
    response = requests.get(url, headers=headers, proxies=proxies)
    return response.text

# ZenRows replacement
def fetch_page(url, apikey, country="us"):
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "proxy_country": country,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.text
For Puppeteer/Playwright-based systems:
JavaScript
// Original Puppeteer approach
const puppeteer = require('puppeteer');

async function scrapeWithPuppeteer(url) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    
    const html = await page.content();
    await browser.close();
    return html;
}

// ZenRows Scraping Browser replacement
const puppeteer = require('puppeteer-core');

async function scrapeWithZenRows(url) {
    const browser = await puppeteer.connect({
        browserWSEndpoint: 'wss://browser.zenrows.com?apikey=YOUR_ZENROWS_API_KEY'
    });
    
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    
    const html = await page.content();
    await browser.close();
    return html;
}
Python
# Original Playwright approach
from playwright.sync_api import sync_playwright

def scrape_with_playwright(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html

# ZenRows Scraping Browser replacement
from playwright.sync_api import sync_playwright

def scrape_with_zenrows_browser(url):
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(
            "wss://browser.zenrows.com?apikey=YOUR_ZENROWS_API_KEY"
        )
        page = browser.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html

Step 3: Migrate to CSS Extraction (Optional)

For cleaner code and better maintainability, replace HTML parsing with ZenRows CSS extraction. Identify your current selectors:
Python
# Current BeautifulSoup selectors
soup = BeautifulSoup(html, "html.parser")
name = soup.select_one("#productTitle").get_text(strip=True)
price = soup.select_one("span.aok-offscreen").get_text(strip=True)
rating = soup.select_one("span.a-size-base").get_text(strip=True)
reviews = soup.select_one("#acrCustomerReviewText").get_text(strip=True)
Convert to CSS extractor:
Python
import json

# Define CSS extractor
css_extractor = json.dumps({
    "name": "#productTitle",
    "price": "span.aok-offscreen", 
    "rating": "span.a-size-base",
    "reviews": "#acrCustomerReviewText",
})

def scrape_with_css_extractor(url, apikey):
    params = {
        "url": url,
        "apikey": apikey,
        "js_render": "true",
        "premium_proxy": "true",
        "css_extractor": css_extractor,
    }
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    return response.json()  # Returns structured data directly
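Extracted prices come back as display strings (for example "$1,299.99"). For a price monitoring system you'll usually want numbers; a small, hypothetical parse_price helper for normalizing them before storage, with a pattern you should adjust for the currencies and locales you monitor:
Python
import re
from decimal import Decimal

def parse_price(raw):
    """Convert a display price like '$1,299.99' to a Decimal, or None."""
    if not raw:
        return None
    # Keep the first number, dropping currency symbols; commas are
    # treated as thousands separators and removed.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if not match:
        return None
    return Decimal(match.group(0).replace(",", ""))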

Migration Checklist

Pre-Migration
  • Document current scraping logic and selectors
  • Identify proxy and header requirements
  • Test ZenRows with sample requests (see the smoke-test sketch after this checklist)
  • Plan rollback strategy
During Migration
  • Replace HTTP client with ZenRows API calls
  • Update error handling for ZenRows responses
  • Test with production URLs
  • Monitor request costs and concurrency
Post-Migration
  • Remove old proxy management code
  • Clean up unused HTML parsing dependencies
  • Update monitoring and alerting
  • Document new ZenRows integration
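The "Test ZenRows with sample requests" item can be a one-off script. A minimal smoke test, assuming your key is stored in a ZENROWS_API_KEY environment variable:
Python
import os
import requests

params = {
    "url": "https://httpbin.io/html",  # any simple public page works as a probe
    "apikey": os.environ["ZENROWS_API_KEY"],
}
response = requests.get("https://api.zenrows.com/v1/", params=params)

print("Status:", response.status_code)
# Cost and concurrency are reported in the response headers
print("Cost:", response.headers.get("X-Request-Cost"))
print("Concurrency remaining:", response.headers.get("Concurrency-Remaining"))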

Best Practices

Cost Management
  • Set daily cost limits to prevent unexpected charges
  • Monitor request costs with the X-Request-Cost response header (see the budget sketch after this list)
  • Use the Concurrency-Limit and Concurrency-Remaining response headers to optimize throughput and avoid 429 concurrency errors
  • Cache results when appropriate to reduce requests
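One way to enforce a daily limit is to accumulate the cost reported on each response. A sketch with a hypothetical DAILY_BUDGET constant; the X-Request-Cost header comes from the API:
Python
import requests

DAILY_BUDGET = 50.0  # hypothetical daily limit; set to match your plan
spent_today = 0.0    # reset this counter once per day in your scheduler

def fetch_with_budget(url, apikey):
    global spent_today
    if spent_today >= DAILY_BUDGET:
        raise RuntimeError("Daily ZenRows budget exhausted")
    params = {"url": url, "apikey": apikey, "js_render": "true"}
    response = requests.get("https://api.zenrows.com/v1/", params=params)
    # Accumulate the per-request cost reported by the API
    spent_today += float(response.headers.get("X-Request-Cost", 0))
    return response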
Reliability
  • Implement retry logic with exponential backoff (see the sketch after this list)
  • Monitor selector stability and update as needed
  • Use fallback selectors for critical data points
  • Log all requests, responses, and errors for debugging and monitoring
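A minimal retry sketch with exponential backoff; tune the retried status codes and delays to your own error profile:
Python
import time
import requests

def fetch_with_retries(url, apikey, max_retries=3):
    params = {"url": url, "apikey": apikey, "js_render": "true"}
    for attempt in range(max_retries + 1):
        response = requests.get("https://api.zenrows.com/v1/", params=params)
        if response.status_code == 200:
            return response.text
        # Back off exponentially (1s, 2s, 4s, ...) on transient failures
        if response.status_code in (429, 500, 502, 503) and attempt < max_retries:
            time.sleep(2 ** attempt)
            continue
        response.raise_for_status()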
Performance
  • Use appropriate concurrency based on your plan limits
  • Batch requests when monitoring multiple products (see the sketch after this list)
  • Leverage geographic targeting for region-specific data
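A sketch of batching a product list under a fixed concurrency cap, assuming a hypothetical MAX_CONCURRENCY matched to your plan:
Python
from concurrent.futures import ThreadPoolExecutor
import requests

MAX_CONCURRENCY = 5  # hypothetical; match the Concurrency-Limit header for your plan

def fetch(url, apikey):
    params = {"url": url, "apikey": apikey, "js_render": "true"}
    return requests.get("https://api.zenrows.com/v1/", params=params).text

def scrape_batch(urls, apikey):
    # Cap in-flight requests so the pool never exceeds the plan limit
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        return list(pool.map(lambda u: fetch(u, apikey), urls))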

Troubleshooting

Selector Compatibility
  • Test selectors in ZenRows Request Builder before migration
  • Some selectors may behave differently with JavaScript rendering
  • Use more specific selectors if extraction returns unexpected data
Cost Optimization
  • Disable js_render for static content to reduce costs
  • Use wait_for instead of wait when possible
  • Monitor Concurrency-Limit and Concurrency-Remaining response headers to maximize throughput
Error Handling
  • ZenRows returns distinct error codes for different failure modes; see the API Error Codes documentation for details
  • Implement specific handling for rate limits (429) and quota exceeded (402), as in the sketch below
  • Log ZenRows-specific headers for debugging
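A small sketch of the 429/402 handling and header logging mentioned above; the check_zenrows_response name is hypothetical:
Python
import requests

def check_zenrows_response(response):
    # Log ZenRows-specific headers for debugging
    for header in ("X-Request-Cost", "Concurrency-Remaining"):
        print(header, "=", response.headers.get(header))
    if response.status_code == 429:
        # Rate/concurrency limit hit: back off and retry later
        raise RuntimeError("Rate limited; retry with backoff")
    if response.status_code == 402:
        # Quota exceeded: retrying will not help, alert instead
        raise RuntimeError("Plan quota exceeded; check usage and limits")
    response.raise_for_status()
    return response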