Discover common automation patterns and real-world scenarios when using ZenRows’ Scraping Browser with Puppeteer and Playwright. These practical examples demonstrate how to leverage browser automation for various data extraction and interaction tasks.
The Scraping Browser excels at handling complex scenarios that traditional HTTP-based scraping cannot address. From capturing visual content to executing custom JavaScript, these use cases showcase the full potential of browser-based automation for your scraping projects.
Websites often change their structure or update CSS class names and HTML tags. This means the selectors you use for scraping (like .product, .products, or specific element tags) might stop working if the site layout changes. To keep your scraper reliable, regularly check and update your selectors as needed.
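One defensive approach is to try several candidate selectors and log which one matched. A minimal sketch, assuming a `page` object like the ones in the examples below; the candidate list is purely illustrative:

```javascript
// Hypothetical fallback: try candidate selectors until one matches
const candidates = ['.product', '.products .item', 'li.product-card'];
let products = [];
for (const selector of candidates) {
  products = await page.$$(selector);
  if (products.length > 0) {
    console.log(`Matched ${products.length} elements with "${selector}"`);
    break;
  }
}
```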
Extract complete page content and metadata by navigating to target websites. This fundamental pattern forms the foundation for most scraping workflows and demonstrates how to retrieve both visible content and underlying HTML structure.
```javascript
const puppeteer = require('puppeteer-core');

const connectionURL = 'wss://browser.zenrows.com?apikey=YOUR_ZENROWS_API_KEY';

const scraper = async () => {
  const browser = await puppeteer.connect({ browserWSEndpoint: connectionURL });
  const page = await browser.newPage();

  try {
    console.log('Navigating to target page...');
    await page.goto('https://www.scrapingcourse.com/ecommerce/', {
      waitUntil: 'domcontentloaded'
    });

    // Extract page metadata
    const title = await page.title();
    console.log('Page title:', title);

    // Get complete HTML content
    console.log('Extracting page content...');
    const html = await page.content();

    // Extract specific elements
    const productCount = await page.$$eval('.product', products => products.length);
    console.log(`Found ${productCount} products on the page`);

    // Extract text content from specific elements
    const headings = await page.$$eval('h1, h2, h3', elements =>
      elements.map(el => el.textContent.trim())
    );
    console.log('Page headings:', headings);

    return { title, productCount, headings, htmlLength: html.length };
  } finally {
    await browser.close();
  }
};

scraper().then(result => console.log('Extraction complete:', result));
```
Capture visual representations of web pages for monitoring, documentation, or visual verification purposes. Screenshots prove invaluable for debugging scraping workflows and creating visual records of dynamic content.
Full page capture: Include content below the fold with fullPage: true
Element-specific screenshots: Target individual components or sections
Custom clipping: Focus on specific page areas using coordinate-based clipping
Format options: PNG (lossless) or JPEG (with quality control from 0-100)
Default viewport: Screenshots use the standard 1920x1080 viewport size
Screenshots are captured from the cloud browser and automatically transferred to your local environment. Large full-page screenshots may take additional time to process and download.
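These options map directly onto Puppeteer's `page.screenshot` API. A minimal sketch, assuming the `page` object from the connection example above (selectors, file names, and clip coordinates are illustrative):

```javascript
// Full-page PNG, including content below the fold
await page.screenshot({ path: 'full-page.png', fullPage: true });

// Element-specific screenshot of a single component
const header = await page.$('header');
if (header) {
  await header.screenshot({ path: 'header.png' });
}

// Clipped JPEG with quality control, focused on a coordinate-based region
await page.screenshot({
  path: 'hero.jpg',
  type: 'jpeg',
  quality: 80,
  clip: { x: 0, y: 0, width: 1920, height: 600 },
});
```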
Execute custom JavaScript within the browser context to manipulate pages, extract computed values, or perform complex data transformations. This powerful capability enables sophisticated automation scenarios beyond standard element selection.
Data extraction and processing: Transform raw data within the browser context
Page statistics collection: Gather comprehensive page metrics and analytics
Dynamic content interaction: Trigger JavaScript events and handle dynamic updates
Custom styling injection: Modify page appearance for testing or visual enhancement
Scroll automation: Navigate through infinite scroll or lazy-loaded content
Complex calculations: Perform mathematical operations on extracted data
Custom JavaScript execution runs within the browser’s security context, providing access to all DOM APIs and browser features available to the target website.
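As a short illustration, page statistics collection and scroll automation can both be expressed with Puppeteer's `page.evaluate`; the selectors, step size, and delay below are illustrative:

```javascript
// Gather basic page statistics inside the browser context
const stats = await page.evaluate(() => ({
  links: document.querySelectorAll('a').length,
  images: document.querySelectorAll('img').length,
  scrollHeight: document.body.scrollHeight,
}));
console.log('Page statistics:', stats);

// Scroll down in steps to trigger lazy-loaded or infinite-scroll content
await page.evaluate(async () => {
  for (let y = 0; y < document.body.scrollHeight; y += 500) {
    window.scrollTo(0, y);
    await new Promise(resolve => setTimeout(resolve, 250));
  }
});
```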
Generate PDF documents from web pages for archival, reporting, or documentation purposes. This capability proves valuable for creating snapshots of dynamic content or generating reports from scraped data.
Multiple format support: Generate A4, Letter, Legal, and custom page sizes
Custom headers and footers: Add branding, page numbers, and metadata
Background preservation: Include CSS backgrounds and styling in PDFs
Margin control: Configure precise spacing and layout
Orientation options: Create portrait or landscape documents
Scale adjustment: Optimize content size for better readability
PDF generation works seamlessly with the cloud browser, automatically transferring generated files to your local environment while maintaining high quality and formatting.
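A minimal sketch combining several of these options with Puppeteer's `page.pdf` (the file name, margins, and header/footer templates are illustrative):

```javascript
await page.pdf({
  path: 'report.pdf',
  format: 'A4',                  // also accepts Letter, Legal, and custom sizes
  landscape: false,              // portrait orientation
  printBackground: true,         // preserve CSS backgrounds and styling
  scale: 0.9,                    // shrink content slightly for readability
  margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
  displayHeaderFooter: true,
  headerTemplate: '<span style="font-size:10px;">Scraping report</span>',
  footerTemplate:
    '<span style="font-size:10px;"><span class="pageNumber"></span> / <span class="totalPages"></span></span>',
});
```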
ZenRows’ Scraping Browser transforms complex web automation challenges into straightforward solutions. These practical use cases demonstrate the platform’s versatility in handling everything from basic content extraction to sophisticated browser automation workflows.
Start with the basic navigation and content extraction patterns to establish your foundation, then progressively incorporate advanced features like form interactions and network monitoring as your requirements evolve. The modular nature of these examples allows you to combine techniques for sophisticated automation workflows.
Consider implementing error handling and retry logic around these patterns for production deployments. The Scraping Browser’s consistent cloud environment reduces many common failure points, but robust error handling ensures reliable operation at scale.
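One possible pattern is a small retry wrapper around a whole scraping task; the `withRetries` helper below is a hypothetical sketch, not part of ZenRows or Puppeteer:

```javascript
// Hypothetical helper: retry an async scraping task with a short delay between attempts
async function withRetries(task, attempts = 3, delayMs = 2000) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      console.warn(`Attempt ${attempt} failed: ${error.message}`);
      if (attempt === attempts) throw error;
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap the scraper function from the first example
withRetries(() => scraper()).then(result => console.log('Done:', result));
```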
Can I combine multiple use cases in a single scraping session?
Absolutely! These use cases are designed to work together. For example, you can navigate to a page, take screenshots, extract data, and generate PDFs all within the same browser session. This approach is more efficient and maintains session state across operations.
```javascript
// Example combining multiple use cases
await page.goto('https://example.com');
await page.screenshot({ path: 'before.png' });
await page.fill('input[name="search"]', 'query');
await page.click('button[type="submit"]');
await page.screenshot({ path: 'after.png' });
const data = await page.$$eval('.result', elements =>
  elements.map(el => el.textContent.trim()) // extract data
);
await page.pdf({ path: 'results.pdf' });
```
How do I handle dynamic content that loads after page navigation?
Use explicit waiting mechanisms to ensure content is fully loaded before interaction:
```javascript
// Wait for specific elements
await page.waitForSelector('.dynamic-content');

// Wait for the network to be idle
await page.goto(url, { waitUntil: 'networkidle0' });

// Wait for custom conditions
await page.waitForFunction(() => document.querySelectorAll('.product').length > 0);
```
The Scraping Browser handles JavaScript rendering automatically, making these waiting strategies highly effective.
How do I optimize performance when scraping large amounts of data?
Several strategies can significantly improve performance:
Block unnecessary resources: Use request interception to block images, fonts, and other non-essential content
Reuse browser instances: Keep browsers open for multiple operations instead of creating new connections
Implement concurrent processing: Use multiple browser instances for parallel scraping
Optimize waiting strategies: Use specific selectors instead of generic timeouts
The network monitoring examples demonstrate resource blocking techniques that can improve scraping speed.
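For reference, a minimal Puppeteer request-interception sketch that blocks non-essential resource types (the blocked list is just an example):

```javascript
// Block images, fonts, and stylesheets before navigating
await page.setRequestInterception(true);
page.on('request', request => {
  const blocked = ['image', 'font', 'stylesheet'];
  if (blocked.includes(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
});
```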
What's the difference between using the WebSocket URL directly versus the SDK?
Both approaches provide identical functionality, but the SDK offers several advantages:
Simplified configuration: No need to manually construct WebSocket URLs
Better error handling: Built-in error messages and debugging information
Future compatibility: Automatic updates to connection protocols
Additional utilities: Helper methods for common tasks
For production applications, the SDK is recommended for better maintainability and error handling, while direct WebSocket connections work well for simple scripts and testing.
How do I troubleshoot issues when these examples don't work as expected?
Follow this systematic troubleshooting approach:
Verify API key: Ensure your ZenRows API key is correct and active
Check element selectors: Use browser developer tools to verify CSS selectors
Add debugging output: Include console.log statements to track execution flow
Implement error handling: Wrap operations in try-catch blocks
Test with simpler examples: Start with basic navigation before adding complexity
The network monitoring examples are particularly valuable for debugging, as they reveal exactly what requests are being made and their responses.
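Steps 3 and 4 can be combined into a simple debugging pattern (the URL and selector are placeholders):

```javascript
try {
  console.log('Navigating...');
  await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });

  console.log('Waiting for results...');
  await page.waitForSelector('.result', { timeout: 15000 });
} catch (error) {
  // Log the failure and capture a screenshot to inspect what the page looked like
  console.error('Step failed:', error.message);
  await page.screenshot({ path: 'debug-failure.png', fullPage: true });
}
```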