This guide walks you through advanced CSS selector strategies to help you extract structured data from a wide variety of web layouts using the ZenRows API.

Basic API Call Structure

All examples use the following basic API call pattern. The selector snippets in later sections go inside the css_extractor object:

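const axios = require('axios');

async function main() {
  const response = await axios.get('https://api.zenrows.com/v1/', {
    params: {
      apikey: 'YOUR_ZENROWS_API_KEY',
      url: 'https://example.com',
      js_render: true, // Optional, needed for JavaScript-rendered content
      css_extractor: JSON.stringify({
        // Your selectors here
      })
    }
  });

  // The extracted data comes back as JSON, keyed by the names you chose
  console.log(response.data);
}

// CommonJS modules don't support top-level await, so run inside an async function
main().catch(console.error);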
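Axios serializes params into the request's query string, so the css_extractor object travels as URL-encoded JSON. The sketch below builds the same URL by hand with Node's built-in URLSearchParams, which can help when debugging or when using an HTTP client other than axios. buildZenRowsUrl is a hypothetical helper, not part of any ZenRows SDK, and no request is sent.

```javascript
// Hypothetical helper: builds a ZenRows API URL with css_extractor encoded as JSON.
// It only constructs the URL; it does not perform the HTTP request.
function buildZenRowsUrl(apiKey, targetUrl, selectors, options = {}) {
  const params = new URLSearchParams({
    apikey: apiKey,
    url: targetUrl,
    ...options, // e.g. { js_render: 'true' }
    css_extractor: JSON.stringify(selectors),
  });
  return `https://api.zenrows.com/v1/?${params.toString()}`;
}

const requestUrl = buildZenRowsUrl(
  'YOUR_ZENROWS_API_KEY',
  'https://example.com',
  { headings: 'h1', linkUrls: 'a @href' },
  { js_render: 'true' }
);

console.log(requestUrl);
```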

Essential CSS Selector Techniques

Basic Selectors

Target elements by their tag names, classes, or IDs:

css_extractor: JSON.stringify({
  headings: "h1", // Select all <h1> elements
  products: ".product", // Select elements with the 'product' class
  mainContent: "#main-content", // Select the element with the 'main-content' ID
  productTitles: ".product h2.title", // Select <h2> elements with the 'title' class inside elements with the 'product' class
  topLevelNav: "nav > a" // Select direct child <a> elements of <nav>
})
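Depending on how many elements a selector matches, a key in the JSON response may come back as a single string or as an array of strings (an assumption worth confirming against your own responses). A small normalizer lets downstream code always treat each key as a list. toArray and normalizeResult are illustrative names, and the input below is sample data.

```javascript
// Wrap single values in an array so every key can be treated as a list.
// Assumes each value is a string, an array of strings, or missing.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Apply toArray to every key of a css_extractor result object.
function normalizeResult(result) {
  return Object.fromEntries(
    Object.entries(result).map(([key, value]) => [key, toArray(value)])
  );
}

// Sample input shaped like a css_extractor response (illustrative data).
const normalized = normalizeResult({
  headings: 'Welcome',
  products: ['Product A', 'Product B'],
});
console.log(normalized);
```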

Attribute Selectors

Extract content based on HTML attributes:

css_extractor: JSON.stringify({
  imageUrls: "img @src", // Extract 'src' attribute from <img> tags
  linkUrls: "a @href", // Extract 'href' attribute from <a> tags
  premiumItems: "[data-premium='true']", // Select elements with a specific attribute value
  externalLinks: "[href^='https://'] @href", // Extract 'href' starting with 'https://'
  pdfDownloads: "[href$='.pdf'] @href" // Extract 'href' ending with '.pdf'
})
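Attribute values such as href and src are typically returned as they appear in the HTML, so many of them are relative paths. Node's built-in URL constructor can resolve them against the page they came from. resolveUrls is a hypothetical helper; note that new URL() throws on values it cannot parse, so you may want to wrap it in try/catch for messy pages.

```javascript
// Resolve possibly-relative URLs (e.g. from "a @href" or "img @src")
// against the URL of the page they were extracted from.
function resolveUrls(values, pageUrl) {
  const list = Array.isArray(values) ? values : [values];
  return list
    .filter((value) => typeof value === 'string' && value.trim() !== '')
    .map((value) => new URL(value.trim(), pageUrl).href);
}

const resolved = resolveUrls(
  ['/docs/guide.pdf', 'images/logo.png', 'https://cdn.example.net/a.jpg'],
  'https://example.com/products/index.html'
);
console.log(resolved);
```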

Positional Selectors

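Target elements based on their position among their siblings. Positional pseudo-classes such as :first-child and :nth-child() count every sibling element, not only those with the class, so .product:first-child matches nothing if the container's first child is, say, a heading: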

css_extractor: JSON.stringify({
  firstProduct: ".product:first-child", // A 'product' element that is the first child of its parent
  lastProduct: ".product:last-child", // A 'product' element that is the last child of its parent

  thirdProduct: ".product:nth-child(3)", // A 'product' element that is the third child of its parent

  evenProducts: ".product:nth-child(even)", // 'product' elements at even positions among their siblings
  oddProducts: ".product:nth-child(odd)", // 'product' elements at odd positions among their siblings

  firstHeading: "h2:nth-of-type(1)", // Each <h2> that is the first <h2> among its siblings

  tableHeaders: "table th", // Selects all <th> elements inside any <table>
  secondColumnCells: "tr td:nth-child(2)" // Selects the <td> that is the second cell in each table row
})
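Per the CSS Selectors specification, :nth-child(an+b) matches an element whose 1-based position among all of its sibling elements equals a*n + b for some integer n ≥ 0; even is shorthand for 2n and odd for 2n+1. The sketch below evaluates that rule for a given position. It is an illustration of the formula, not a selector engine, and matchesNthChild is a hypothetical name.

```javascript
// Returns true if a 1-based sibling position matches :nth-child(an+b),
// i.e. position === a*n + b for some integer n >= 0.
function matchesNthChild(position, a, b) {
  if (a === 0) return position === b;
  const n = (position - b) / a;
  return Number.isInteger(n) && n >= 0;
}

const positions = [1, 2, 3, 4, 5, 6];
// :nth-child(even) is 2n+0, :nth-child(odd) is 2n+1, :nth-child(-n+3) is the first three.
const even = positions.filter((p) => matchesNthChild(p, 2, 0));
const odd = positions.filter((p) => matchesNthChild(p, 2, 1));
const firstThree = positions.filter((p) => matchesNthChild(p, -1, 3));
console.log({ even, odd, firstThree });
```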

Combining Multiple Selectors

Combine selectors to match several element types at once or to target elements by their relationship to other elements:

css_extractor: JSON.stringify({
  headingsAndLinks: "h1, h2, a", // Multiple selectors with commas
  cardTitles: ".card .title", // Descendant combinator (space)
  directListItems: "ul > li", // Child combinator (>)
  labelValues: "label + input @value", // Adjacent sibling combinator (+)
  relatedItems: ".main-item ~ .related-item" // General sibling combinator (~)
})

Practical Extraction Scenarios

E-commerce Product Extraction

Extract product details from a listing page:

css_extractor: JSON.stringify({
  productNames: ".product-item .product-title", // Selects the product title within each product item
  productPrices: ".product-item .price", // Selects the price element within each product item
  productRatings: ".product-item .rating @data-score", // Extracts the 'data-score' attribute from the rating element
  productImages: ".product-item img.product-image @src", // Extracts the 'src' attribute from the product image
  productUrls: ".product-item a.product-link @href", // Extracts the 'href' attribute from the product link
  productAvailability: ".product-item .availability-badge", // Selects the availability badge within each product item
  productDiscounts: ".product-item .discount-tag" // Selects the discount tag element within each product item
})
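Each key returns its own list, so a common next step is to zip the lists into one object per product. This only lines up when every product card contains every field: an optional element such as .discount-tag can shift the alignment, so optional fields are left out of this example. zipRecords, parsePrice, and the sample values are illustrative, and parsePrice assumes a dot as the decimal separator.

```javascript
// Combine parallel lists (one per key) into an array of records.
// Assumes index i in every list refers to the same product card.
function zipRecords(columns) {
  const keys = Object.keys(columns);
  const length = Math.max(0, ...keys.map((key) => columns[key].length));
  const records = [];
  for (let i = 0; i < length; i++) {
    const record = {};
    for (const key of keys) record[key] = columns[key][i] ?? null;
    records.push(record);
  }
  return records;
}

// Convert a price string such as "$1,299.99" to a number (dot decimal separator assumed).
function parsePrice(text) {
  const number = parseFloat(String(text).replace(/[^0-9.]/g, ''));
  return Number.isNaN(number) ? null : number;
}

const products = zipRecords({
  name: ['Laptop', 'Mouse'],
  price: ['$1,299.99', '$24.50'],
}).map((p) => ({ ...p, price: parsePrice(p.price) }));
console.log(products);
```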

Product Specification Tables

Extract structured data from specification tables:

css_extractor: JSON.stringify({
  specLabels: ".specs-table tr td:first-child", // Spec labels (first cell in each row)
  specValues: ".specs-table tr td:last-child", // Spec values (last cell in each row)
  processor: ".tech-specs .processor", // Processor details
  memory: ".tech-specs .memory", // Memory details
  storage: ".tech-specs .storage", // Storage details
  graphics: ".tech-specs .graphics" // Graphics details
})
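Because the label and value selectors read the first and last cell of the same rows, their lists can be paired into a lookup object. This sketch assumes both lists are arrays of equal order and that every row has both cells; specsToObject and the sample values are hypothetical.

```javascript
// Pair row labels with row values, e.g. ["Processor:", "Memory"] + ["8-core CPU", "16 GB"].
function specsToObject(labels, values) {
  const specs = {};
  labels.forEach((label, i) => {
    const key = label.trim().replace(/:$/, ''); // drop a trailing colon such as "Processor:"
    if (key && values[i] !== undefined) specs[key] = values[i].trim();
  });
  return specs;
}

const specs = specsToObject(
  ['Processor:', 'Memory', 'Storage'],
  ['8-core CPU', ' 16 GB ', '512 GB SSD']
);
console.log(specs);
```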

Real Estate Listings

Extract property information from listing cards:

css_extractor: JSON.stringify({
  propertyAddresses: ".property-listing .address", // Property addresses
  propertyPrices: ".property-listing .price", // Prices
  propertyBedrooms: ".property-listing .bedrooms", // Number of bedrooms
  propertyBathrooms: ".property-listing .bathrooms", // Number of bathrooms
  propertyArea: ".property-listing .square-footage", // Square footage
  propertyTypes: ".property-listing .property-type", // Property type
  propertyAgents: ".property-listing .agent-name", // Agent names
  propertyImages: ".property-listing .property-image @src" // Image URLs
})
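Listing values usually arrive as display strings such as "$450,000" or "1,800 sq ft". To compare properties, you can strip the formatting and derive a metric such as price per square foot. The number format (comma thousands separator, dot decimal) is an assumption about the target site, and toNumber and pricePerSqFt are illustrative helpers.

```javascript
// Keep only digits and the decimal point, then parse; null if nothing numeric remains.
function toNumber(text) {
  const number = parseFloat(String(text).replace(/[^0-9.]/g, ''));
  return Number.isNaN(number) ? null : number;
}

// Price per square foot rounded to two decimals; null if either value is missing or zero.
function pricePerSqFt(priceText, areaText) {
  const price = toNumber(priceText);
  const area = toNumber(areaText);
  if (price === null || !area) return null;
  return Math.round((price / area) * 100) / 100;
}

console.log(pricePerSqFt('$450,000', '1,800 sq ft')); // 250
```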

News Articles and Blog Posts

Extract content from articles:

css_extractor: JSON.stringify({
  articleTitle: "article h1", // Article title
  articleSubtitle: "article h2", // Article subtitle (matches every <h2>, so section headings may be included)
  articleDate: "article .publication-date", // Publication date
  articleAuthor: "article .author-name", // Author name
  articleContent: "article .content p", // Article content
  articleCategories: "article .category-tag", // Categories
  articleImages: "article .article-image @src", // Image URLs
  relatedArticles: ".related-articles .article-link @href" // Related article links
})
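The article content selector matches each paragraph separately, so the body usually needs to be reassembled. The sketch below drops empty paragraphs, joins the rest with blank lines, and computes a rough word count. The field names mirror the selectors above; assembleArticle is a hypothetical helper and the input is sample data.

```javascript
// Rebuild an article from css_extractor output where the body is split into <p> elements.
function assembleArticle(result) {
  const raw = Array.isArray(result.articleContent)
    ? result.articleContent
    : [result.articleContent ?? ''];
  const paragraphs = raw.map((p) => p.trim()).filter((p) => p.length > 0);
  const body = paragraphs.join('\n\n');
  const wordCount = body.split(/\s+/).filter(Boolean).length;
  return { title: result.articleTitle ?? null, body, wordCount };
}

const article = assembleArticle({
  articleTitle: 'Sample headline',
  articleContent: ['First paragraph here.', '  ', 'Second one.'],
});
console.log(article);
```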

Advanced Selection Techniques

Working with Pagination Elements

Identify and extract pagination information:

css_extractor: JSON.stringify({
  currentPage: ".pagination .current @data-page", // Current page number
  totalPages: ".pagination @data-total-pages", // Total number of pages
  nextPageUrl: ".pagination .next @href", // Next page URL
  prevPageUrl: ".pagination .prev @href", // Previous page URL
  pageNumbers: ".pagination .page-number", // All page numbers
  isLastPage: ".pagination .next @disabled" // 'disabled' attribute on the Next control; when present, this is usually the last page
})
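To collect every page, you can follow the next-page URL until it is missing, with a cap on the number of requests as a safeguard. The sketch keeps the HTTP call behind a fetchPage function you supply (for example, a wrapper around the axios call shown earlier that returns response.data). scrapeAllPages, fetchPageMock, and the mock data are hypothetical, and the code assumes nextPageUrl may be relative, so it resolves it against the current page.

```javascript
// Follow "next" links page by page.
// fetchPage(url) is assumed to return the parsed css_extractor result for that URL.
async function scrapeAllPages(startUrl, fetchPage, maxPages = 10) {
  const pages = [];
  const seen = new Set(); // guards against pagination loops
  let url = startUrl;
  while (url && pages.length < maxPages && !seen.has(url)) {
    seen.add(url);
    const data = await fetchPage(url);
    pages.push(data);
    const next = Array.isArray(data.nextPageUrl) ? data.nextPageUrl[0] : data.nextPageUrl;
    url = next ? new URL(next, url).href : null;
  }
  return pages;
}

// Mock fetchPage standing in for the ZenRows request (sample data only).
const mockSite = {
  'https://example.com/list?page=1': { items: ['a', 'b'], nextPageUrl: '/list?page=2' },
  'https://example.com/list?page=2': { items: ['c'] },
};
const fetchPageMock = async (url) => mockSite[url];

scrapeAllPages('https://example.com/list?page=1', fetchPageMock).then((pages) => {
  console.log(pages.flatMap((p) => p.items)); // [ 'a', 'b', 'c' ]
});
```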

Navigation Menus

Extract multi-level navigation structures:

css_extractor: JSON.stringify({
  // Main navigation
  mainNavLinks: ".main-nav > li > a @href",
  mainNavText: ".main-nav > li > a",

  // Second level categories
  subNavLinks: ".main-nav > li > .dropdown > a @href",
  subNavText: ".main-nav > li > .dropdown > a",

  // Third level
  deepNavLinks: ".main-nav > li > .dropdown > .sub-dropdown > a @href"
})

Social Media Content

Extract content from social media-style layouts:

css_extractor: JSON.stringify({
  postAuthors: ".post .author-name",
  postTimestamps: ".post .timestamp",
  postContent: ".post .content-text",
  postImages: ".post .post-image @src",
  postLikes: ".post .like-count",
  postComments: ".post .comment-count",
  postShares: ".post .share-count",

  // Comments
  commentAuthors: ".comments .comment .author",
  commentContent: ".comments .comment .text",
  commentTimestamps: ".comments .comment .time"
})
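Engagement counters are often abbreviated, for example "1.2K" or "3M". A small parser turns them into integers for sorting or aggregation. The supported suffixes (K, M, B) are an assumption about the target site's format, and parseCount is an illustrative helper that returns null for anything it doesn't recognize.

```javascript
// Convert abbreviated counts such as "1.2K", "3M" or "1,024" to integers.
function parseCount(text) {
  const cleaned = String(text).trim().replace(/,/g, '');
  const match = cleaned.match(/^(\d+(?:\.\d+)?)\s*([KMB])?$/i);
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  const factor = match[2] ? multipliers[match[2].toUpperCase()] : 1;
  return Math.round(parseFloat(match[1]) * factor);
}

console.log(['1.2K', '3M', '1,024', '87'].map(parseCount)); // [ 1200, 3000000, 1024, 87 ]
```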

Data Table Extraction

Extract structured data from tables:

css_extractor: JSON.stringify({
  // Table headers
  tableHeaders: "table thead th",

  // First column (often labels)
  rowLabels: "table tbody tr td:first-child",

  // Specific cells using nth-child
  secondColValues: "table tbody tr td:nth-child(2)",
  thirdColValues: "table tbody tr td:nth-child(3)",

  // Cells with a specific data attribute value
  highlightedCells: "table td[data-highlight='true']"
})
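Column-wise lists can be turned back into rows, for example to save the table as CSV. The sketch assumes each list is ordered by row and quotes every field, doubling embedded quotes as RFC 4180 describes, so commas and quotes inside cells survive. csvField and columnsToCsv are hypothetical helpers, and the sample cells are illustrative.

```javascript
// Quote a CSV field, doubling any embedded double quotes.
function csvField(value) {
  return `"${String(value ?? '').replace(/"/g, '""')}"`;
}

// headers: ["Name", "Price"], columns: [["A", "B"], ["1", "2"]] -> header line plus one line per row.
function columnsToCsv(headers, columns) {
  const rowCount = Math.max(0, ...columns.map((column) => column.length));
  const lines = [headers.map(csvField).join(',')];
  for (let i = 0; i < rowCount; i++) {
    lines.push(columns.map((column) => csvField(column[i])).join(','));
  }
  return lines.join('\n');
}

const csv = columnsToCsv(
  ['Model', 'Price', 'Notes'],
  [['Alpha', 'Beta'], ['$10', '$12'], ['Small, light', 'The "pro" model']]
);
console.log(csv);
```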

Special Extraction Scenarios

Form Fields

Extract form field values and attributes:

css_extractor: JSON.stringify({
  formLabels: "form label", // Form labels
  inputValues: "form input @value", // Input field values
  inputPlaceholders: "form input @placeholder", // Input placeholders
  selectedOptions: "form select option[selected]", // Options marked as selected in the HTML
  checkboxStatus: "form input[type='checkbox'] @checked", // 'checked' attribute, present only on checkboxes checked in the HTML
  radioStatus: "form input[type='radio'] @checked", // 'checked' attribute, present only on radio buttons checked in the HTML
  formActionUrl: "form @action", // Form action URL
  formMethod: "form @method" // Form method (GET/POST)
})

Metadata Extraction

Extract metadata from the HTML <head>:

css_extractor: JSON.stringify({
  pageTitle: "title", // Page title
  metaDescription: "meta[name='description'] @content", // Meta description
  canonicalUrl: "link[rel='canonical'] @href", // Canonical URL
  ogTitle: "meta[property='og:title'] @content", // Open Graph title
  ogImage: "meta[property='og:image'] @content", // Open Graph image
  ogDescription: "meta[property='og:description'] @content", // Open Graph description
  twitterCard: "meta[name='twitter:card'] @content" // Twitter card type
})
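Not every page defines every tag, so it helps to fall back from Open Graph values to the standard ones. The preference order below (og:title, then the <title> element; og:description, then the meta description) is one reasonable choice rather than a rule. pickFirst is an illustrative helper and the metadata object is sample data.

```javascript
// Return the first non-empty string among the candidates (arrays use their first item).
function pickFirst(...candidates) {
  for (const candidate of candidates) {
    const value = Array.isArray(candidate) ? candidate[0] : candidate;
    if (typeof value === 'string' && value.trim() !== '') return value.trim();
  }
  return null;
}

// Sample metadata result where og:title and og:image are missing.
const meta = { pageTitle: 'Example Store | Home', metaDescription: 'Shop example items.', ogDescription: '' };
const preview = {
  title: pickFirst(meta.ogTitle, meta.pageTitle),
  description: pickFirst(meta.ogDescription, meta.metaDescription),
  image: pickFirst(meta.ogImage),
};
console.log(preview);
```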

Troubleshooting Selectors

When your selectors aren’t working as expected, try these approaches:

  1. Make Selectors More Specific

    // Too general
    { title: ".title" }
    
    // More specific
    { title: "article .main-content .title" }
    
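  2. Check for iframes

    Content inside an <iframe> is a separate document loaded from its own URL, so selectors run against the parent page won't match it. Extract the iframe's src attribute and scrape that URL directly.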

  3. Handle Special Characters

    // Escape special characters in class names; in a JS string, \\ produces the single backslash CSS needs
    { price: ".price-\\$" }
    
  4. Use Developer Tools to Verify

    Always test your selectors in the browser's developer tools first, for example by running document.querySelectorAll('.product .title') in the console.
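Browsers provide CSS.escape() for escaping class names, but Node.js has no built-in equivalent. The simplified sketch below backslash-escapes characters outside [A-Za-z0-9_-] and writes a leading digit as a hex escape. It does not cover every rule of the CSS syntax specification (for example, a leading hyphen followed by a digit, or control characters), and escapeClassName is a hypothetical helper. When you paste the result into a JavaScript string literal by hand, each backslash must be doubled, as in the price example above.

```javascript
// Simplified CSS identifier escaping for class names (not a full CSS.escape replacement).
function escapeClassName(name) {
  let out = '';
  for (let i = 0; i < name.length; i++) {
    const ch = name[i];
    if (i === 0 && /[0-9]/.test(ch)) {
      // A leading digit is written as a hex escape followed by a space.
      out += `\\${ch.charCodeAt(0).toString(16)} `;
    } else if (/[A-Za-z0-9_-]/.test(ch) || ch.charCodeAt(0) >= 0x80) {
      out += ch; // safe as-is
    } else {
      out += `\\${ch}`; // backslash-escape punctuation such as $ : . /
    }
  }
  return out;
}

const selector = '.' + escapeClassName('price-$');
console.log(selector); // .price-\$
console.log('.' + escapeClassName('2col')); // .\32 col
```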

Testing Workflow

We recommend this workflow for developing and testing selectors:

  1. Test the selector in the browser using DevTools

  2. Extract Full HTML First

    // Run inside an async function (CommonJS modules don't support top-level await)
    const htmlResponse = await axios.get('https://api.zenrows.com/v1/', {
      params: {
        apikey: 'YOUR_ZENROWS_API_KEY',
        url: 'https://example.com',
        js_render: true
      }
    });
    
  3. Test Selectors Locally with Cheerio

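    const cheerio = require('cheerio');
    const $ = cheerio.load(htmlResponse.data);
    // Print the text of every match, similar to the list ZenRows returns
    console.log($('h1.product-title').map((i, el) => $(el).text().trim()).get());
    // Cheerio doesn't understand the "@attribute" suffix; read attributes with .attr()
    console.log($('a.product-link').map((i, el) => $(el).attr('href')).get());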
    
  4. Refine and Apply CSS Selectors with ZenRows

    const extractedData = await axios.get('https://api.zenrows.com/v1/', {
      params: {
        apikey: 'YOUR_ZENROWS_API_KEY',
        url: 'https://example.com',
        js_render: true,
        css_extractor: JSON.stringify({
          // Your refined selectors
        })
      }
    });
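The @attribute suffix used throughout this guide is ZenRows syntax, not standard CSS, so DevTools and Cheerio won't accept it as-is. Before testing a selector locally, you can split it into its CSS part and the attribute name. This sketch assumes the suffix is always a single trailing " @name" token, as in the examples above; splitZenRowsSelector is a hypothetical helper.

```javascript
// Split "a.product-link @href" into { css: "a.product-link", attribute: "href" }.
// Selectors without a trailing " @name" come back unchanged with attribute null.
function splitZenRowsSelector(selector) {
  const match = selector.trim().match(/^(.*?)\s+@([\w:-]+)$/);
  if (!match) return { css: selector.trim(), attribute: null };
  return { css: match[1], attribute: match[2] };
}

console.log(splitZenRowsSelector('.product-item a.product-link @href'));
console.log(splitZenRowsSelector("meta[property='og:title'] @content"));
console.log(splitZenRowsSelector('h1.product-title'));
```

With Cheerio, you would then read $(css).map((i, el) => $(el).attr(attribute)).get() when an attribute is present, and the element text otherwise.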
    

By following these techniques, you can effectively extract data from even the most complex web layouts using ZenRows and CSS selectors.