npm install.
axios.get’s target will be the API base, and its second parameter is an object with params: apikey for authentication and url for the page to scrape. URLs must be encoded, but Axios handles that when you pass them through params.
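As a minimal sketch of that call shape: the API base and key below are hypothetical placeholders, and the helper names buildRequestConfig and fetchThroughApi are ours for illustration only. The Axios client is passed in as an argument so the snippet does not assume how your project imports it.

```javascript
// Hypothetical placeholders: swap in the real API base and your own key.
const API_BASE = 'https://api.example.com/v1/';
const API_KEY = 'YOUR_API_KEY';

// Build the second argument for axios.get.
// Axios serializes and URL-encodes everything under `params`.
function buildRequestConfig(targetUrl) {
  return { params: { apikey: API_KEY, url: targetUrl } };
}

// `client` is expected to be axios itself or an Axios instance.
async function fetchThroughApi(client, targetUrl) {
  const response = await client.get(API_BASE, buildRequestConfig(targetUrl));
  return response.data; // response body returned by the API
}
```

With Axios installed, a call could look like fetchThroughApi(require('axios'), 'https://example.com/?page=2'). Note that the target URL is passed raw, with no manual encodeURIComponent.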
With this simple change, we take care of the main hassles of scraping, such as proxy rotation, bypassing CAPTCHAs and anti-bot solutions, setting correct headers, and more. However, there are still some challenges that we will address now.
extractContent to return URL, title, and h1 content. Your custom extraction logic goes there.
Cheerio offers a “jQuery-like” syntax and is designed to work on the server. Its load method receives plain HTML and creates a querying function that lets us find elements. You can then query with CSS selectors and navigate, manipulate, or extract content much as you would in a browser. The resulting selection exposes a text method, which gives us the content as plain text, without tags. Check the docs for more advanced features.
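A sketch of what such an extractor might look like, assuming it receives the querying function that Cheerio's load returns plus the page URL. The exact signature and the returned field names are our assumptions, not necessarily the ones used elsewhere in this post.

```javascript
// `$` is the querying function returned by cheerio.load(html).
// `url` is the address of the page the HTML came from.
function extractContent($, url) {
  return {
    url,
    // text() returns the content without tags; trim() drops stray whitespace.
    title: $('title').text().trim(),
    // first() keeps only the first <h1> if the page has several.
    h1: $('h1').first().text().trim(),
  };
}
```

With Cheerio installed, you would call it as extractContent(cheerio.load(html), url). Any other selector-based logic (links, prices, tables) can be added to the returned object in the same way.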
exponentialDelay increases the delay between attempts exponentially and adds a random margin.
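To make the idea concrete, here is an illustrative sketch of exponential backoff with a random margin. It is not the library's own code: the function name backoffDelay, the 100 ms base, and the 20% jitter are assumptions chosen for the example.

```javascript
// Delay in milliseconds before retry number `attempt` (1, 2, 3, ...).
// `baseMs` and the 20% jitter are assumed values for illustration.
// `random` is injectable so the result can be made deterministic in tests.
function backoffDelay(attempt, baseMs = 100, random = Math.random) {
  const delay = Math.pow(2, attempt) * baseMs; // 200, 400, 800, ...
  const jitter = delay * 0.2 * random();       // up to 20% extra
  return delay + jitter;
}

// Print the delays for the first three retries.
for (let attempt = 1; attempt <= 3; attempt++) {
  console.log(`retry ${attempt}: ~${Math.round(backoffDelay(attempt))} ms`);
}
```

The random margin matters when many requests fail at once: without it, they would all retry at the same instant and hit the server together again.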