github apify/crawlee v0.21.0

3 years ago

This release comes with breaking changes that will affect most, if not all, of your projects. See the migration guide for more information and examples.

The first large change is a redesigned proxy configuration. Cheerio and Puppeteer crawlers now accept a proxyConfiguration parameter, which is an instance of ProxyConfiguration. This class now exclusively manages both Apify Proxy and custom proxies. Visit the new proxy management guide for details.
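As a rough illustration of the migration, the sketch below maps the removed legacy parameters onto inputs for the new proxy API. The helper migrateLegacyProxyOptions is hypothetical and not part of the SDK, 'GROUP1' and 'my_session' are placeholder values, and the `groups` option key is an assumption that should be checked against the proxy management guide.

```javascript
// Hypothetical migration helper; not part of the SDK.
// Maps the removed legacy parameters onto inputs for the new proxy API.
function migrateLegacyProxyOptions(legacy = {}) {
    const { useApifyProxy, apifyProxyGroups, apifyProxySession } = legacy;

    // Without Apify Proxy there is nothing to migrate.
    if (!useApifyProxy) {
        return { options: null, sessionId: undefined };
    }

    // Assumption: proxy groups are passed under a `groups` key.
    const options = {};
    if (Array.isArray(apifyProxyGroups) && apifyProxyGroups.length > 0) {
        options.groups = [...apifyProxyGroups];
    }

    // The old session name becomes the optional sessionId argument.
    return { options, sessionId: apifyProxySession };
}

// Placeholder values for illustration only.
const migrated = migrateLegacyProxyOptions({
    useApifyProxy: true,
    apifyProxyGroups: ['GROUP1'],
    apifyProxySession: 'my_session',
});
console.log(migrated);
```

Under these assumptions, `options` would be passed to `await Apify.createProxyConfiguration(options)`, the resulting instance would be handed to a crawler as `proxyConfiguration`, and `sessionId` would go to `proxyConfiguration.newUrl(sessionId)` when a single proxy URL is needed.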

We also removed Apify.utils.getRandomUserAgent(), as it was no longer effective in avoiding bot detection, and changed the default values for empty properties in Request instances.

  • BREAKING: Removed Apify.getApifyProxyUrl(). To get an Apify Proxy URL, use proxyConfiguration.newUrl([sessionId]).
  • BREAKING: Removed useApifyProxy, apifyProxyGroups and apifyProxySession parameters from all applications in the SDK. Use proxyConfiguration in crawlers and proxyUrl in requestAsBrowser and Apify.launchPuppeteer.
  • BREAKING: Removed Apify.utils.getRandomUserAgent() as it was no longer effective in avoiding bot detection.
  • BREAKING: Request instances no longer initialize empty properties with null, which means that:
    • empty errorMessages are now represented by [], and
    • empty loadedUrl, payload and handledAt are undefined.
  • Add Apify.createProxyConfiguration() async function to create ProxyConfiguration instances. ProxyConfiguration itself is not exposed.
  • Add proxyConfiguration to CheerioCrawlerOptions and PuppeteerCrawlerOptions.
  • Add proxyInfo to CheerioHandlePageInputs and PuppeteerHandlePageInputs. You can use this object to retrieve information about the currently used proxy in Puppeteer and Cheerio crawlers.
  • Add click buttons and scroll up options to Apify.utils.puppeteer.infiniteScroll().
  • Fix a bug where intercepted requests would never continue.
  • Fix a bug where Apify.utils.requestAsBrowser() would get into redirect loops.
  • Fix Apify.utils.getMemoryInfo() crashing the process on AWS Lambda and on systems running in Docker without memory cgroups enabled.
  • Update Puppeteer to 3.3.0.
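The change to empty Request properties listed above can break code that compares against null explicitly. The following is a minimal sketch of checks that behave the same with both the old and the new defaults. The helper describeRequest is hypothetical, and plain objects stand in for real Request instances.

```javascript
// Hypothetical helper; plain objects stand in for Request instances.
function describeRequest(request) {
    return {
        // `!= null` is false for both null (old default) and undefined (new default).
        wasLoaded: request.loadedUrl != null,
        wasHandled: request.handledAt != null,
        hasPayload: request.payload != null,
        // Empty errorMessages used to be null and is now [].
        errorCount: (request.errorMessages || []).length,
    };
}

// Shape of an untouched request before this release.
const oldStyle = { loadedUrl: null, payload: null, handledAt: null, errorMessages: null };
// Shape of an untouched request after this release.
const newStyle = { errorMessages: [] };

console.log(describeRequest(oldStyle));
console.log(describeRequest(newStyle));
```

Strict checks such as `request.loadedUrl === null` are the ones to look for when migrating, since they no longer match an empty property.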
