github apify/crawlee v2.3.0

latest releases: v3.9.2, v3.9.1, v3.9.0...
2 years ago

What's Changed

  • feat: accept more social media patterns by @lhotanok in #1286
  • feat: add multiple click support to enqueueLinksByClickingElements by @audiBookning in #1295
  • feat: instance-scoped "global" configuration by @barjin in #1315
  • feat: stealth deprecation by @petrpatek in #1314
  • feat: RequestList accepts ProxyConfiguration for requestsFromUrls by @barjin in #1317
  • feat: allow passing a stream to KeyValueStore.setRecord by @gahabeen in #1325
  • feat: update playwright to v1.20.2
  • feat: update puppeteer to v13.5.2

    We noticed that with this version of puppeteer actor run could crash with We either navigate top level or have old version of the navigated frame error (puppeteer issue here). It should not happen while running the browser in headless mode. In case you need to run the browser in headful mode (headless: false), we recommend pinning puppeteer version to 10.4.0 in actor package.json file.

  • fix: improve guessing of chrome executable path on windows by @audiBookning in #1294
  • fix: use correct apify-client instance for snapshotting by @B4nan in #1308
  • fix: prune CPU snapshots locally by @B4nan in #1313
  • fix: improve browser launcher types by @barjin in #1318
  • fix: reset RequestQueue state after 5 minutes of inactivity by @B4nan in #1324

0 concurrency mitigation

This release should resolve the 0 concurrency bug by automatically resetting the internal RequestQueue state after 5 minutes of inactivity.

We now track last activity done on a RequestQueue instance:

  • added new request
  • started processing a request (added to inProgress cache)
  • marked request as handled
  • reclaimed request

If we don't detect one of those actions in last 5 minutes, and we have some requests in the inProgress cache, we try to reset the state. We can override this limit via APIFY_INTERNAL_TIMEOUT env var.

This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the inProgress cache.

New Contributors

Full Changelog: v2.2.2...v2.3.0

Don't miss a new crawlee release

NewReleases is sending notifications on new releases.