1.0.0 (2025-09-29)
🚀 Features
- Add utility for loading and parsing Sitemap and `SitemapRequestLoader` (#1169) (66599f8) by @Mantisus
- Add periodic status logging and `status_message_callback` parameter for customization (#1265) (b992fb2) by @Mantisus
- Add crawlee-cli option to skip project installation (#1294) (4d5aef0) by @Pijukatel
- Improve `Crawlee` CLI help text (#1297) (afbe10f) by @Pijukatel
- Add basic `OpenTelemetry` instrumentation (#1255) (a92d8b3) by @Pijukatel
- Add `ImpitHttpClient` HTTP client using the `impit` library (#1151) (0d0d268) by @Mantisus
- Prevent overloading system memory when running locally (#1270) (30de3bd) by @janbuchar
- Expose `PlaywrightPersistentBrowser` class (#1314) (b5fa955) by @Mantisus
- Add `impit` option for Crawlee CLI (#1312) (508d7ce) by @Mantisus
- Persist RequestList state (#1274) (cc68014) by @janbuchar
- Persist `DefaultRenderingTypePredictor` state (#1340) (fad4c25) by @Mantisus
- Persist the `SitemapRequestLoader` state (#1347) (27ef9ad) by @Mantisus
- Add support for NDU storages (#1401) (5dbd212) by @vdusek
- Add RQ id, name, alias args to `add_requests` and `enqueue_links` methods (#1413) (1cae2bc) by @Mantisus
- Add `SqlStorageClient` based on `sqlalchemy` v2+ (#1339) (07c75a0) by @Mantisus
🐛 Bug Fixes
- Fix memory estimation not working on macOS (#1330) (ab020eb) by @Pijukatel
- Fix retry count to not count the original request (#1328) (74fa1d9) by @Pijukatel
- [breaking] Remove unused "stats" field from RequestQueueMetadata (#1331) (0a63bef) by @vdusek
- Ignore unknown parameters passed in cookies (#1336) (50d3ef7) by @Mantisus
- Fix `timeout` for `stream` method in `ImpitHttpClient` (#1352) (54b693b) by @Mantisus
- Include reason in the session rotation warning logs (#1363) (d6d7a45) by @vdusek
- Improve crawler statistics logging (#1364) (1eb6da5) by @vdusek
- Do not add a request that is already in progress to `MemoryRequestQueueClient` (#1384) (3af326c) by @Mantisus
- Save `RequestQueueState` for `FileSystemRequestQueueClient` in default KVS (#1411) (6ee60a0) by @Mantisus
- Set default desired concurrency for non-browser crawlers to 10 (#1419) (1cc9401) by @vdusek
Refactor
- [breaking] Introduce new storage client system (#1194) (de1c03f) by @vdusek
- [breaking] Split `BrowserType` literal into two different literals based on context (#1070) (72b5698) by @Pijukatel
- [breaking] Change method `HttpResponse.read` from sync to async (#1296) (83fa8a4) by @Mantisus
- [breaking] Replace `HttpxHttpClient` with `ImpitHttpClient` as default HTTP client (#1307) (c803a97) by @Mantisus
- [breaking] Change Dataset unwind parameter to accept list of strings (#1357) (862a203) by @vdusek
- [breaking] Remove `Request.id` field (#1366) (32f3580) by @Pijukatel
- [breaking] Refactor storage creation and caching, configuration and services (#1386) (04649bd) by @Pijukatel