🚀 0.6.0rc1 — 22 Apr 2025
Highlights
- World‑aware crawlers, set `geo_locale={"city":"Tokyo","lang":"ja","tz":"Asia/Tokyo"}` and scrape the right version every time.
- Table‑to‑DataFrame extraction, flip `extract_tables=True` and get CSV or pandas without extra parsing.
- Crawler pool with pre‑warm, pages launch hot, lower P90 latency, lower memory.
- Network and console capture, full traffic log plus MHTML snapshot for audits and debugging.
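As a rough illustration of the table export highlight, the sketch below turns one extracted table into CSV text using only the standard library. It assumes a table arrives as a dict with `headers` and `rows` keys; that shape, the `table_to_csv` helper, and the sample values are assumptions for illustration, not something these notes confirm.

```python
import csv
import io


def table_to_csv(table: dict) -> str:
    """Serialise one extracted table to CSV text.

    Assumption: `table` has "headers" (list of str) and "rows"
    (list of lists). The real result shape may differ.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(table.get("headers", []))   # header row first
    writer.writerows(table.get("rows", []))     # then one line per data row
    return buffer.getvalue()


# Made-up sample data, only to show the call.
sample = {
    "headers": ["name", "qty"],
    "rows": [["apple", "3"], ["pear", "5"]],
}
csv_text = table_to_csv(sample)
```

The same `headers` and `rows` pair could be handed to `pandas.DataFrame(rows, columns=headers)` when pandas is installed.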
Added
- Geolocation, locale, and timezone flags for every crawl.
- Browser pooling with page pre‑warming.
- Table extractor that exports to CSV or pandas.
- Crawler pool manager in SDK and Docker API.
- Network & console log capture, plus MHTML snapshot.
- MCP socket and SSE endpoints with playground UI.
- Stress‑test framework (`tests/memory`) for 1k+ URL runs.
- Docs v2: TOC, GitHub badge, copy‑code buttons, Docker API demo.
- “Ask AI” helper button, work in progress, shipping soon.
- New examples: geo location, network/console capture, Docker API, markdown source selection, crypto analysis.
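The pooling and pre‑warming items above can be pictured with a generic asyncio sketch. This is not the library's pool manager: `WarmPool`, `prewarm`, `acquire`, `release`, and `fake_page` are hypothetical names, and a string stands in for a browser page. It only shows the general idea of paying start‑up cost before the first request.

```python
import asyncio


class WarmPool:
    """Tiny resource pool whose members are created before first use."""

    def __init__(self, factory, size: int):
        self._factory = factory          # async callable returning a resource
        self._size = size
        self._idle = asyncio.Queue()     # resources waiting to be handed out

    async def prewarm(self) -> None:
        # Build every resource up front so acquire() does not pay start-up cost.
        resources = await asyncio.gather(
            *(self._factory() for _ in range(self._size))
        )
        for resource in resources:
            self._idle.put_nowait(resource)

    async def acquire(self):
        # Waits only if every resource is currently checked out.
        return await self._idle.get()

    def release(self, resource) -> None:
        self._idle.put_nowait(resource)


async def demo() -> list:
    created = []

    async def fake_page():
        # Hypothetical stand-in for launching a real page.
        created.append(len(created))
        return f"page-{created[-1]}"

    pool = WarmPool(fake_page, size=2)   # built inside the running loop
    await pool.prewarm()
    first = await pool.acquire()
    pool.release(first)
    second = await pool.acquire()
    return [len(created), first, second]
```

Running `asyncio.run(demo())` exercises the pool once; a real pool would also need health checks and shutdown handling, which are left out here.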
Changed
- Browser strategy consolidation, legacy Docker modules removed.
- `ProxyConfig` moved to `async_configs`.
- Server migrated to pool‑based crawler management.
- FastAPI validators replace custom query validation.
- Docker build now uses a Chromium base image.
- Repo cleanup, ≈36 k insertions, ≈5 k deletions across 121 files.
Fixed
- Session leaks, duplicate visits, URL normalisation.
- Target‑element regressions in scraping strategies.
- Logged URL readability, encoded URL decoding, middle truncation.
- Closed issues: #701 #733 #756 #774 #804 #822 #839 #841 #842 #843 #867 #902 #911.
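To make the logged‑URL fix concrete, here is a minimal sketch of decoding a percent‑encoded URL and truncating it in the middle so both the host and the tail stay visible. The `readable_url` helper, its `max_len` parameter, and the `...` marker are assumptions for illustration; the actual logger may format URLs differently.

```python
from urllib.parse import unquote


def readable_url(url: str, max_len: int = 50) -> str:
    """Decode percent-escapes, then shorten long URLs from the middle."""
    if max_len < 5:
        raise ValueError("max_len must leave room for text around the marker")
    decoded = unquote(url)               # "%20" becomes " ", and so on
    if len(decoded) <= max_len:
        return decoded
    keep = max_len - 3                   # characters left after the "..." marker
    head = (keep + 1) // 2               # the head gets the extra char when odd
    tail = keep // 2
    return decoded[:head] + "..." + decoded[len(decoded) - tail:]
```

Cutting the middle rather than the end keeps the file name or final path segment in the log line, which is usually the part that tells similar URLs apart.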
Removed
- Obsolete modules in `crawl4ai/browser/*`.
Deprecated
- Old markdown generator names now alias `DefaultMarkdownGenerator` and warn.
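One common way to implement an alias that warns is a thin subclass that emits a `DeprecationWarning` on construction. The sketch below shows that pattern under stated assumptions: `LegacyMarkdownGenerator` is a hypothetical old name, and the `DefaultMarkdownGenerator` defined here is only a stand‑in so the example runs by itself, not the library's class.

```python
import warnings


class DefaultMarkdownGenerator:
    """Stand-in for the real class, present only so the sketch is runnable."""

    def generate(self, html: str) -> str:
        return html.strip()


class LegacyMarkdownGenerator(DefaultMarkdownGenerator):
    """Hypothetical old name kept as an alias that warns when used."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "LegacyMarkdownGenerator is deprecated; "
            "use DefaultMarkdownGenerator instead",
            DeprecationWarning,
            stacklevel=2,                # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)
```

Running a test suite with `python -W error::DeprecationWarning` turns such warnings into exceptions, which can help locate remaining uses of old names.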
Upgrade notes
- Update any imports from `crawl4ai/browser/*` to the new pooled browser modules.
- If you override `AsyncPlaywrightCrawlerStrategy.get_page`, adopt the new signature.
- Rebuild Docker images to pick up the Chromium layer.
- Switch to `DefaultMarkdownGenerator` to silence deprecation warnings.
121 files changed, ≈36 223 insertions, ≈4 975 deletions