Crawl4AI v0.5.0.post1 Release
Release Theme: Power, Flexibility, and Scalability
Crawl4AI v0.5.0 is a major release focused on significantly enhancing the library's power, flexibility, and scalability.
Key Features
- Deep Crawling System - Explore websites beyond initial URLs with BFS, DFS, and BestFirst strategies, with page limiting and scoring capabilities
- Memory-Adaptive Dispatcher - Scale to thousands of URLs with intelligent memory monitoring and concurrency control
- Multiple Crawling Strategies - Choose between browser-based (Playwright) or lightweight HTTP-only crawling
- Docker Deployment - Easy deployment with FastAPI server, JWT authentication, and streaming/non-streaming endpoints
- Command-Line Interface - New crwl CLI provides convenient access to all features with intuitive commands
- Browser Profiler - Create and manage persistent browser profiles to save authentication states for protected content
- Crawl4AI Coding Assistant - Interactive chat interface for asking questions about Crawl4AI and generating Python code examples
- LXML Scraping Mode - Fast HTML parsing using the lxml library for 10-20x speedup with complex pages
- Proxy Rotation - Built-in support for dynamic proxy switching with authentication and session persistence
- PDF Processing - Extract and process data from PDF files (both local and remote)
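
To make the deep crawling idea above concrete, here is a minimal, library-independent sketch of a breadth-first crawl with a depth limit and a page limit. It is not Crawl4AI's implementation or API: the `SITE` link graph, the `bfs_crawl` function, and its `max_depth` and `max_pages` parameters are hypothetical names chosen for illustration, and page fetching is replaced by a dictionary lookup so the example runs offline.

```python
from collections import deque

# Hypothetical in-memory "site": each path maps to the links found on that page.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/docs/cli", "/"],
    "/blog": ["/blog/0.5.0", "/docs"],
    "/docs/api": [],
    "/docs/cli": [],
    "/blog/0.5.0": ["/"],
}


def bfs_crawl(start, get_links, max_depth=2, max_pages=10):
    """Visit pages breadth-first, stopping at max_depth levels or max_pages pages."""
    crawled = []                 # pages "fetched", in visit order
    seen = {start}               # pages already queued, to avoid revisiting
    queue = deque([(start, 0)])  # (url, depth) pairs waiting to be fetched

    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        crawled.append(url)      # stand-in for actually fetching the page
        if depth == max_depth:
            continue             # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled


def links_of(url):
    """Look up outgoing links in the hypothetical SITE graph."""
    return SITE.get(url, [])


shallow = bfs_crawl("/", links_of, max_depth=1)
limited = bfs_crawl("/", links_of, max_depth=2, max_pages=4)
print(shallow)
print(limited)
```

Swapping the `deque` for a stack would give a depth-first variant, and replacing it with a priority queue keyed on a relevance score would approximate a best-first strategy.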
Additional Improvements
- LLM Content Filter for intelligent markdown generation
- URL redirection tracking
- LLM-powered schema generation utility for extraction templates
- robots.txt compliance support
- Enhanced browser context management
- Improved serialization and config handling
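
As an illustration of what robots.txt compliance involves, the sketch below filters a URL list against a set of rules using Python's standard `urllib.robotparser`. It does not show how Crawl4AI implements or exposes this feature: the rules in `ROBOTS_TXT`, the `ExampleBot` user agent, the `make_checker` helper, and the example.com URLs are all assumptions made for the example, and the rules are parsed from a string so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt, user_agent="ExampleBot"):
    """Return a function that reports whether user_agent may fetch a given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching
    return lambda url: parser.can_fetch(user_agent, url)


allowed = make_checker(ROBOTS_TXT)

urls = [
    "https://example.com/",
    "https://example.com/private/data",
    "https://example.com/blog/post",
]

# Keep only the URLs the rules permit before handing them to a crawler.
to_crawl = [u for u in urls if allowed(u)]
print(to_crawl)
```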
Breaking Changes
This release contains several breaking changes. Please review the full release notes for migration guidance.
For complete details, visit: https://docs.crawl4ai.com/blog/releases/0.5.0/