apify/crawlee-python v0.1.0 on GitHub

Features

Why is Crawlee the preferred choice for web scraping and crawling?

Unified interface for HTTP & headless browser crawling.
Automatic parallel crawling based on available system resources.
Written in Python with type hints, which improves developer experience (IDE autocompletion) and reduces bugs (static type checking).
Automatic retries on errors or when you’re getting blocked.
Integrated proxy rotation and session management.
Configurable request routing - direct URLs to the appropriate handlers.
Persistent queue for URLs to crawl.
Pluggable storage of both tabular data and files.
Robust error handling.
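To make request routing, the URL queue, and automatic retries more concrete, here is a minimal standard-library sketch of how those three ideas fit together. It is not Crawlee's API or implementation: the `Router` class, the `crawl` function, the `LIST` and `DETAIL` labels, and the example.com URLs are all hypothetical names chosen for illustration, and the in-memory queue here is not persistent.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[None]]


class Router:
    """Hypothetical label-based router: maps a label to an async handler."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def handler(self, label: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a handler under the given label.
        def register(func: Handler) -> Handler:
            self._handlers[label] = func
            return func

        return register

    async def dispatch(self, label: str, url: str) -> None:
        await self._handlers[label](url)


async def crawl(
    router: Router, start: list[tuple[str, str]], max_retries: int = 2
) -> list[str]:
    """Drain a FIFO queue of (label, url) pairs, re-queueing failed requests."""
    queue = deque((label, url, 0) for label, url in start)
    failed: list[str] = []
    while queue:
        label, url, attempts = queue.popleft()
        try:
            await router.dispatch(label, url)
        except Exception:
            if attempts < max_retries:
                # Put the request back at the end of the queue for another try.
                queue.append((label, url, attempts + 1))
            else:
                failed.append(url)
    return failed


router = Router()
seen: list[str] = []
detail_calls = {"count": 0}


@router.handler("LIST")
async def handle_list(url: str) -> None:
    seen.append(f"list:{url}")


@router.handler("DETAIL")
async def handle_detail(url: str) -> None:
    # Fail on the first call only, to simulate a transient error or a block.
    detail_calls["count"] += 1
    if detail_calls["count"] == 1:
        raise ConnectionError("simulated block")
    seen.append(f"detail:{url}")


failed = asyncio.run(
    crawl(
        router,
        [("LIST", "https://example.com/"), ("DETAIL", "https://example.com/item/1")],
    )
)
```

In this sketch the `DETAIL` request fails once, goes back on the queue, and succeeds on the second attempt, so `failed` ends up empty. A real crawler would add backoff, session rotation, and durable storage on top of this loop.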

Crawlee has out-of-the-box support for headless browser crawling (Playwright).
Crawlee has a minimalistic and elegant interface: set up your scraper with fewer than 10 lines of code.
Complete type hint coverage.
Based on standard asyncio.
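As a rough illustration of what building on asyncio makes possible, the sketch below runs several requests concurrently while capping how many are in flight at once. It uses only the standard library and a fixed limit; Crawlee itself adjusts parallelism to the available system resources, which this example does not attempt. The `fetch` and `crawl_all` functions, the `max_concurrency` parameter, and the example.com URLs are assumptions made for the example, and `fetch` sleeps instead of performing real network I/O.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stand-in for a real HTTP request; sleeps instead of doing network I/O."""
    await asyncio.sleep(0.01)
    return f"fetched {url}"


async def crawl_all(urls: list[str], max_concurrency: int = 3) -> tuple[list[str], int]:
    """Fetch all URLs concurrently, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    active = 0  # requests currently in flight
    peak = 0    # highest number of simultaneous requests observed

    async def worker(url: str) -> str:
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await fetch(url)
            finally:
                active -= 1

    # gather preserves the order of the input URLs in its results.
    results = await asyncio.gather(*(worker(url) for url in urls))
    return list(results), peak


urls = [f"https://example.com/page/{i}" for i in range(10)]
results, peak = asyncio.run(crawl_all(urls))
```

The semaphore is the simplest way to bound concurrency in asyncio; a resource-aware scheduler would replace the fixed `max_concurrency` with a value that changes as CPU and memory load change.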