github unclecode/crawl4ai v0.6.3

3 months ago

Release 0.6.3 (unreleased)

Features

  • extraction: add RegexExtractionStrategy for pattern-based extraction, with built-in patterns for emails, URLs, phone numbers, and dates, support for custom regexes, and an LLM-assisted pattern generator; also optimize HTML preprocessing via fit_html and enhance network response body capture (9b5ccac)
  • docker-api: introduce job-based polling endpoints (POST /crawl/job and GET /crawl/job/{task_id} for crawls, POST /llm/job and GET /llm/job/{task_id} for LLM tasks), backed by Redis task management with a configurable TTL; move schemas to schemas.py and add a demo_docker_polling.py example (94e9959)
  • browser: improve profile management and cleanup: add process cleanup for existing Chromium instances on Windows/Unix, fix profile creation by passing the full browser config, ship detailed browser/CLI docs and an initial profile-creation test, and bump the version to 0.6.3 (9499164)
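
To illustrate the idea behind pattern-based extraction, here is a minimal sketch using only Python's standard re module. It is not the RegexExtractionStrategy API: the PATTERNS table, the extract function, and the simplified email and URL regexes are all assumptions made up for this example.

```python
import re

# Hypothetical, simplified patterns for illustration only; the patterns
# that ship with the library may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
}


def extract(text, patterns=PATTERNS):
    """Return one record per match, grouped in the order the patterns are listed."""
    results = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            results.append(
                {"label": label, "value": match.group(0), "span": match.span()}
            )
    return results


if __name__ == "__main__":
    sample = "mail dev@example.com site https://example.com/docs end"
    for record in extract(sample):
        print(record["label"], record["value"])
```

A custom pattern can be added in the same style by passing a different dictionary, for example {"price": re.compile(r"\$\d+(?:\.\d{2})?")}.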

Fixes

  • crawler: remove automatic page closure in take_screenshot and take_screenshot_naive to prevent premature teardown; callers must now close pages explicitly (BREAKING CHANGE) (a3e9ef9)
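
Since page closure is now the caller's responsibility, a try/finally wrapper is one way to make sure a page is always released. The sketch below shows only that pattern: FakePage, fake_take_screenshot, and capture_and_close are hypothetical stand-ins, and the real take_screenshot signature is not shown in these notes.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a browser page object."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def fake_take_screenshot(page):
    """Hypothetical stand-in for a screenshot helper that leaves the page open."""
    return b"png-bytes"


async def capture_and_close(page, take_screenshot):
    """Take a screenshot, then close the page even if the capture raises."""
    try:
        return await take_screenshot(page)
    finally:
        await page.close()


if __name__ == "__main__":
    page = FakePage()
    data = asyncio.run(capture_and_close(page, fake_take_screenshot))
    print(len(data), page.closed)
```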

Documentation

  • format bash scripts in docs/apps/linkdin/README.md so examples copy & paste cleanly (87d4b0f)
  • update the same README with full litellm argument details for correct script usage (bd5a9ac)

Refactoring

  • logger: centralize color codes behind an Enum in async_logger, browser_profiler, content_filter_strategy and related modules for cleaner, type-safe formatting (cd2b490)
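
As a rough picture of what centralizing color codes behind an Enum can look like, consider the sketch below. The names Color and colorize are assumptions for this example and may not match the identifiers used in the modules listed above; the escape sequences are standard ANSI codes.

```python
from enum import Enum


class Color(Enum):
    """Hypothetical enum holding ANSI escape sequences in one place."""

    RED = "\033[31m"
    GREEN = "\033[32m"
    YELLOW = "\033[33m"
    RESET = "\033[0m"


def colorize(text, color):
    """Wrap text in the given color and reset formatting afterwards."""
    if not isinstance(color, Color):
        raise TypeError("color must be a Color member")
    return f"{color.value}{text}{Color.RESET.value}"


if __name__ == "__main__":
    print(colorize("crawl finished", Color.GREEN))
```

Passing an enum member instead of a raw escape string lets a type check or a linter catch a misspelled color before it reaches the terminal.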

Experimental

  • start migrating the logging stack to rich (WIP) (b2f3cb0)
