github mendableai/firecrawl v1.5.0
Self-Host Overhaul - v1.5.0

2 days ago

Self-Host Fixes

  • Reworked Guide: The SELF_HOST.md and docker-compose.yaml have been updated for clarity and compatibility
  • Kubernetes Improvements: Updated self-hosted Kubernetes deployment examples for compatibility and consistency (#1177)
  • Self-Host Fixes: Numerous fixes aimed at improving self-host performance and stability (#1207)
  • Proxy Support: Added proxy support tailored for self-hosted environments (#1212)
  • Playwright Integration: Added fixes and continuous integration for the Playwright microservice (#1210)
  • Search Endpoint Upgrade: Added SearXNG support for the /search endpoint (#1193)

Core Fixes & Enhancements

  • Crawl Status Fixes: Fixed various race conditions in the crawl status endpoint (#1184)
  • Timeout Enforcement: Added timeout for scrapeURL engines to prevent hanging requests (#1183)
  • Query Parameter Retention: Map function now preserves query parameters in results (#1191)
  • Screenshot Action Order: Ensured screenshots execute after specified actions (#1192)
  • PDF Scraping: Improved handling for PDFs behind anti-bot measures (#1198)
  • Map/scrapeURL Abort Control: Integrated AbortController to stop in-flight mapping work once the request times out (#1205)
  • SDK Timeout Enforcement: Enforced request timeouts in the SDK (#1204)
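
The timeout and abort items above share one pattern: start a timer, and when it fires, signal the in-flight work to stop instead of letting it hang. The following is a minimal sketch of that pattern with the standard AbortController API. It is not Firecrawl's actual code; `withTimeout` and `sleep` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical helper (not part of Firecrawl): runs `work` and aborts its
// signal if it has not settled within `timeoutMs`.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await work(controller.signal);
  } finally {
    // Always clear the timer so it cannot fire after the work has settled.
    clearTimeout(timer);
  }
}

// Hypothetical stand-in for a scrape or map step: resolves after `ms`,
// or rejects as soon as the signal is aborted.
function sleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

The timeout only takes effect if the work observes the signal, as `sleep` does here. A step that ignores the signal keeps running until it finishes on its own.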

New Features & Additions

  • Proxy & Stealth Options: Introduced a proxy option and stealthProxy flag (#1196)
  • Deep Research (Alpha): Launched an alpha implementation of deep research (#1202)
  • LLM Text Generator: Added a new endpoint for llms.txt generation (#1201)

Docker & Containerization

  • Production-Ready Docker Image: A streamlined, production-ready Docker image is now available to simplify self-hosted deployments

For the complete details, check out the full changelog.

What's Changed

  • fix(crawl-status): consider concurrency limited jobs as prioritized (FIR-851) by @mogery in #1184
  • fix(scrapeURL/sb): enforce timeout (FIR-980) by @mogery in #1183
  • fix(map): do not remove query parameters from results (FIR-1015) by @mogery in #1191
  • fix(scrapeURL/fire-engine): perform format screenshot after specified actions (FIR-985) by @mogery in #1192
  • Update self-hosted Kubernetes deployments examples for compatibility and consistency by @tetuyoko in #1177
  • fix(v1/types): fix extract -> json rename (FIR-1072) by @mogery in #1195
  • feat(v1): proxy option / stealthProxy flag (FIR-1050) by @mogery in #1196
  • fix(v1/types): fix extract -> json rename, ROUND II (FIR-1072) by @mogery in #1199
  • (feat/deep-research) Alpha implementation of deep research by @nickscamara in #1202
  • Add llmstxt generator endpoint by @ericciarla in #1201
  • fix(concurrency-limit): move to renewing a lock on each active job instead of estimating time to complete (FIR-1075) by @mogery in #1197
  • SELFHOST FIXES (FIR-1105) by @mogery in #1207
  • feat(v1/map): stop mapping if timed out via AbortController (FIR-747) by @mogery in #1205
  • Playwright page error schema by @makeiteasierapps in #1172
  • feat(ci/self-host): add playwright microservice tests by @mogery in #1210
  • feat(scrapeURL): handle PDFs behind anti-bot (FIR-722) by @mogery in #1198
  • Use correct list typing for py 3.8 support by @niazarak in #931
  • feat(map): mock support (FIR-1109) by @mogery in #1213
  • Add searxng for search endpoint by @loorisr in #1193
  • feat(sdk): enforce timeout on client-side if set (FIR-864) by @mogery in #1204
  • feat(self-host): proxy support (FIR-1111) by @mogery in #1212
  • temp by @mogery in #1218
  • gemini extractor Implementation by @aparupganguly in #1206
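
The concurrency-limit change in #1197 replaces an estimated completion time with a lock that each active job keeps renewing. The following is a rough sketch of that idea as an in-memory lease with a fixed time-to-live. Everything here is an assumption made for illustration: the `LeaseStore` class, its method names, and the in-memory Map are hypothetical and do not reflect Firecrawl's implementation or storage backend.

```typescript
// Hypothetical in-memory lease store (illustrative only).
// A job counts as active only while its lease is unexpired, so a job that
// crashes stops renewing and frees its concurrency slot after `ttlMs`.
class LeaseStore {
  private expiresAt = new Map<string, number>();

  constructor(private readonly ttlMs: number) {}

  // Take or renew the lease; call when the job starts and on each heartbeat.
  renew(jobId: string, now: number): void {
    this.expiresAt.set(jobId, now + this.ttlMs);
  }

  // Drop the lease immediately when the job finishes.
  release(jobId: string): void {
    this.expiresAt.delete(jobId);
  }

  // Remove expired leases, then return how many jobs are still active.
  activeCount(now: number): number {
    this.expiresAt.forEach((expiry, jobId) => {
      if (expiry <= now) {
        this.expiresAt.delete(jobId);
      }
    });
    return this.expiresAt.size;
  }
}
```

Passing `now` explicitly keeps the sketch deterministic. A real worker would more likely read the clock itself and renew on a timer while the job runs.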

Full Changelog: v1.4.4...v1.5.0