Self-Host Fixes
- Reworked Guide: The SELF_HOST.md and docker-compose.yaml have been updated for clarity and compatibility
- Kubernetes Improvements: Updated self-hosted Kubernetes deployment examples for compatibility and consistency (#1177)
- Self-Host Fixes: Numerous fixes aimed at improving self-host performance and stability (#1207)
- Proxy Support: Added proxy support tailored for self-hosted environments (#1212)
- Playwright Integration: Added fixes and continuous integration for the Playwright microservice (#1210)
- Search Endpoint Upgrade: Added SearXNG support for the /search endpoint (#1193)
Core Fixes & Enhancements
- Crawl Status Fixes: Fixed various race conditions in the crawl status endpoint (#1184)
- Timeout Enforcement: Added timeout for scrapeURL engines to prevent hanging requests (#1183)
- Query Parameter Retention: Map function now preserves query parameters in results (#1191)
- Screenshot Action Order: Ensured screenshots execute after specified actions (#1192)
- PDF Scraping: Improved handling for PDFs behind anti-bot measures (#1198)
- Map/scrapeURL Abort Control: Integrated AbortController to stop scraping when the request times out (#1205)
- SDK Timeout Enforcement: Enforced request timeouts in the SDK (#1204)
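The timeout and abort items above (#1183, #1204, #1205) share one general pattern: tie a timer to an AbortController so in-flight work stops once the deadline passes. The sketch below illustrates that pattern only; it is not the code from those PRs. The names runWithTimeout and slowStep are hypothetical and exist purely for illustration.

```typescript
// Hypothetical helper (not from the codebase): runs `task` and signals an
// abort through the standard AbortController API if `timeoutMs` elapses first.
async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    // Always clear the timer so a finished task leaves nothing pending.
    clearTimeout(timer);
  }
}

// Hypothetical stand-in for a slow scrape or map step that honours the signal.
function slowStep(durationMs: number, signal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("done");
    }, durationMs);
    function onAbort(): void {
      // Stop the pending work and surface the abort to the caller.
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

A caller would write something like `runWithTimeout((signal) => slowStep(500, signal), 20)` and handle the rejection. The key design point is that the task receives the signal and must check it; a timeout that only rejects the outer promise would leave the underlying work running.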
New Features & Additions
- Proxy & Stealth Options: Introduced a proxy option and stealthProxy flag (#1196)
- Deep Research (Alpha): Launched an alpha implementation of deep research (#1202)
- LLM Text Generator: Added a new endpoint for llms.txt generation (#1201)
Docker & Containerization
- Production-Ready Docker Image: A streamlined, production-ready Docker image is now available to simplify self-hosted deployments.
For the complete details, check out the full changelog.
What's Changed
- fix(crawl-status): consider concurrency limited jobs as prioritized (FIR-851) by @mogery in #1184
- fix(scrapeURL/sb): enforce timeout (FIR-980) by @mogery in #1183
- fix(map): do not remove query parameters from results (FIR-1015) by @mogery in #1191
- fix(scrapeURL/fire-engine): perform format screenshot after specified actions (FIR-985) by @mogery in #1192
- Update self-hosted Kubernetes deployments examples for compatibility and consistency by @tetuyoko in #1177
- fix(v1/types): fix extract -> json rename (FIR-1072) by @mogery in #1195
- feat(v1): proxy option / stealthProxy flag (FIR-1050) by @mogery in #1196
- fix(v1/types): fix extract -> json rename, ROUND II (FIR-1072) by @mogery in #1199
- (feat/deep-research) Alpha implementation of deep research by @nickscamara in #1202
- Add llmstxt generator endpoint by @ericciarla in #1201
- fix(concurrency-limit): move to renewing a lock on each active job instead of estimating time to complete (FIR-1075) by @mogery in #1197
- SELFHOST FIXES (FIR-1105) by @mogery in #1207
- feat(v1/map): stop mapping if timed out via AbortController (FIR-747) by @mogery in #1205
- Playwright page error schema by @makeiteasierapps in #1172
- feat(ci/self-host): add playwright microservice tests by @mogery in #1210
- feat(scrapeURL): handle PDFs behind anti-bot (FIR-722) by @mogery in #1198
- Use correct list typing for py 3.8 support by @niazarak in #931
- feat(map): mock support (FIR-1109) by @mogery in #1213
- Add searxng for search endpoint by @loorisr in #1193
- feat(sdk): enforce timeout on client-side if set (FIR-864) by @mogery in #1204
- feat(self-host): proxy support (FIR-1111) by @mogery in #1212
- temp by @mogery in #1218
- gemini extractor Implementation by @aparupganguly in #1206
New Contributors
- @tetuyoko made their first contribution in #1177
- @makeiteasierapps made their first contribution in #1172
- @niazarak made their first contribution in #931
- @loorisr made their first contribution in #1193
Full Changelog: v1.4.4...v1.5.0