Summary of changes
- Open Deep Research: An open-source version of OpenAI Deep Research.
- R1 Web Extractor Feature: New extraction capability added.
- O3-Mini Web Crawler: Introduces a lightweight crawler for specific use cases.
- Updated Model Parameters: Enhancements to o3-mini_company_researcher.
- URL Deduplication: Fixes handling of URLs ending with /, index.html, index.php, etc.
- Improved URL Blocking: Uses tldts parsing for better blocklist management.
- Valid JSON via rawHtml in Scrape: Allows getting valid JSON via rawHtml when scraping.
- Product Reviews Summarizer: Implements summarization using o3-mini.
- Scrape Options for Extract: Adds more configuration options for extracting data.
- O3-Mini Job Resource Extractor: Extracts job-related resources using o3-mini.
- Cached Scrapes for Extract evals: Improves performance by using cached scrapes for extraction evals.
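
To illustrate the URL Deduplication item above, here is a minimal sketch of treating URLs that differ only by a trailing /, index.html, or index.php as the same page. It is not the actual generateURLPermutations code from this release: the names canonicalKey and dedupeUrls are hypothetical, and the sketch normalizes each URL to a single key rather than generating permutations.

```typescript
// Hypothetical helpers for illustration only; not the project's implementation.

// Reduce a URL to a key that ignores a trailing slash or index file.
function canonicalKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  // Treat ".../index.html" and ".../index.php" like their parent directory.
  let path = url.pathname.replace(/\/index\.(html|php)$/i, "/");
  // Drop trailing slashes so "/docs/" and "/docs" produce the same key.
  path = path.replace(/\/+$/, "");
  return `${url.protocol}//${url.host}${path}${url.search}`;
}

// Keep the first spelling of each page and drop later equivalents.
function dedupeUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const u of urls) {
    const key = canonicalKey(u);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(u);
    }
  }
  return unique;
}

const sample = [
  "https://example.com/docs",
  "https://example.com/docs/",
  "https://example.com/docs/index.html",
  "https://example.com/docs/index.php",
  "https://example.com/blog",
];
console.log(dedupeUrls(sample));
```

The sketch keeps the query string in the key, so URLs that differ only in their parameters remain distinct; whether the real fix does the same is not stated in these notes.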
What's Changed
- You forgot an 'e' by @sami0596 in #1118
- added cached scrapes to extract by @rafaelsideguide in #1107
- Added R1 web extractor feature by @aparupganguly in #1115
- Feature o3-mini web crawler by @aparupganguly in #1120
- Updated Model Parameters (o3-mini_company_researcher) by @aparupganguly in #1130
- Fix corepack and self hosting setup by @rothnic in #1131
- fix(crawl-redis/generateURLPermutations): dedupe index.html/index.php/slash/bare URL ends (FIR-827) by @mogery in #1134
- feat(blocklist): Improve URL blocking with tldts parsing by @ftonato in #1117
- fix(scrape): allow getting valid JSON via rawHtml (FIR-852) by @mogery in #1138
- Implemented prodcut reviews summarizer using o3 mini by @aparupganguly in #1139
- [Feat] Added scrapeOptions to extract by @rafaelsideguide in #1133
- Feature/o3 mini job resource extractor by @aparupganguly in #1144
New Contributors
- @sami0596 made their first contribution in #1118
- @aparupganguly made their first contribution in #1115
- @rothnic made their first contribution in #1131
Full Changelog: v1.4.2...v1.4.3