Another exciting week with so many improvements from our amazing community! We're thrilled to announce the latest release of GPT Researcher, now featuring evaluations on OpenAI's SimpleQA dataset. Our rigorous testing has demonstrated an impressive 93% accuracy rate, surpassing all current leading projects in the market.
This achievement underscores the remarkable capabilities of the open-source community, and we're just getting started! In response to extensive feedback, we've refined our deep research functionality to be faster, smarter, and more cost-effective, and we've fixed a number of bugs along the way. Update to the latest version and experience the enhancements firsthand!
Here are the results of our latest eval run:
Evaluation Summary
```
Debug counts:
Total successful: 100
CORRECT: 93
INCORRECT: 7
NOT_ATTEMPTED: 1
{
  "correct_rate": 0.93,
  "incorrect_rate": 0.07,
  "not_attempted_rate": 0.01,
  "answer_rate": 0.99,
  "accuracy": 0.9292929292929293,
  "f1": 0.9246231155778895
}
```
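The summary fields above follow SimpleQA-style scoring. As a hedged sketch, assuming the definitions used in OpenAI's reference simple-evals scorer (GPT Researcher's own eval code from #1183 may differ in details), the rates can be derived from per-question grades as shown below. `GradeCounts` and `simpleqa_metrics` are hypothetical names for illustration, not part of GPT Researcher's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GradeCounts:
    """Per-question grades from a SimpleQA-style grader (hypothetical type)."""
    correct: int
    incorrect: int
    not_attempted: int


def simpleqa_metrics(counts: GradeCounts) -> dict[str, float]:
    """Aggregate grade counts into SimpleQA-style summary metrics.

    Assumption: mirrors the simple-evals reference definitions; GPT Researcher's
    evaluator may compute some fields differently.
    """
    total = counts.correct + counts.incorrect + counts.not_attempted
    if total == 0:
        raise ValueError("no graded questions")
    attempted = counts.correct + counts.incorrect

    correct_rate = counts.correct / total
    # "accuracy" here is accuracy *given attempted*: abstentions are excluded.
    accuracy = counts.correct / attempted if attempted else 0.0
    # F1 is the harmonic mean of accuracy (precision-like) and correct_rate (recall-like).
    denom = accuracy + correct_rate
    f1 = 2 * accuracy * correct_rate / denom if denom else 0.0

    return {
        "correct_rate": correct_rate,
        "incorrect_rate": counts.incorrect / total,
        "not_attempted_rate": counts.not_attempted / total,
        "answer_rate": attempted / total,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For example, 8 correct, 1 incorrect and 1 not attempted out of 10 gives a correct_rate of 0.8, an accuracy of 8/9 ≈ 0.889 and an F1 of 16/19 ≈ 0.842. Because accuracy ignores abstentions, a system could raise it simply by declining to answer; pairing it with correct_rate in the F1 score penalizes that.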
What's Changed
- Fix `KeyError` while using Deep Research by @kongacute in #1188
- Update requirements.txt with missing langgraph dependency by @namin in #1189
- Fix Docker build failure: updated `combined_query` in `DeepResearchSkill.run()` to handle backslashes in f-strings by @monolok in #1192
- Stabilize Docker & frontend upgrades by @ElishaKay in #1191
- Improved overall planning and research performance by @assafelovic in #1195
- Added support for base_url param in create_chat_completions for OpenAI Provider by @gaurav3247 in #1198
- Update llm.py by @olipayne in #1200
- Fix WebSocket timeout issues by @luislofer89 in #1203
- fix: Add missing langgraph module to requirements.txt by @hurxxxx in #1207
- Refactor: typing cleanup by @czakop in #1187
- Add async nodriver scraper by @ewgdg in #1170
- Add language requirement to resource report prompt by @hurxxxx in #1208
- Feature: eval metrics by @kga245 in #1183
- README for feat(evals): Add SimpleQA evaluation framework and initial results by @kga245 in #1212
- Polish up loose ends based on feedback by @ElishaKay in #1211
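Some background on the f-string fix in #1192, as a hedged sketch: before Python 3.12 (PEP 701), the expression part of an f-string, the text inside `{...}`, may not contain a backslash, so code such as `f"{'\n'.join(parts)}"` is a `SyntaxError` on Python 3.11 and earlier. This is a common cause of failures when code written against 3.12+ runs in an image with an older interpreter. The usual fix is to do the join outside the f-string and interpolate the result. The helper below uses a hypothetical name and a made-up prompt layout; it is not the code from the PR.

```python
from __future__ import annotations


def build_combined_query(initial_query: str, follow_up_qas: list[tuple[str, str]]) -> str:
    """Hypothetical helper: merge an initial query with follow-up Q&A pairs."""
    # Pre-3.12 Python forbids backslashes inside f-string {...} expressions,
    # so build the newline-joined text first, outside any f-string braces.
    qa_lines = "\n".join(f"Q: {q}\nA: {a}" for q, a in follow_up_qas)
    # Only a plain variable is interpolated here, which every Python 3 version accepts.
    return f"Initial Query: {initial_query}\nFollow-up Questions and Answers:\n{qa_lines}"
```

Backslashes in the literal parts of an f-string, such as the `\n` separators above, have always been allowed; only the `{...}` expressions were restricted.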
New Contributors
- @namin made their first contribution in #1189
- @olipayne made their first contribution in #1200
- @luislofer89 made their first contribution in #1203
- @hurxxxx made their first contribution in #1207
- @czakop made their first contribution in #1187
Full Changelog: v3.2.2...v3.2.3