assafelovic/gpt-researcher v3.2.3 SimpleQA Evals and Deep Research 2.0 on GitHub

Another exciting week with so many improvements from our amazing community. We're thrilled to announce the latest release of GPT Researcher, now featuring evaluations using OpenAI's SimpleQA dataset. In our testing, GPT Researcher reached an impressive 93% accuracy rate, surpassing all current leading projects in the market.

This achievement underscores the remarkable capabilities of the open-source community, and we're just getting started! In response to extensive feedback, we've refined our deep research functionality to be faster, smarter, and more cost-effective, and fixed previously reported bugs. Update to the latest version and experience the enhancements firsthand!

Here are the results of our latest evals run:

Evaluation Summary

Debug counts:
Total successful: 100
CORRECT: 93
INCORRECT: 7
NOT_ATTEMPTED: 1
{
"correct_rate": 0.93,
"incorrect_rate": 0.07,
"not_attempted_rate": 0.01,
"answer_rate": 0.99,
"accuracy": 0.9292929292929293,
"f1": 0.9246231155778895
}
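
For readers who want to see how fields like these relate to the raw grade counts, here is a minimal sketch. It assumes the SimpleQA-style definitions used in OpenAI's reference grader: accuracy is measured over attempted questions only, and f1 is the harmonic mean of that accuracy and the overall correct rate. The function name `simpleqa_metrics` and the counts in the usage line are hypothetical and are not taken from this release's eval code or results.

```python
def simpleqa_metrics(correct: int, incorrect: int, not_attempted: int) -> dict:
    """Aggregate SimpleQA-style grade counts into summary rates (assumed definitions)."""
    total = correct + incorrect + not_attempted
    if total == 0:
        raise ValueError("at least one graded answer is required")

    attempted = correct + incorrect
    correct_rate = correct / total
    # Accuracy only considers questions the model actually attempted.
    accuracy = correct / attempted if attempted else 0.0
    # f1 is the harmonic mean of accuracy (given attempted) and the overall correct rate.
    denom = accuracy + correct_rate
    f1 = 2 * accuracy * correct_rate / denom if denom else 0.0

    return {
        "correct_rate": correct_rate,
        "incorrect_rate": incorrect / total,
        "not_attempted_rate": not_attempted / total,
        "answer_rate": attempted / total,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only, not the counts from the run above.
metrics = simpleqa_metrics(correct=8, incorrect=1, not_attempted=1)
print(metrics)
```

Under these definitions a model is penalized in f1 for skipping questions, because the overall correct rate drops even when accuracy on attempted questions stays high.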

What's Changed

Fix Key Error while using Deep Research by @kongacute in #1188
Update requirements.txt with missing langgraph dep by @namin in #1189
Fix Docker Build Failure: Updated combined_query in DeepRsearchSkill.run() to Handle Backslashes in F-Strings by @monolok in #1192
stabilize docker & frontend upgrades by @ElishaKay in #1191
Improved overall planning and research performance by @assafelovic in #1195
Added support for base_url param in create_chat_completions for OpenAI Provider by @gaurav3247 in #1198
Update llm.py by @olipayne in #1200
Fix WebSocket timeout issues by @luislofer89 in #1203
fix: Add missing langgraph module to requirements.txt by @hurxxxx in #1207
Refactor: typing cleanup by @czakop in #1187
add async nodriver scrapper by @ewgdg in #1170
Add language requirement to resource report prompt by @hurxxxx in #1208
Feature:eval metrics by @kga245 in #1183
README for feat(evals): Add SimpleQA evaluation framework and initial results by @kga245 in #1212
Polish up loose ends based on feedback by @ElishaKay in #1211
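
As background on the f-string fix in #1192: before Python 3.12, the expression part of an f-string could not contain a backslash, so code such as joining on "\n" inside the braces fails to compile on older interpreters (a common cause of Docker build failures when the image uses an older Python). The sketch below shows the usual workaround. The variable `learnings` and the string contents are hypothetical; only the name `combined_query` comes from the PR title, and the actual change in the PR may differ.

```python
# Hypothetical data for illustration; not taken from the repository.
learnings = ["first finding", "second finding"]

# On Python < 3.12 the following form is a SyntaxError, because the
# expression inside the braces contains a backslash:
#   combined_query = f"Findings: {'\n'.join(learnings)}"

# Portable form: build the joined text first, then interpolate it.
joined = "\n".join(learnings)
combined_query = f"Findings: {joined}"

print(combined_query)
```

Moving the backslash out of the braces keeps the code valid on every supported Python version without changing the resulting string.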

New Contributors

@namin made their first contribution in #1189
@olipayne made their first contribution in #1200
@luislofer89 made their first contribution in #1203
@hurxxxx made their first contribution in #1207
@czakop made their first contribution in #1187

Full Changelog: v3.2.2...v3.2.3
