github assafelovic/gpt-researcher v3.2.3
SimpleQA Evals and Deep Research 2.0

2 days ago

Another exciting week with so many improvements from our amazing community. We're thrilled to announce the latest release of GPT Researcher, now featuring evaluations using the SimpleQA dataset by OpenAI. Our testing shows a 93% accuracy rate, surpassing all current leading projects in the market.

This achievement underscores the remarkable capabilities of the open-source community, and we're just getting started! In response to extensive feedback, we've refined our deep research functionalities to be faster, smarter, and more cost-effective, while also addressing previous bugs. Update to the latest version and experience the enhancements firsthand!

Here are the results of our latest evals run:

Evaluation Summary

Debug counts:
Total successful: 100
CORRECT: 93
INCORRECT: 7
NOT_ATTEMPTED: 1
{
  "correct_rate": 0.93,
  "incorrect_rate": 0.07,
  "not_attempted_rate": 0.01,
  "answer_rate": 0.99,
  "accuracy": 0.9292929292929293,
  "f1": 0.9246231155778895
}
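
For readers wondering how the summary fields relate to the graded counts, here is a minimal sketch. It assumes the metric definitions used by OpenAI's reference SimpleQA grader: `accuracy` is the share of correct answers among attempted ones, and `f1` is the harmonic mean of that value and the overall correct rate. The function name `simpleqa_metrics` and the example counts are hypothetical, chosen for illustration, and are not taken from the GPT Researcher eval code.

```python
import json


def simpleqa_metrics(correct: int, incorrect: int, not_attempted: int) -> dict:
    """Derive SimpleQA-style summary rates from graded counts (assumed definitions)."""
    total = correct + incorrect + not_attempted
    if total == 0:
        raise ValueError("at least one graded answer is required")

    attempted = correct + incorrect
    correct_rate = correct / total
    # Accuracy is measured only over the questions the model attempted.
    accuracy = correct / attempted if attempted else 0.0
    # F1 here is the harmonic mean of accuracy-given-attempted and correct rate.
    denom = accuracy + correct_rate
    f1 = 2 * accuracy * correct_rate / denom if denom else 0.0

    return {
        "correct_rate": correct_rate,
        "incorrect_rate": incorrect / total,
        "not_attempted_rate": not_attempted / total,
        "answer_rate": attempted / total,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only: 100 graded questions.
metrics = simpleqa_metrics(correct=92, incorrect=7, not_attempted=1)
print(json.dumps(metrics, indent=2))
```

Under these assumed definitions, the illustrative counts of 92 correct, 7 incorrect, and 1 not attempted give `accuracy` = 92/99 and `f1` = 184/199, which agree with the `accuracy` and `f1` values in the summary to within floating-point rounding.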

What's Changed

  • Fix Key Error while using Deep Research by @kongacute in #1188
  • Update requirements.txt with missing langgraph dep by @namin in #1189
  • Fix Docker Build Failure: Updated combined_query in DeepResearchSkill.run() to Handle Backslashes in F-Strings by @monolok in #1192
  • stabilize docker & frontend upgrades by @ElishaKay in #1191
  • Improved overall planning and research performance by @assafelovic in #1195
  • Added support for base_url param in create_chat_completions for OpenAI Provider by @gaurav3247 in #1198
  • Update llm.py by @olipayne in #1200
  • Fix WebSocket timeout issues by @luislofer89 in #1203
  • fix: Add missing langgraph module to requirements.txt by @hurxxxx in #1207
  • Refactor: typing cleanup by @czakop in #1187
  • add async nodriver scraper by @ewgdg in #1170
  • Add language requirement to resource report prompt by @hurxxxx in #1208
  • Feature:eval metrics by @kga245 in #1183
  • README for feat(evals): Add SimpleQA evaluation framework and initial results by @kga245 in #1212
  • Polish up loose ends based on feedback by @ElishaKay in #1211
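
The f-string fix in #1192 relates to a Python limitation: before Python 3.12, a backslash may not appear inside the `{...}` expression part of an f-string, so an expression such as `'\n'.join(items)` written inline is a syntax error on older interpreters. A minimal sketch of the usual workaround follows. The function name `build_combined_query`, its parameters, and the query layout are hypothetical and are not taken from the PR.

```python
def build_combined_query(query: str, learnings: list[str]) -> str:
    """Build a multi-line query string without backslashes inside f-string braces."""
    # Do the join outside the f-string: on Python < 3.12, writing
    # '\n'.join(...) inside {...} raises a SyntaxError at import time.
    joined_learnings = "\n".join(learnings)
    # Backslashes in the literal (non-expression) part of an f-string are fine.
    return f"Initial query: {query}\nLearnings:\n{joined_learnings}"


combined = build_combined_query(
    "What changed in v3.2.3?",
    ["SimpleQA evals were added", "Deep research was refined"],
)
print(combined)
```

Moving the join into a local variable keeps the code valid on every supported Python version, which matters when a Docker base image ships an interpreter older than 3.12.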

New Contributors

Full Changelog: v3.2.2...v3.2.3
