GitHub: confident-ai/deepeval v3.0
LLM Evals - v3.0


πŸš€ DeepEval v3.0 β€” Evaluate Any LLM Workflow, Anywhere

We’re excited to introduce DeepEval v3.0, a major milestone that transforms how you evaluate LLM applications β€” from complex multi-step agents to simple prompt chains. This release brings component-level granularity, production-ready observability, and simulation tools to support developers building modern AI systems.


πŸ” Component-Level Evaluation for Agentic Workflows

You can now apply DeepEval metrics to any step of your LLM workflow β€” tools, memories, retrievers, generators β€” and monitor them in both development and production.

  • Evaluate individual function calls, not just final outputs
  • Works with any framework or custom agent logic
  • Real-time evaluation in production using observe()
  • Track sub-component performance over time

πŸ“˜ Learn more β†’


🧠 Conversation Simulation

Automatically simulate realistic multi-turn conversations to test your chatbots and agents.

  • Define model goals and user behavior
  • Generate labeled conversations at scale
  • Use DeepEval metrics to assess response quality
  • Customize turn count, persona types, and more

πŸ“˜ Try the simulator β†’


🧬 Generate Goldens from Goldens

Bootstrapping eval datasets just got easier. Now you can rapidly expand your test cases using LLM-generated variants of existing goldens.

  • Transform goldens into many meaningful test cases
  • Preserve structure while diversifying content
  • Control tone, complexity, length, and more

πŸ“˜ Read the guide β†’


πŸ”’ Red Teaming Moved to DeepTeam

All red teaming functionality now lives in its own focused project: DeepTeam. DeepTeam is built for LLM security β€” adversarial testing, attack generation, and vulnerability discovery.


πŸ› οΈ Install or Upgrade

pip install deepeval --upgrade

🧠 Why v3.0 Matters

DeepEval v3.0 is more than an evaluation framework β€” it's a foundation for LLM observability. Whether you're debugging agents, simulating conversations, or continuously monitoring production performance, DeepEval now meets you wherever your LLM logic runs.

Ready to explore?
πŸ“š Full docs at deepeval.com β†’
