DeepEval v3.0: Evaluate Any LLM Workflow, Anywhere
We're excited to introduce DeepEval v3.0, a major milestone that transforms how you evaluate LLM applications, from complex multi-step agents to simple prompt chains. This release brings component-level granularity, production-ready observability, and simulation tools to empower devs building modern AI systems.
Component-Level Evaluation for Agentic Workflows
You can now apply DeepEval metrics to any step of your LLM workflow (tools, memories, retrievers, generators) and monitor them in both development and production.
- Evaluate individual function calls, not just final outputs
- Works with any framework or custom agent logic
- Real-time evaluation in production using `observe()` (see the sketch after this list)
- Track sub-component performance over time
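Here's a rough sketch of what a component-level setup could look like. The `observe()` decorator is the one referenced above; the `metrics` argument, the `update_current_span` helper, and the import paths are assumptions drawn from the docs, so double-check them against your installed version.

```python
from deepeval.tracing import observe, update_current_span  # import paths assumed
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def my_llm(query: str) -> str:
    # Stand-in for your real model or agent call.
    return f"Answer to: {query}"

# Decorate any sub-component (tool, retriever, generator) to trace and score it.
# Attaching metrics via observe(metrics=...) is an assumption; check the docs.
@observe(metrics=[AnswerRelevancyMetric()])
def generate(query: str) -> str:
    response = my_llm(query)
    # Report what this component received and produced so the metric can score it.
    update_current_span(test_case=LLMTestCase(input=query, actual_output=response))
    return response

generate("How do I evaluate a retriever on its own?")
```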
Learn more →
Conversation Simulation
Automatically simulate realistic multi-turn conversations to test your chatbots and agents (see the sketch below the list).
- Define model goals and user behavior
- Generate labeled conversations at scale
- Use DeepEval metrics to assess response quality
- Customize turn count, persona types, and more
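The snippet below is a shape-of-the-API sketch only: the `ConversationSimulator` class and the parameter names (user intentions, profile items, turn count, the synchronous callback) are assumptions based on the feature list above, so treat it as illustrative rather than exact.

```python
from deepeval.conversation_simulator import ConversationSimulator  # import path assumed

def chatbot_callback(user_message: str) -> str:
    # Plays the role of your application: given the simulated user's message,
    # return your chatbot's reply. A stand-in implementation is shown here.
    return f"(your app's reply to: {user_message})"

# Constructor and simulate() arguments are assumptions; consult the docs for
# the exact interface and for async callback support.
simulator = ConversationSimulator(
    user_intentions={"requesting a refund": 2},   # goals the simulated user pursues
    user_profile_items=["name", "order number"],  # persona details to weave into turns
)
conversational_test_cases = simulator.simulate(
    model_callback=chatbot_callback,
    max_turns=6,  # customize turn count
)
# The resulting conversations can then be scored with DeepEval's conversational metrics.
```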
Generate Goldens from Goldens
Bootstrapping eval datasets just got easier. You can now multiply your test cases using LLM-generated variants of existing goldens (see the sketch below the list).
- Transform goldens into many meaningful test cases
- Preserve structure while diversifying content
- Control tone, complexity, length, and more
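A minimal sketch of the expansion step, assuming the `Synthesizer` exposes a `generate_goldens_from_goldens` method named after this feature; argument names and any tone, complexity, or length knobs may differ, so check the guide.

```python
from deepeval.synthesizer import Synthesizer
from deepeval.dataset import Golden  # import path assumed

# A couple of hand-written goldens to use as seeds.
seed_goldens = [
    Golden(input="How do I reset my password?"),
    Golden(input="Which payment methods do you support?"),
]

# Method name mirrors the feature above but is an assumption; the Synthesizer
# may also expose options for controlling tone, complexity, and length.
synthesizer = Synthesizer()
expanded_goldens = synthesizer.generate_goldens_from_goldens(goldens=seed_goldens)
```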
Read the guide →
Red Teaming Moved to DeepTeam
All red teaming functionality now lives in a dedicated project of its own: DeepTeam, which is built for LLM security, covering adversarial testing, attack generation, and vulnerability discovery.
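If you relied on DeepEval's red teaming features, they now install separately; assuming the package is published under the project's name:

pip install deepteam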
Install or Upgrade
pip install deepeval --upgrade
Why v3.0 Matters
DeepEval v3.0 is more than an evaluation framework: it's a foundation for LLM observability. Whether you're debugging agents, simulating conversations, or continuously monitoring production performance, DeepEval now meets you wherever your LLM logic runs.
Ready to explore?
Full docs at deepeval.com →