โ๏ธ New Features
DeepEval's 3.2.6 release focuses on single-vs multi-turn use cases in datasets!
๐งฉ Support for Single-Turn and Multi-Turn Datasets
- Single-turn datasets: Simple
input โ output
pairs for one-off prompt testing. - Multi-turn datasets: Full conversation flows with alternating user/assistant turns. Perfect for simulating real chat interactions.
DeepEval now automatically detects whether a dataset is single-turn or multi-turn based on structure and routes to the appropriate evaluation logic.
๐งช Conversational Goldens
Introduced a new concept: conversational goldens, which contains scenario, (and optionally expected_outcome
) but not things like input and expected output as with single-turn use cases..
โ Improvements
- Smarter dataset evaluation routing: Whether single-turn or multi-turn, DeepEval figures it out and builds test cases accordingly.
- Improved multi-turn context preservation: Each conversational turn is maintained during evaluation, giving more accurate multi-turn metrics.
This release is setting the stage for future multi-turn use cases.