## 1.11.0 (2025-09-02)

### Features Added
- Added support for user-supplied tags in the `evaluate` function. Tags are key-value pairs that can be used for experiment tracking, A/B testing, filtering, and organizing evaluation runs. The function accepts a `tags` parameter.
- Added support for user-supplied `TokenCredential`s with LLM-based evaluators.
- Enhanced `GroundednessEvaluator` to support AI agent evaluation with tool calls. The evaluator now accepts agent response data containing tool calls and can extract context from `file_search` tool results for groundedness assessment. This enables evaluation of AI agents that use tools to retrieve information and generate responses. Note: Agent groundedness evaluation is currently supported only when the `file_search` tool is used.
- Added a `language` parameter to the `RedTeam` class for multilingual red team scanning support. The parameter accepts values from the `SupportedLanguages` enum, including English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, and Simplified Chinese, enabling red team attacks to be generated and conducted in multiple languages.
- Added support for the `IndirectAttack` and `UngroundedAttributes` risk categories in `RedTeam` scanning. These new risk categories expand red team capabilities to detect cross-platform indirect attacks and evaluate ungrounded inferences about human attributes, including emotional state and protected class information.
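As a minimal sketch of what the new key-value tags enable, the snippet below shows how tagged evaluation runs can be filtered and organized downstream. The `Run` dataclass and `filter_runs` helper are hypothetical illustrations, not SDK APIs; in the SDK you would pass the same kind of dictionary to the `tags` parameter of `evaluate`.

```python
# Illustrative only: the Run structure and filter_runs helper are hypothetical,
# standing in for however your experiment tracker stores tagged runs.
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def filter_runs(runs: list[Run], **wanted: str) -> list[Run]:
    """Return runs whose tags contain every requested key-value pair."""
    return [r for r in runs if all(r.tags.get(k) == v for k, v in wanted.items())]


runs = [
    Run("baseline", tags={"experiment": "exp-1", "variant": "A"}),
    Run("candidate", tags={"experiment": "exp-1", "variant": "B"}),
    Run("smoke", tags={"experiment": "exp-2"}),
]

# Select the exp-1 runs for the B variant only.
print([r.name for r in filter_runs(runs, experiment="exp-1", variant="B")])
# → ['candidate']
```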
### Bugs Fixed
- Fixed issue where evaluation results were not properly aligned with input data, leading to incorrect metrics being reported.
### Other Changes
- Deprecated `AdversarialSimulator` in favor of the AI Red Teaming Agent. `AdversarialSimulator` will be removed in the next minor release.
- Moved retry configuration constants (`MAX_RETRY_ATTEMPTS`, `MAX_RETRY_WAIT_SECONDS`, `MIN_RETRY_WAIT_SECONDS`) from the `RedTeam` class to a new `RetryManager` class for better code organization and configurability.
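To illustrate the kind of centralization the last change describes, here is a hedged sketch of a `RetryManager`-style class that owns the three retry constants and derives a bounded backoff from them. The constant values and the backoff logic are assumptions for illustration, not the SDK's actual implementation.

```python
# Hedged sketch: the constant values and backoff strategy below are placeholder
# assumptions, not the SDK's real RetryManager internals.
import random


class RetryManager:
    # Placeholder values; the real SDK constants may differ.
    MAX_RETRY_ATTEMPTS = 5
    MIN_RETRY_WAIT_SECONDS = 1.0
    MAX_RETRY_WAIT_SECONDS = 30.0

    def wait_seconds(self, attempt: int) -> float:
        """Exponential backoff with jitter, clamped to the configured bounds."""
        base = self.MIN_RETRY_WAIT_SECONDS * (2 ** attempt)
        capped = min(base, self.MAX_RETRY_WAIT_SECONDS)
        return random.uniform(self.MIN_RETRY_WAIT_SECONDS, capped)


mgr = RetryManager()
for attempt in range(mgr.MAX_RETRY_ATTEMPTS):
    print(f"attempt {attempt}: wait up to {mgr.wait_seconds(attempt):.2f}s")
```

Keeping the constants on one class means callers configure retries in a single place instead of reaching into `RedTeam` internals.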