Azure/azure-sdk-for-python azure-ai-evaluation

1.4.0 (2025-03-27)

Features Added

Enhanced binary evaluation results with customizable thresholds
- Added threshold support for QA and ContentSafety evaluators
- Evaluation results now include both the score and threshold values
- Configurable threshold parameter allows custom binary classification boundaries
- Default thresholds provided for backward compatibility
- Quality evaluators use "higher is better" scoring (score ≥ threshold is positive)
- Content safety evaluators use "lower is better" scoring (score ≤ threshold is positive)
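
The two scoring directions above can be illustrated with a minimal sketch. This is not the SDK's implementation: the names BinaryResult and binarize and the numeric values used are hypothetical, chosen only to show how a score and a threshold combine into a binary result.

```python
from dataclasses import dataclass


@dataclass
class BinaryResult:
    # Hypothetical container: mirrors the idea that results carry
    # both the score and the threshold it was compared against.
    score: float
    threshold: float
    passed: bool


def binarize(score: float, threshold: float, higher_is_better: bool = True) -> BinaryResult:
    """Turn a numeric score into a pass/fail result (illustrative only).

    higher_is_better=True  -> quality-style scoring (score >= threshold passes)
    higher_is_better=False -> content-safety-style scoring (score <= threshold passes)
    """
    if higher_is_better:
        passed = score >= threshold
    else:
        passed = score <= threshold
    return BinaryResult(score=score, threshold=threshold, passed=passed)


# Quality-style check: a score at or above the threshold is positive.
quality = binarize(score=4, threshold=3, higher_is_better=True)

# Content-safety-style check: a score at or below the threshold is positive.
safety = binarize(score=5, threshold=3, higher_is_better=False)
```

With these assumed values, quality.passed is True and safety.passed is False. A score equal to the threshold passes in both directions.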
Added a new built-in evaluator, CodeVulnerabilityEvaluator.
- It identifies the following code vulnerabilities:
  - path-injection
  - sql-injection
  - code-injection
  - stack-trace-exposure
  - incomplete-url-substring-sanitization
  - flask-debug
  - clear-text-logging-sensitive-data
  - incomplete-hostname-regexp
  - server-side-unvalidated-url-redirection
  - weak-cryptographic-algorithm
  - full-ssrf
  - bind-socket-all-network-interfaces
  - client-side-unvalidated-url-redirection
  - likely-bugs
  - reflected-xss
  - clear-text-storage-sensitive-data
  - tarslip
  - hardcoded-credentials
  - insecure-randomness
- It supports multiple programming languages, including Python, Java, C++, C#, Go, JavaScript, and SQL.
Added a new built-in evaluator, UngroundedAttributesEvaluator.
- It evaluates ungrounded inference of human attributes for a given query, response, and context, in single-turn evaluations only. Here, query is the user query and response is the AI system's response given the provided context.
- It first checks whether a response is ungrounded, then checks whether it contains information about a person's protected class or emotional state.
- It identifies the following attributes:
  - emotional_state
  - protected_class
  - groundedness
Added new built-in evaluators for agent evaluation (Preview):
- IntentResolutionEvaluator - Evaluates the intent resolution of an agent's response to a user query.
- ResponseCompletenessEvaluator - Evaluates the response completeness of an agent's response to a user query.
- TaskAdherenceEvaluator - Evaluates the task adherence of an agent's response to a user query.
- ToolCallAccuracyEvaluator - Evaluates the accuracy of tool calls made by an agent in response to a user query.

Bugs Fixed

Fixed error in GroundednessProEvaluator when handling non-numeric values like "n/a" returned from the service.
Uploading local evaluation results from evaluate with the same run name no longer causes the online runs to share (and overwrite) each other's result files.

Azure/azure-sdk-for-python azure-ai-evaluation_1.4.0 on GitHub
