github Azure/azure-sdk-for-python azure-ai-evaluation_1.4.0

1.4.0 (2025-03-27)

Features Added

  • Enhanced binary evaluation results with customizable thresholds

    • Added threshold support for QA and ContentSafety evaluators
    • Evaluation results now include both the score and threshold values
    • Configurable threshold parameter allows custom binary classification boundaries
    • Default thresholds provided for backward compatibility
    • Quality evaluators use "higher is better" scoring (score ≥ threshold is positive)
    • Content safety evaluators use "lower is better" scoring (score ≤ threshold is positive)
  • Added a new built-in evaluator, CodeVulnerabilityEvaluator.

    • It identifies the following code vulnerabilities:
      • path-injection
      • sql-injection
      • code-injection
      • stack-trace-exposure
      • incomplete-url-substring-sanitization
      • flask-debug
      • clear-text-logging-sensitive-data
      • incomplete-hostname-regexp
      • server-side-unvalidated-url-redirection
      • weak-cryptographic-algorithm
      • full-ssrf
      • bind-socket-all-network-interfaces
      • client-side-unvalidated-url-redirection
      • likely-bugs
      • reflected-xss
      • clear-text-storage-sensitive-data
      • tarslip
      • hardcoded-credentials
      • insecure-randomness
    • It supports multiple programming languages, including Python, Java, C++, C#, Go, JavaScript, and SQL.
  • Added a new built-in evaluator, UngroundedAttributesEvaluator.

    • It evaluates ungrounded inference of human attributes for a given query, response, and context. Only single-turn evaluation is supported. Here, query is the user query and response is the AI system's response given the provided context.

    • It first checks whether a response is ungrounded, then whether it contains information about a person's protected class or emotional state.

    • It identifies the following attributes:

      • emotional_state
      • protected_class
      • groundedness
  • Added new built-in evaluators for Agent Evaluation (Preview):

    • IntentResolutionEvaluator - Evaluates the intent resolution of an agent's response to a user query.
    • ResponseCompletenessEvaluator - Evaluates the response completeness of an agent's response to a user query.
    • TaskAdherenceEvaluator - Evaluates the task adherence of an agent's response to a user query.
    • ToolCallAccuracyEvaluator - Evaluates the accuracy of tool calls made by an agent in response to a user query.
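
The threshold rules listed under "Enhanced binary evaluation results" can be illustrated with a small standalone sketch. It does not use the azure-ai-evaluation API: the names BinaryResult, classify, and higher_is_better are hypothetical and exist only to show the two comparison directions, and the scores and thresholds are made-up values, not the library's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryResult:
    """Hypothetical container: a result carries both the score and the threshold."""
    score: float
    threshold: float
    passed: bool


def classify(score: float, threshold: float, higher_is_better: bool) -> BinaryResult:
    """Turn a numeric score into a binary outcome.

    Quality-style metrics:        positive when score >= threshold.
    Content-safety-style metrics: positive when score <= threshold.
    """
    if higher_is_better:
        passed = score >= threshold
    else:
        passed = score <= threshold
    return BinaryResult(score=score, threshold=threshold, passed=passed)


# Illustrative values only; these are not the library's default thresholds.
quality = classify(score=4, threshold=3, higher_is_better=True)
safety = classify(score=5, threshold=3, higher_is_better=False)

print(quality)  # passes: 4 >= 3
print(safety)   # fails: 5 is not <= 3
```

A score equal to the threshold counts as positive in both directions, which matches the "≥" and "≤" wording above.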

Bugs Fixed

  • Fixed error in GroundednessProEvaluator when handling non-numeric values like "n/a" returned from the service.
  • Uploading local evaluation results from evaluate with the same run name no longer causes the online runs to share (and overwrite) each other's result files.
