github DataDog/dd-trace-py v4.6.0rc2

Pre-release · 6 hours ago (latest stable release: v4.5.1)

New Features

  • AI Guard: Adds SDS (Sensitive Data Scanner) findings to AI Guard spans, enabling visibility into sensitive data detected in LLM inputs and outputs.
  • LLM Observability: Adds support for DeepEval evaluations in LLM Observability Experiments by allowing users to pass a DeepEval metric (one that inherits from either BaseMetric or BaseConversationalMetric) as an evaluator in an LLM Obs Experiment.

    Example:

    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCaseParams
    
    from ddtrace.llmobs import LLMObs
    
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine whether the actual output is factually correct based on the expected output.",
        evaluation_steps=[
        "Check whether the facts in 'actual output' contradict any facts in 'expected output'",
            "You should also heavily penalize omission of detail",
            "Vague language, or contradicting OPINIONS, are OK"
        ],
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        async_mode=True
    )
    
    dataset = LLMObs.create_dataset(
        dataset_name="<DATASET_NAME>",
        description="<DATASET_DESCRIPTION>",
        records=[RECORD_1, RECORD_2, RECORD_3, ...]
    )
    
    def my_task(input_data, config):
        return input_data["output"]
    
    def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results):
        return evaluators_results["Correctness"].count(True)
    
    experiment = LLMObs.experiment(
        name="<EXPERIMENT_NAME>",
        task=my_task, 
        dataset=dataset,
        evaluators=[correctness_metric],
        summary_evaluators=[my_summary_evaluator], # optional, used to summarize the experiment results
        description="<EXPERIMENT_DESCRIPTION>."
    )
    
    result = experiment.run()
    
  • LLM Observability: Adds experiment summary logging after run() with row count, run count, per-evaluator stats, and error counts.
  • LLM Observability: Adds max_retries and retry_delay parameters to experiment.run() for retrying failed tasks and evaluators. Example: experiment.run(max_retries=3, retry_delay=lambda attempt: 2 ** attempt).
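
    The retry semantics can be sketched as follows. `run_with_retries` is an illustrative helper, not the library's internal implementation; it assumes, as the example above suggests, that `retry_delay` is a callable mapping the 1-based attempt number to a sleep duration in seconds:

    ```python
    import time


    def run_with_retries(task, max_retries, retry_delay):
        """Illustrative retry loop, not ddtrace's implementation.

        `retry_delay` is a callable mapping the 1-based attempt number to
        a sleep duration in seconds, mirroring the keyword arguments of
        experiment.run(max_retries=..., retry_delay=...) above.
        """
        for attempt in range(1, max_retries + 2):  # first try + max_retries retries
            try:
                return task()
            except Exception:
                if attempt > max_retries:
                    raise  # retries exhausted: surface the failure
                time.sleep(retry_delay(attempt))


    # Exponential backoff: sleeps 2s, then 4s, then 8s between attempts.
    backoff = lambda attempt: 2 ** attempt
    ```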

Bug Fixes

  • AAP: Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads (e.g. an asyncio event loop and a thread pool executor inheriting the same context via contextvars) could cause use-after-free or double-free crashes (SIGSEGV) inside libddwaf. A per-context lock now serializes WAF calls on the same context.
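
    The shape of the fix can be sketched like this (illustrative class and names, not ddtrace internals): each request context owns a lock, and every WAF call on that context takes it, so two threads sharing the context can never enter the native library at the same time.

    ```python
    import threading


    class RequestContext:
        """Illustrative stand-in for a per-request WAF context, not
        ddtrace's actual class. The context carries its own lock so that
        concurrent callers (e.g. an event loop thread and a thread-pool
        worker that inherited the context via contextvars) never enter
        the native WAF call together.
        """

        def __init__(self):
            self._lock = threading.Lock()
            self.calls = 0

        def call_waf(self, payload):
            # Serializing on the per-context lock prevents the
            # use-after-free seen when the WAF is entered concurrently
            # on one context.
            with self._lock:
                self.calls += 1
                return {"payload": payload, "call": self.calls}
    ```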
  • tracing: Avoid pickling wrappers in ddtrace.internal.wrapping.context.BaseWrappingContext.
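
    The idea can be sketched with a hypothetical class (not the actual BaseWrappingContext): exclude the unpicklable wrapped function from the pickled state so objects holding the context remain picklable.

    ```python
    import pickle


    class WrappingContext:
        """Illustrative sketch, not ddtrace's BaseWrappingContext.

        The wrapped function object (often a closure or lambda that
        pickle cannot serialize) is dropped from the pickled state, so
        pickling an object that holds this context no longer fails.
        """

        def __init__(self, wrapped):
            self.__wrapped__ = wrapped

        def __getstate__(self):
            state = self.__dict__.copy()
            state.pop("__wrapped__", None)  # drop the unpicklable wrapper
            return state
    ```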
  • CI Visibility: Fixed an incompatibility with pytest-html and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard dd_retry test outcome for retry attempts. The outcome is now set to rerun, which is the standard value used by pytest-rerunfailures and recognized by reporting plugins.
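
    For illustration, a reporting plugin can recognize that standard outcome through pytest's usual pytest_report_teststatus hook (a sketch; the counter and display labels are made up for this example):

    ```python
    # conftest.py sketch: recognizing the standard "rerun" outcome that
    # the ddtrace pytest plugin now emits for retry attempts.
    RERUN_COUNT = 0


    def pytest_report_teststatus(report, config):
        """Map reports with the "rerun" outcome to their own status category."""
        global RERUN_COUNT
        if getattr(report, "outcome", None) == "rerun":
            RERUN_COUNT += 1
            return "rerun", "R", ("RERUN", {"yellow": True})
        return None
    ```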
  • dynamic instrumentation: Fixes a RuntimeError: generator didn't yield in the Symbol DB remote config subscriber when the process has no writable temporary directory.
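
    This is the classic failure mode for @contextmanager functions, sketched below with illustrative code (not the Symbol DB subscriber itself): if the body raises before reaching yield, the caller's with statement surfaces RuntimeError: generator didn't yield, so the setup has to be guarded.

    ```python
    import os
    import tempfile
    from contextlib import contextmanager


    @contextmanager
    def writable_tmp_dir():
        """Illustrative sketch, not ddtrace's Symbol DB code.

        An unguarded version that raised before `yield` when no writable
        temporary directory exists would surface as
        "RuntimeError: generator didn't yield" at the caller's `with`
        statement. Guarding the setup and yielding a fallback keeps the
        context manager well-formed either way.
        """
        try:
            path = tempfile.mkdtemp()
        except OSError:
            path = None  # no writable temp dir: degrade gracefully
        try:
            yield path
        finally:
            if path is not None:
                os.rmdir(path)
    ```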
  • profiling: Fixes a bug that caused certain function names to be displayed as <module> in flame graphs.
