Estimated end-of-life date, accurate to within three months: 05-2027
See the support level definitions for more information.
Upgrade Notes
- LLM Observability
- Experiment spans now contain the config from the experiment initialization, allowing relevant spans to be searched by experiment config.
- Experiment spans now contain the tags from the dataset records, allowing relevant spans to be searched by dataset record tags.
Deprecation Notes
- tracing
- The type annotation for `Span.parent_id` will change from `Optional[int]` to `int` in v5.0.0.
New Features
- azure-api-management
- This introduces inferred proxy support for Azure API Management.
- Stats computation
- Enables stats computation by default for Python 3.14 and above.
- AI Guard
- Adds SDS (Sensitive Data Scanner) findings to AI Guard spans, enabling visibility into sensitive data detected in LLM inputs and outputs.
- LLM Observability
- Experiments now report their execution status to the backend. The status transitions to `running` when execution starts, `completed` on success, `failed` when tasks or evaluators error with `raise_errors=False`, and `interrupted` when the experiment is stopped by an exception. #16713
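The status lifecycle above can be modeled as a small state machine. The sketch below is illustrative only (the enum and driver function are assumptions for exposition, not the library's internal implementation):

```python
from enum import Enum


class ExperimentStatus(Enum):
    """Illustrative states mirroring the statuses reported to the backend."""
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


def run_experiment(task, raise_errors=False):
    """Toy driver showing how a run would move between statuses."""
    status = ExperimentStatus.RUNNING  # execution starts
    try:
        task()
    except Exception:
        if raise_errors:
            # The exception propagates and stops the experiment.
            status = ExperimentStatus.INTERRUPTED
            raise
        status = ExperimentStatus.FAILED  # errored with raise_errors=False
    else:
        status = ExperimentStatus.COMPLETED  # success
    return status
```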
- Adds `LLMObs.publish_evaluator()` to sync a locally defined `LLMJudge` evaluator to the Datadog UI as a custom LLM-as-Judge evaluation.
- Adds support for DeepEval evaluations in LLM Observability Experiments by allowing users to pass a DeepEval evaluation (which inherits from either `BaseMetric` or `BaseConversationalMetric`) to an LLM Obs Experiment. Example:

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

from ddtrace.llmobs import LLMObs

correctness_metric = GEval(
    name="Correctness",
    criteria="Determine whether the actual output is factually correct based on the expected output.",
    evaluation_steps=[
        "Check whether the facts in 'actual output' contradicts any facts in 'expected output'",
        "You should also heavily penalize omission of detail",
        "Vague language, or contradicting OPINIONS, are OK",
    ],
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    async_mode=True,
)

dataset = LLMObs.create_dataset(
    dataset_name="<DATASET_NAME>",
    description="<DATASET_DESCRIPTION>",
    records=[RECORD_1, RECORD_2, RECORD_3, ...],
)

def my_task(input_data, config):
    return input_data["output"]

def my_summary_evaluator(inputs, outputs, expected_outputs, evaluators_results):
    return evaluators_results["Correctness"].count(True)

experiment = LLMObs.experiment(
    name="<EXPERIMENT_NAME>",
    task=my_task,
    dataset=dataset,
    evaluators=[correctness_metric],
    summary_evaluators=[my_summary_evaluator],  # optional, used to summarize the experiment results
    description="<EXPERIMENT_DESCRIPTION>.",
)
result = experiment.run()
```
- Adds experiment summary logging after `run()` with row count, run count, per-evaluator stats, and error counts.
- Adds `max_retries` and `retry_delay` parameters to `experiment.run()` for retrying failed tasks and evaluators. Example: `experiment.run(max_retries=3, retry_delay=lambda attempt: 2 ** attempt)`.
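The `retry_delay` callable receives the attempt number and returns seconds to sleep, so `lambda attempt: 2 ** attempt` yields an exponential backoff (1s, 2s, 4s, ...). A minimal retry loop with this shape might look like the sketch below (illustrative only, not the library's implementation):

```python
import time


def run_with_retries(fn, max_retries=3, retry_delay=lambda attempt: 2 ** attempt):
    """Call fn(), retrying up to max_retries times on failure, sleeping
    retry_delay(attempt) seconds between attempts."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                time.sleep(retry_delay(attempt))  # exponential backoff by default
    raise last_error


# Example: a flaky task that succeeds on the third call.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"
```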
- This introduces `LLMObs.get_prompt()` to retrieve managed prompts from Datadog's Prompt Registry. The method returns a `ManagedPrompt` object with a `format()` method for variable substitution. Prompt updates propagate to running applications within the cache TTL (default: 60 seconds). Use it with `annotation_context` or `annotate` to correlate prompts with LLM spans:

```python
prompt = LLMObs.get_prompt("greeting")
variables = {"user": "Alice"}
with LLMObs.annotation_context(prompt=prompt.to_annotation_dict(**variables)):
    openai.chat.completions.create(messages=prompt.format(**variables))
```
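Conceptually, `format()`-style variable substitution fills placeholders in the prompt's messages. The toy class below models that behavior for illustration; its name and message shape are assumptions, not the `ManagedPrompt` API:

```python
class PromptTemplate:
    """Toy prompt with {placeholder} variables, loosely modeling a
    managed prompt's format() behavior."""

    def __init__(self, name, messages):
        self.name = name
        self.messages = messages  # list of {"role": ..., "content": ...}

    def format(self, **variables):
        """Return the messages with variables substituted into content."""
        return [
            {"role": m["role"], "content": m["content"].format(**variables)}
            for m in self.messages
        ]


greeting = PromptTemplate(
    "greeting",
    [{"role": "system", "content": "Greet {user} politely."}],
)
```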
- Experiments propagate `canonical_id`s from dataset records to the corresponding experiment spans when present. The `canonical_id`s are only guaranteed to be available after calling `pull_dataset`.
- `LLMObs.create_dataset` supports a `bulk_upload` parameter to control data uploading behavior. Both `LLMObs.create_dataset` and `LLMObs.create_dataset_from_csv` support a user-specified `deduplicate` parameter.
- A subset of dataset records can now be pulled by tag using the `tags` argument to `LLMObs.pull_dataset`, provided as a list of `key:value` strings: `LLMObs.pull_dataset(dataset_name="my-dataset", tags=["env:prod", "version:1.0"])`
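Each tag string is a `key:value` pair, and a record matches when it carries every requested tag. The filtering itself happens server-side; the sketch below only illustrates the matching semantics (function names and record shape are assumptions):

```python
def parse_tag(tag):
    """Split a "key:value" tag string into a (key, value) pair."""
    key, _, value = tag.partition(":")
    return key, value


def filter_records(records, tags):
    """Keep records whose tags contain every requested key:value pair."""
    wanted = dict(parse_tag(t) for t in tags)
    return [
        r for r in records
        if all(r.get("tags", {}).get(k) == v for k, v in wanted.items())
    ]


records = [
    {"id": 1, "tags": {"env": "prod", "version": "1.0"}},
    {"id": 2, "tags": {"env": "staging", "version": "1.0"}},
]
```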
Bug Fixes
- LLM Observability
- Fix a data duplication issue when uploading datasets larger than 5 MB via `LLMObs.create_dataset`.
- ai_guard
- Fix a `TypeError` raised while processing failed AI Guard responses, which overrode the original error.
- openai_agents
- Fixes an `AttributeError` on `openai-agents >= 0.8.0` caused by the removal of `AgentRunner._run_single_turn`.
- profiling
- Fixed a bug that could prevent Profiling from being enabled when the library is installed through Single Step Instrumentation.
- This fixes an issue where the profiler was unnecessarily patching the `gevent` module even when the profiler was not enabled.
- Fixed a bug that caused certain function names to be displayed as `<module>` in flame graphs.
- Fix lock contention in the profiler's greenlet stack sampler that could cause connection pool exhaustion in gevent-based applications (e.g. gunicorn + gevent + psycopg2). #16657
- This fix resolves an issue where the lock profiler's wrapper class did not support PEP 604 type union syntax (e.g., `asyncio.Condition | None`). This was causing a `TypeError` at import time for libraries such as kopf that use union type annotations at class definition time.
- data_streams
- Adds a `kafka_cluster_id` tag to Kafka offset/backlog tracking for confluent-kafka. Previously, the cluster ID was only included in DSM checkpoint edge tags (produce/consume) but missing from offset commit and produce offset backlogs. This ensures correct attribution of backlog data to specific Kafka clusters when multiple clusters share topic names.
- AAP
- Fixes a memory corruption issue where concurrent calls to the WAF on the same request context from multiple threads (e.g. an asyncio event loop and a thread pool executor inheriting the same context via `contextvars`) could cause use-after-free or double-free crashes (SIGSEGV) inside `libddwaf`. A per-context lock now serializes WAF calls on the same context.
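The fix serializes calls per request context rather than globally, so unrelated requests stay parallel. The pattern looks roughly like the sketch below (illustrative only; class and method names are assumptions, not the AAP implementation):

```python
import threading


class RequestContext:
    """One lock per request context: concurrent WAF calls on the same
    context are serialized, while separate contexts run in parallel."""

    def __init__(self):
        self._waf_lock = threading.Lock()
        self.calls = 0

    def run_waf(self, payload):
        # Serialize access to this context's shared native WAF handle,
        # preventing concurrent mutation from multiple threads.
        with self._waf_lock:
            self.calls += 1
            return "checked:%s" % payload
```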
- tracing
- Avoid pickling wrappers in `ddtrace.internal.wrapping.context.BaseWrappingContext`.
- CI Visibility
- Fixed an incompatibility with `pytest-html` and other third-party reporting plugins caused by the ddtrace pytest plugin using a non-standard `dd_retry` test outcome for retry attempts. The outcome is now set to `rerun`, the standard value used by `pytest-rerunfailures` and recognized by reporting plugins.
- dynamic instrumentation
- Fixes a `RuntimeError: generator didn't yield` in the Symbol DB remote config subscriber when the process has no writable temporary directory.
- celery
- Propagate distributed tracing headers for tasks that are not registered locally so traces link correctly across workers. #16662
- Fix a potential race condition affecting internal periodic worker threads that could have caused a `RuntimeError` during forks.
- Add a timeout to Unix socket connections to prevent thread I/O hangs during pre-fork shutdown.
Other Changes
- profiling
- Reduces code provenance CPU overhead when using fork-based frameworks like gunicorn and uWSGI.
- LLM Observability
- Exports `LLMJudge`, `BooleanStructuredOutput`, `ScoreStructuredOutput`, and `CategoricalStructuredOutput` at the public `ddtrace.llmobs` module level.