ddtrace 4.4.0rc2 on Python PyPI

Estimated end-of-life date, accurate to within three months: 05-2027
See the support level definitions for more information.

New Features

LLM Observability
- Adds support for class-based evaluators in LLM Observability. Users can now define custom evaluators by subclassing the BaseEvaluator class, providing more flexibility and structure for implementing evaluation logic. The EvaluatorContext stores the context of the evaluation, including the dataset record and span information. Additionally, class-based summary evaluators are supported via BaseSummaryEvaluator, which receives a SummaryEvaluatorContext containing aggregated inputs, outputs, expected outputs, and per-row evaluation results.
logging
- Adds a new environment variable DD_TRACE_LOG_LEVEL to control the ddtrace logger level, following the log levels available in the logging module.
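As a rough illustration of what "following the log levels available in the logging module" can mean in practice, the sketch below resolves a level name from the environment against the standard library. It is not ddtrace's implementation: the helper name resolve_level and the fallback to WARNING are assumptions made for this example.

```python
import logging
import os


def resolve_level(name, default=logging.WARNING):
    """Map a level name such as "DEBUG" to its numeric logging level.

    Hypothetical helper for illustration only; missing or unknown names
    fall back to `default` (an assumed fallback, not ddtrace's behavior).
    """
    if not name:
        return default
    level = getattr(logging, name.strip().upper(), None)
    # Only accept real numeric levels (DEBUG, INFO, WARNING, ...).
    return level if isinstance(level, int) else default


# Read the variable named in the release note and apply the resolved
# level to the "ddtrace" logger from the standard logging hierarchy.
level = resolve_level(os.environ.get("DD_TRACE_LOG_LEVEL"))
logging.getLogger("ddtrace").setLevel(level)
```

With the new variable, setting something like DD_TRACE_LOG_LEVEL=DEBUG in the process environment before the application starts should be enough; the manual setLevel call above only shows the equivalent standard-library mechanics.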

Bug Fixes

AAP
- Fixes an issue where agent-based samplers could interfere with Standalone App and API Protection. In standalone mode, a trace needs to be sent every minute to keep the service enabled, but agent-based samplers with low sample rates could reject traces before the custom App and API Protection sampler was evaluated.
aws_lambda
- This fix resolves an issue where user-defined SIGALRM handlers were not restored after TimeoutChannel cleanup, causing custom timeout handlers to stop working after the first Lambda invocation.
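The general save-and-restore pattern behind a fix like this can be sketched with the standard signal module. This is not the TimeoutChannel code: both handler functions are hypothetical stand-ins, and the example assumes a Unix platform (SIGALRM is unavailable on Windows) and execution in the main thread.

```python
import signal


def user_handler(signum, frame):
    """Stand-in for an application-defined SIGALRM handler."""


def temporary_handler(signum, frame):
    """Stand-in for a handler a library installs temporarily."""


# The application installs its own handler first.
signal.signal(signal.SIGALRM, user_handler)

# signal.signal() returns the previously installed handler, so a library
# can remember it before swapping in its own.
previous = signal.signal(signal.SIGALRM, temporary_handler)

try:
    pass  # ... work guarded by the temporary handler ...
finally:
    # Cleanup: put the user's handler back so it keeps working on
    # later invocations.
    signal.signal(signal.SIGALRM, previous)

restored = signal.getsignal(signal.SIGALRM)
```

Skipping the restore step in the finally block would leave temporary_handler installed, which matches the symptom described above: the user's handler stops firing after the first run.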
exception replay
- Fixes a gevent support issue that could cause an exception when Exception Replay tries to determine whether a frame belongs to user code for capturing.
litellm
- This fix resolves an issue where litellm>=1.74.15 wrapped router streaming responses in FallbackStreamWrapper (introduced for mid-stream fallback support), which caused an AttributeError when attempting to access the .handler attribute. The integration now gracefully handles both the original response format and wrapped responses by falling back to ddtrace's own stream wrapping when needed.
profiling
- A bug where non-pushed samples could leak data to subsequent samples has been fixed.
- A bug where asyncio task stacks could contain duplicated frames when the task was on-CPU is now fixed. The stack now correctly shows each frame only once.
- The stack Profiler now correctly resets thread, task, and greenlet information after a fork, preventing stale data from the parent process from affecting profiling in child processes.
- Fixed a crash in the lock profiler when stack traces are too shallow (fewer than 4 frames). This could occur during interpreter bootstrap, atexit callbacks, or other edge cases. In these rare scenarios, locks may now appear with location "unknown:0" in profiling data instead of causing application crashes.
- Fixed an issue that caused greenlets to misbehave when gevent.joinall is called.
- This fix resolves a crash occurring when forking while using the memory profiler.
LLM Observability
- This fix resolves an issue where the Pydantic AI integration was not properly tracing the StreamedRunResult.stream_responses() method which was introduced in pydantic-ai==0.8.1. This was leading to agent spans not being finished.
- This fix addresses an issue where the evaluators argument type for LLMObs.experiment was overly constrained due to the use of an invariant List type. The argument now uses the covariant Sequence type, allowing users to pass in a list of evaluator functions with a narrower return type.
- Fixes an issue where OpenAI spans would show model_name: "None" instead of falling back to the request model if the API response returns a None model field. The model name now properly falls back to openai.request.model or "unknown_model" if both are unavailable.
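The List versus Sequence change to the evaluators argument described above can be illustrated with plain typing constructs. This sketch does not use the LLMObs API: the Result alias, the two run_with_* functions, and exact_match are hypothetical names chosen only to show why an invariant List rejects a list of narrower evaluators under a static type checker while a covariant Sequence accepts it.

```python
from typing import Callable, List, Sequence, Union

# Hypothetical "broad" evaluator result type for this example.
Result = Union[bool, float, str]
BroadEvaluator = Callable[[str, str], Result]


def run_with_list(evaluators: List[BroadEvaluator]) -> List[Result]:
    """Accepts only List[BroadEvaluator]; List is invariant."""
    return [ev("a", "a") for ev in evaluators]


def run_with_sequence(evaluators: Sequence[BroadEvaluator]) -> List[Result]:
    """Accepts any Sequence of evaluators; Sequence is covariant."""
    return [ev("a", "a") for ev in evaluators]


def exact_match(output: str, expected: str) -> bool:
    """An evaluator with a narrower return type (bool only)."""
    return output == expected


# A list whose element type returns bool, a subtype of Result.
narrow: List[Callable[[str, str], bool]] = [exact_match]

# A static checker such as mypy reports an error for
# run_with_list(narrow), because List[X] is not a subtype of List[Y]
# even when X is a subtype of Y. The Sequence version type-checks.
results = run_with_sequence(narrow)
```

Both functions behave identically at runtime; the difference only shows up in static analysis, which is why the fix is a type annotation change rather than a behavior change.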