Agent
Prelude
Released on: 2026-03-18
- Please refer to the 7.77.0 tag on integrations-core for the list of changes to the Core Checks.
Upgrade Notes
- APM OTLP: The datadog.* namespaced span attributes are no longer used to construct Datadog span fields. Previously, attributes like datadog.service, datadog.env, and datadog.container_id were used to directly set corresponding Datadog span fields. This functionality has been removed, and the Agent now relies solely on standard OpenTelemetry semantic conventions.
  Exceptions:
  - The datadog.host.name attribute continues to be respected for hostname resolution as documented at https://docs.datadoghq.com/opentelemetry/mapping/hostname/.
  - The datadog.container.tag.* attributes continue to be supported for custom container tags.
  The configuration option otlp_config.traces.ignore_missing_datadog_fields (and the corresponding environment variable DD_OTLP_CONFIG_IGNORE_MISSING_DATADOG_FIELDS) is deprecated and no longer has any effect. The Agent now always uses standard OTel semantic conventions.
  Migration: If you were using datadog.* attributes, switch to the standard OpenTelemetry semantic conventions:
  - datadog.service → service.name
  - datadog.env → deployment.environment.name (OTel 1.27+) or deployment.environment
  - datadog.version → service.version
  - datadog.container_id → container.id
  Who is affected: Users who explicitly set datadog.* attributes (other than datadog.host.name and datadog.container.tag.*) in their OpenTelemetry instrumentation to override default field mappings. Users relying solely on standard OpenTelemetry semantic conventions are not affected.
- The
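For instrumentation that still emits datadog.* attributes, the migration list above amounts to a set of key renames. The Go sketch below applies those renames to a plain map of span attributes. Several details are assumptions: migrateDatadogAttrs is a hypothetical helper, and targeting deployment.environment.name instead of deployment.environment assumes OTel semantic conventions 1.27 or later. In practice, it is usually better to set the standard keys directly in the instrumentation than to rewrite them afterwards.

```go
package main

import "fmt"

// datadogToOTel mirrors the migration list in the upgrade note. Mapping
// datadog.env to deployment.environment.name assumes OTel semconv 1.27+.
var datadogToOTel = map[string]string{
	"datadog.service":      "service.name",
	"datadog.env":          "deployment.environment.name",
	"datadog.version":      "service.version",
	"datadog.container_id": "container.id",
}

// migrateDatadogAttrs (hypothetical helper) returns a copy of attrs with the
// deprecated datadog.* keys renamed to their OTel equivalents. A value already
// present under the standard key wins. datadog.host.name and
// datadog.container.tag.* are copied unchanged because the Agent still honors them.
func migrateDatadogAttrs(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if _, deprecated := datadogToOTel[k]; !deprecated {
			out[k] = v
		}
	}
	for oldKey, newKey := range datadogToOTel {
		v, ok := attrs[oldKey]
		if !ok {
			continue
		}
		if _, exists := out[newKey]; !exists {
			out[newKey] = v
		}
	}
	return out
}

func main() {
	attrs := map[string]string{
		"datadog.service":   "checkout",
		"datadog.env":       "prod",
		"datadog.host.name": "web-1",
	}
	fmt.Println(migrateDatadogAttrs(attrs))
}
```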
New Features
- Add dd-procmgrd, a minimal Rust daemon for the Datadog process manager. The daemon starts, logs, and waits for a shutdown signal. It does not provide user-facing functionality.
- Add a new listener based on all Custom Resource Definitions (CRDs) found on the cluster.
- Logs pipeline failover: Added automatic failover capability to prevent log loss when compression blocks pipelines. When a pipeline becomes blocked during compression, log messages are automatically routed to healthy pipelines. N router channels (one per pipeline) distribute tailers via round-robin, each with its own forwarder goroutine that handles failover independently across all pipelines. Enable with logs_config.pipeline_failover.enabled: true (default: false). When all pipelines are blocked, backpressure is applied to prevent data loss.
- The system memory check on Linux can now collect memory pressure metrics from /proc/vmstat to help detect memory pressure before OOM events occur. To enable, set collect_memory_pressure: true in the memory check configuration. New metrics: system.mem.allocstall (with zone tag), system.mem.pgscan_direct, system.mem.pgsteal_direct, system.mem.pgscan_kswapd, system.mem.pgsteal_kswapd.
- APM: Add support for span-derived primary tags in APM stats aggregation. This allows configuring tag keys via apm_config.span_derived_primary_tags that are extracted from span tags and used as additional aggregation dimensions for APM statistics.
- APM: Add initial support for converting trace payload formats to the new "v1.0" format. This feature is disabled by default but can be enabled by adding the feature flag "convert-traces" to apm_config.features. Using this flag without direction from Datadog Support is not recommended.
- Integrate the Private Action Runner into the Datadog Cluster Agent.
- The Private Action Runner (PAR) now runs in the Datadog Cluster Agent with improved identity management for Kubernetes environments. PAR identity (URN and private key) is now stored in a Kubernetes secret and shared across all DCA replicas using leader election. The leader replica handles enrollment and secret creation, while follower replicas wait for and read the shared identity. This enables multiple DCA replicas to execute PAR tasks using a single cluster identity, eliminating the need for per-replica enrollment.
- Add a Windows PowerShell example config for private action runner scripts.
- APM: Add image_volume-based library injection as an alternative to init containers and the CSI driver (experimental). Available only on Kubernetes 1.33+, it provides faster pod startup.
- Autodiscovery template variables are now supported in ad.datadoghq.com/tags and ad.datadoghq.com/<container>.tags Kubernetes pod annotations. Template variables are resolved at runtime, enabling dynamic tagging based on pod and container metadata. This allows centralized tag configuration that applies to all checks, logs, and traces without hardcoding pod-specific values.
- Start the Windows Private Action Runner service alongside the Agent when private_action_runner.enabled is set in datadog.yaml.
- On Windows, the private action runner binary is now included in the MSI installer and registered as the datadog-agent-action Windows service. The service is installed as demand-start with a dependency on the main Agent service, and its credentials and ACLs are managed alongside the other Agent services during install, upgrade, and repair.
- Add the runPredefinedPowershellScript action to the Private Action Runner on Windows. This action allows running predefined PowerShell scripts (inline or file-based) with optional parameter templating, JSON schema parameter validation, environment variable allowlisting, configurable timeouts, and a 10 MB output limit.
- On Windows, the Agent stops the private action runner service during MSI upgrades and fleet-driven stop-all operations so it is shut down alongside the Agent.
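The logs pipeline failover entry above combines three behaviors: round-robin assignment of tailers, failover to a healthy pipeline, and backpressure when every pipeline is blocked. The Go sketch below models that routing decision under several assumptions. Buffered channels stand in for pipelines, and a full channel stands in for a pipeline blocked on compression. The per-pipeline forwarder goroutines are omitted, and the router type and its methods are hypothetical rather than the Agent's code.

```go
package main

import "fmt"

// router is a simplified, hypothetical model of failover routing: each
// pipeline is a buffered channel, and a full channel counts as blocked.
type router struct {
	pipelines []chan string
	next      int // round-robin cursor used when assigning tailers
}

// assign gives each new tailer a preferred pipeline, round-robin.
func (r *router) assign() int {
	idx := r.next
	r.next = (r.next + 1) % len(r.pipelines)
	return idx
}

// send delivers msg to the tailer's preferred pipeline and fails over to the
// other pipelines, in order, when it is blocked. If every pipeline is
// blocked, it waits on the preferred one (backpressure instead of data loss).
func (r *router) send(preferred int, msg string) int {
	n := len(r.pipelines)
	for i := 0; i < n; i++ {
		idx := (preferred + i) % n
		select {
		case r.pipelines[idx] <- msg:
			return idx
		default: // this pipeline is blocked; try the next one
		}
	}
	r.pipelines[preferred] <- msg
	return preferred
}

func main() {
	r := &router{pipelines: []chan string{make(chan string, 1), make(chan string, 1)}}
	tailer := r.assign()
	// The second message finds pipeline 0 full and fails over to pipeline 1.
	fmt.Println(r.send(tailer, "line 1"), r.send(tailer, "line 2"))
}
```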
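The memory pressure entry above reads its counters from /proc/vmstat, a file of "name value" lines. The Go sketch below parses text in that format. It keys the allocstall_* counters by zone, mirroring the zone tag on system.mem.allocstall, and keeps the four pgscan/pgsteal counters. The function and type names are hypothetical. How the memory check turns these raw counters into the reported metrics (for example, as rates) is not modeled here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// pressure holds the raw reclaim counters of interest (hypothetical type).
type pressure struct {
	allocstall map[string]uint64 // keyed by zone, e.g. "normal", "movable"
	counters   map[string]uint64 // pgscan_direct, pgsteal_direct, ...
}

var reclaimCounters = map[string]bool{
	"pgscan_direct": true, "pgsteal_direct": true,
	"pgscan_kswapd": true, "pgsteal_kswapd": true,
}

// parseVmstat (hypothetical helper) scans "name value" lines as found in
// /proc/vmstat and ignores anything it does not recognize.
func parseVmstat(text string) pressure {
	p := pressure{allocstall: map[string]uint64{}, counters: map[string]uint64{}}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			continue
		}
		switch name := fields[0]; {
		case strings.HasPrefix(name, "allocstall_"):
			p.allocstall[strings.TrimPrefix(name, "allocstall_")] = v
		case reclaimCounters[name]:
			p.counters[name] = v
		}
	}
	return p
}

func main() {
	sample := "nr_free_pages 12345\nallocstall_normal 7\nallocstall_movable 0\npgscan_direct 420\npgsteal_direct 400\npgscan_kswapd 9000\npgsteal_kswapd 8800\n"
	p := parseVmstat(sample)
	fmt.Println(p.allocstall, p.counters)
}
```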
Enhancement Notes
- The Agent's embedded Python has been upgraded from 3.13.11 to 3.13.12.
- Add an ntp.offset metric with a source:intake tag to monitor clock drift using Datadog intake server timestamps. The original ntp.offset metric, calculated from an NTP server, is now tagged source:ntp.
- As of Kubernetes version 1.33, the Endpoints API object has been deprecated in favor of EndpointSlice. Autodiscovery now supports the use of an EndpointSlice listener and provider to collect endpoint checks. To enable this feature, set kubernetes_use_endpoint_slices to true in your Datadog Agent configuration.
- Add a bucket label to image_resolution_attempts telemetry to track gradual rollout progress.
- Added a private action runner bundle that exposes the Network Path traceroute functionality through the getNetworkPath action.
- Send telemetry for Synthetics tests run on the Agent, including checks received, checks processed, and error counts for test configuration, traceroute, and event platform result submission.
- Added support for two new configurations for tag-based gradual rollout in Kubernetes SSI deployments. The gradual rollout can be configured using the following parameters:
  - DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_ENABLED: Whether to enable gradual rollout (default: true).
  - DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_CACHE_TTL: The cache TTL for the gradual rollout image cache (default: 1h). This cache stores the mapping of mutable tags to image digests for the gradual rollout, and setting this TTL helps prevent the image resolution from becoming stale.
- Agent metrics now include a connection_type tag with a value of tcp, uds, or pipe for lib-to-agent communications.
- Automatically collect the team tag when a Kubernetes resource has a team label or annotation and explicit team tag extraction is not configured.
- Enable the Agent to support built-in credentials, such as IRSA, in AWS cloud environments.
- Bump go-sqllexer to v0.1.13, improving SQL obfuscation performance and fixing incorrect tokenization of multi-byte UTF-8 characters (e.g., CJK characters, full-width punctuation).
- Agents are now built with Go 1.25.7.
- NDM: Cisco SD-WAN interface metadata now includes the is_physical field to distinguish physical from virtual interfaces (loopback, tunnel). cEdge interfaces also include the type field with the IANA interface type number.
- In the Cluster Autoscaling controller, use the Kubernetes client update instead of patch.
- On ECS Managed Instances, detect the hostname from IMDS when the Agent runs in daemon mode.
- On ECS Managed Instances with daemon scheduling, the Agent uses the ECS_CONTAINER_METADATA_URI_V4 environment variable as a fallback signal for v4 availability.
- Expose a new metric, kube_apiserver.api_resource, that holds the name, kind, group, and version of all known cluster-wide (non-namespaced) resources on the cluster.
- Add a new DDOT feature gate, 'exporter.datadogexporter.DisableAllMetricRemapping', to disable all client-side metric remapping.
- Increases the reliability of namespaceLabelsAsTags and namespaceAnnotationsAsTags for new pods by caching the last seen namespace metadata.
- Added a new, optional configuration setting for journald logs: default_application_name. If set to a non-empty string, the value replaces "docker" as the default application name for container-based journald logs. If set to an empty string, the application name is determined by the systemd journal fields, as for all non-container-based journald logs.
- Simplified location permission detection on macOS by removing the initial polling-based detection at app startup. Permission detection now happens only at the time of WLAN data collection.
- Use the 'request_location_permission' flag in the WLAN config to gate the location permission request on macOS.
- Added the enable_otlp_container_tags_v2 feature flag, which may reduce the Agent's outgoing traffic when ingesting OTLP traces from containerized applications. However, the flag introduces some breaking changes:
  - container tags on the new spans can no longer be queried as span attributes (with @);
  - using the k8s.pod.uid attribute as a fallback container ID is no longer supported;
  - disabling the infraattributes processor in DDOT trace pipelines will prevent automatic container tag detection.
- The datadog.yaml configuration file now includes a commented-out private_action_runner section on all platforms.
- The Private Action Runner now supports Datadog's secret management features. It can resolve secrets using the ENC[...] notation in configuration files, supporting all secret backends via the secret_backend_type and secret_backend_config settings.
- The Private Action Runner now supports running as a Windows service via the Service Control Manager (SCM).
- Bumped the Security Agent policies to v0.77.0.
- SNMP interface metadata now includes type (IF-MIB ifType) and is_physical fields. The is_physical field is set to true for physical Ethernet interface types (ethernetCsmacd, fastEther, fastEtherFX, gigabitEthernet).
- Add support for unconnected UDP sockets in the SNMP corecheck. The check automatically falls back to unconnected UDP sockets if the connected UDP socket times out.
- APM: Added a new health metric, datadog.trace_agent.receiver.payload_timeout, to track incoming trace payload timeouts caused by client connection closures or middleware timeouts.
- Upgraded the Datadog Agent Windows installer from WiX 3 to WiX 5.
- Report telemetry from the Windows Injector, enabled by default. To disable it, set injector.enable_telemetry=false in system-probe.yaml when running system-probe.
- Add Windows version information to the Private Action Runner executable. The version info is now visible in Windows Explorer file properties.
- Added a telemetry metric, workloadmeta.pending_event_bundles, to track pending events in workloadmeta.
- Avoid blocking workloadmeta collectors when streaming events to remote agents.
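The SNMP enhancement note names the ifType values treated as physical. The Go sketch below implements that classification using the numeric IANAifType codes for those names: ethernetCsmacd = 6, fastEther = 62, fastEtherFX = 69, gigabitEthernet = 117. isPhysical is a hypothetical helper, not the Agent's code.

```go
package main

import "fmt"

// physicalIfTypes holds the IANAifType codes for the types named in the note.
var physicalIfTypes = map[int]string{
	6:   "ethernetCsmacd",
	62:  "fastEther",
	69:  "fastEtherFX",
	117: "gigabitEthernet",
}

// isPhysical (hypothetical helper) reports whether an ifType value denotes a
// physical Ethernet interface. softwareLoopback (24) and tunnel (131) are
// examples of virtual types that return false.
func isPhysical(ifType int) bool {
	_, ok := physicalIfTypes[ifType]
	return ok
}

func main() {
	for _, t := range []int{6, 24, 117, 131} {
		fmt.Println(t, isPhysical(t))
	}
}
```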
Deprecation Notes
- GPUm: Renamed the metrics gpu.process.{encoder,decoder}_utilization to gpu.process.{encoder,decoder}_active for consistency with the 'active' suffix used in the rest of the GPUm metrics.
Security Notes
- Oracle check: PDB names in ALTER SESSION SET CONTAINER statements are now properly quoted to prevent SQL injection.
- The Jetson integration now validates the tegrastats_path configuration option to prevent command injection. The path must be absolute and cannot contain shell metacharacters or whitespace.
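The Oracle security fix quotes PDB names in ALTER SESSION SET CONTAINER statements. The Go helper below sketches the general technique and is not the check's actual code. It wraps the name in double quotes and rejects names containing a double quote or NUL, characters Oracle does not allow in identifiers anyway. Quoted Oracle identifiers are case-sensitive, so a real implementation also has to decide how to normalize case; this sketch leaves the name unchanged. Both function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// quoteOracleIdentifier (hypothetical) wraps name in double quotes. Oracle
// identifiers may not contain double quotes or NUL, so such names are
// rejected rather than escaped, which closes the injection path.
func quoteOracleIdentifier(name string) (string, error) {
	if name == "" || strings.ContainsAny(name, "\"\x00") {
		return "", errors.New("invalid Oracle identifier")
	}
	return `"` + name + `"`, nil
}

// alterSessionSQL (hypothetical) builds the container switch statement.
func alterSessionSQL(pdb string) (string, error) {
	q, err := quoteOracleIdentifier(pdb)
	if err != nil {
		return "", err
	}
	return "ALTER SESSION SET CONTAINER = " + q, nil
}

func main() {
	for _, name := range []string{"PDB1", `PDB1" ; DROP USER x --`} {
		stmt, err := alterSessionSQL(name)
		fmt.Println(stmt, err)
	}
}
```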
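The Jetson fix requires tegrastats_path to be absolute and free of shell metacharacters and whitespace. The Go sketch below implements that validation. validateTegrastatsPath is hypothetical, and the exact metacharacter set is an assumption rather than the list the integration uses.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode"
)

// shellMeta lists characters with special meaning to POSIX shells. This set
// is an assumption, not necessarily the one the integration checks.
const shellMeta = "|&;<>()$`\\\"'*?[]{}!#~"

// validateTegrastatsPath (hypothetical) enforces the rules from the security
// note: absolute path, no shell metacharacters, no whitespace.
func validateTegrastatsPath(p string) error {
	if !filepath.IsAbs(p) {
		return fmt.Errorf("tegrastats_path %q must be absolute", p)
	}
	if strings.ContainsAny(p, shellMeta) {
		return fmt.Errorf("tegrastats_path %q contains shell metacharacters", p)
	}
	if strings.IndexFunc(p, unicode.IsSpace) >= 0 {
		return fmt.Errorf("tegrastats_path %q contains whitespace", p)
	}
	return nil
}

func main() {
	for _, p := range []string{"/usr/bin/tegrastats", "tegrastats", "/usr/bin/tegrastats;id"} {
		fmt.Println(p, validateTegrastatsPath(p))
	}
}
```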
Bug Fixes
- APM: Fix panic that could occur when decoding malformed v1.0 trace payloads.
- APM: Correctly mark traces as probability sampled when using the trace V1 format.
- APM: Fix issue where v1 trace writer might not flush traces during an agent shutdown.
- The container and process discovery checks are now disabled when the process check is enabled for service discovery.
- Detect correct launch type for ECS Managed Instances when running in daemon mode.
- Fixed a minor but persistent memory leak in the logs endpoint diagnostic behavior.
- Fixes an issue where agent check --flare created the checks directory with 0000 permissions, preventing check output files from being written. The directory is now created with 0750 permissions.
- Changed integration log file behavior to delete and recreate instead of truncating. This should help prevent duplicate and missing logs from integrations.
- Fix rollout durations computed from the ReplicaSet creation time. Because rollbacks reuse existing ReplicaSets, durations showed as hours or days instead of the actual rollback time. The fix tracks revision annotation changes and resets the start time to now when a rollback is detected.
- Oracle check: Fix a bug where custom queries accumulated metrics across iterations, causing metrics from earlier queries to be re-sent with each subsequent query in the same check run.
- Oracle check: Fix potential panic in sendMetric when the sender or metric function cannot be resolved.
- Oracle check: Fix custom query error accumulation so that type errors from earlier queries are no longer silently discarded.
- Oracle check: Report a clear error when a custom query returns a NULL value for a metric column instead of an "UNKNOWN" type error message.
- Oracle check: Detect column count mismatches in both directions (too many or too few) between custom query results and configured column mappings.
- Oracle check: Remove redundant GetSender call in custom query handling in favor of the existing commit helper.
- Oracle check: Replace per-call map allocations with switch statements in custom query metric helpers for improved performance.
- Fixed a bug where log lines exactly at the logs_config.max_message_size_bytes limit (default 900KB) were incorrectly marked as truncated. This caused the ...TRUNCATED... marker to appear in logs that fit within the size limit, and incorrectly marked the subsequent log line as a truncated remainder. Additionally, improved truncation detection by extending the FrameMatcher interface to explicitly signal when content is truncated, ensuring consistent truncation state across the framer and handler components.
- Fixes a bug in the admission controller webhook that allowed admission to re-run for pods that already had APM injection in image-volume mode.
- Refined location permission checks to avoid an unnecessary system prompt. Added safeguards against a possible installation conflict between per-user and system-wide installations.
- Fix data race in opentelemetry-mapping-go/inframetadata.Reporter which could cause a crash with error message "concurrent map iteration and map write".
- OTLP logs now support array type attributes. Arrays containing primitive values or nested maps are now correctly preserved in the log output.
- Align Private Action Runner configuration keys and log guidance to the private_action_runner.* snake-case names.
- Fix the private action runner PowerShell example config not being installed on Windows. The file is now correctly placed at C:\ProgramData\Datadog\private-action-runner\powershell-script-config.yaml.
- Fix process collection to detect command line changes for processes with the same PID and creation time by hashing the command line.
- Fixed a bug where tailing UTF-16 encoded log files (UTF-16-LE or UTF-16-BE) could produce mojibake (garbled text) when log lines exceeded the configured logs_config.max_message_size_bytes limit (default 900KB). The truncation was performed at the byte level without respecting 2-byte UTF-16 character boundaries, which could split a character in half and produce Unicode replacement characters (U+FFFD) after decoding. The framer now aligns the truncation limit to a 2-byte boundary for UTF-16 encodings, ensuring that truncated frames always contain valid UTF-16 data.
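In the fix for lines exactly at logs_config.max_message_size_bytes, the key point is that a line of exactly the limit fits and must not be marked truncated. A check along the lines of len(line) >= limit would produce the symptom described. The Go sketch below shows the intended boundary behavior with a hypothetical frame function and a tiny limit. It does not model the FrameMatcher interface.

```go
package main

import "fmt"

// frame (hypothetical) returns at most limit bytes and reports whether
// content was actually cut. A line of exactly limit bytes fits, so only a
// strictly longer line is marked truncated.
func frame(line []byte, limit int) (content []byte, truncated bool) {
	if len(line) <= limit {
		return line, false
	}
	return line[:limit], true
}

func main() {
	const limit = 5 // stands in for max_message_size_bytes
	for _, s := range []string{"abcd", "abcde", "abcdef"} {
		c, t := frame([]byte(s), limit)
		fmt.Printf("%q truncated=%v\n", c, t)
	}
}
```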
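The UTF-16 fix rounds the truncation limit down to a 2-byte boundary so a frame never ends halfway through a code unit. The Go sketch below shows that alignment using hypothetical helpers built on the standard unicode/utf16 and encoding/binary packages. It handles little-endian input only and ignores surrogate pairs.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// alignedLimit rounds limit down to an even byte count so a UTF-16 frame
// always ends on a 2-byte code-unit boundary.
func alignedLimit(limit int) int {
	return limit &^ 1
}

// encodeUTF16LE and decodeUTF16LE are hypothetical helpers for the demo.
func encodeUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	b := make([]byte, 2*len(units))
	for i, u := range units {
		binary.LittleEndian.PutUint16(b[2*i:], u)
	}
	return b
}

// decodeUTF16LE ignores a trailing odd byte; aligned frames never have one.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(b[2*i:])
	}
	return string(utf16.Decode(units))
}

func main() {
	data := encodeUTF16LE("h\u00e9llo") // 5 code units, 10 bytes
	limit := 5                         // odd byte limit would split a code unit
	fmt.Println(decodeUTF16LE(data[:alignedLimit(limit)]))
}
```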
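The process collection fix hashes the command line. As a result, a process that keeps its PID and creation time but changes its arguments (for example, by rewriting its argv) is seen as changed. The Go sketch below builds such a key. The procKey type, the keyFor function, and the choice of FNV-1a are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// procKey (hypothetical) identifies a process instance; the command-line
// hash distinguishes processes that share a PID and creation time.
type procKey struct {
	pid        int32
	createTime int64
	cmdHash    uint64
}

// keyFor hashes the arguments with FNV-1a (an arbitrary choice here). Joining
// with NUL, as /proc/<pid>/cmdline does, keeps ["a b"] distinct from ["a", "b"].
func keyFor(pid int32, createTime int64, cmdline []string) procKey {
	h := fnv.New64a()
	h.Write([]byte(strings.Join(cmdline, "\x00")))
	return procKey{pid: pid, createTime: createTime, cmdHash: h.Sum64()}
}

func main() {
	before := keyFor(4242, 1710000000, []string{"worker", "--queue=default"})
	after := keyFor(4242, 1710000000, []string{"worker", "--queue=high"})
	fmt.Println("changed:", before != after)
}
```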
Other Notes
- Add metrics origins for Pinot integration.
Datadog Cluster Agent
Prelude
Released on: 2026-03-18
Pinned to datadog-agent v7.77.0: CHANGELOG.
New Features
- Add APM tracing instrumentation to the Datadog Cluster Agent for improved observability and debugging in production environments. When enabled, the Cluster Agent emits APM traces for cluster check dispatching and rebalancing operations, surfacing patch failures and rebalancing decisions as span tags.
Enhancement Notes
- Reduce admission controller downtime during certificate rotation.
- Add the ability to collect NodeClasses EKS Auto Mode custom resources (eks.amazonaws.com API group) by default.
- Experimental: Adds support for collecting force-deleted pods in the orchestrator check using orchestrator_explorer.terminated_pods_improved.enabled.