Agent
Prelude
Released on: 2022-11-02
- Please refer to the 7.40.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Starting with Agent 7.40, the Agent fails to start when it cannot determine the hostname, instead of silently falling back to an irrelevant hostname (usually a container ID). Hostname resolution is key to many features, and failing to determine the hostname means that the Agent is not configured properly. This change mostly affects Agents running in containerized environments, where the OS hostname cannot be relied on.
- Universal Service Monitoring now requires Linux kernel version 4.14 or later.
New Features
- The Agent RPM package now supports Amazon Linux 2022 and Fedora 30+ without requiring installation of the additional libxcrypt-compat system package.
- Add support for CAPI metadata and DCA tags collection in PCF containers.
- Add a username and password dialog window to the Windows Installer
- APM: DogStatsD data can now be proxied through the "/dogstatsd/v1/proxy" endpoint over UDS or UDP. If a socket is provided with dogstatsd_socket, the proxy will default to proxying over UDS. Otherwise, UDP will be used. See https://docs.datadoghq.com/developers/dogstatsd#setup for configuration details.
- Cloud Workload Security now has Agent version constraints for Macros in SECL expressions.
- Added the "helm_values_as_tags" configuration option to the Helm check. It lets users collect Helm values from a Helm release and use them as tags attached to the metrics and events emitted by the Helm check.
- Enable the new DogStatsD no-aggregation pipeline, capable of processing metrics with timestamps. Set dogstatsd_no_aggregation_pipeline to false to disable it.
- Adds the ability to identify, through its shebang, the interpreter of a script that is run from another script. An example rule is exec.interpreter.file.name == ~"python*". This feature is currently limited to one layer of nesting: for example, a Python script run from a shell script is caught, but a Perl script run from a Python script that is itself run from a shell script is not.
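The following Python sketch illustrates the idea behind shebang-based interpreter detection: the interpreter is read from the script's first line, and a pattern such as ~"python*" is then matched against its file name. The function names are hypothetical, fnmatch-style globbing stands in for the SECL ~ operator, and this is not how the Agent itself implements the feature.

```python
import fnmatch
from typing import Optional


def shebang_interpreter(script: bytes) -> Optional[str]:
    """Return the base name of the interpreter named on a script's shebang line."""
    # A shebang is only recognized when "#!" are the first two bytes of the file.
    if not script.startswith(b"#!"):
        return None
    line = script.split(b"\n", 1)[0][2:].decode("utf-8", "replace").strip()
    parts = line.split()
    if not parts:
        return None
    interpreter = parts[0].rsplit("/", 1)[-1]
    # "#!/usr/bin/env python3" runs env, which then looks up the real interpreter.
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1].rsplit("/", 1)[-1]
    return interpreter


def interpreter_matches(script: bytes, pattern: str) -> bool:
    """Hypothetical stand-in for exec.interpreter.file.name == ~"<pattern>"."""
    name = shebang_interpreter(script)
    return name is not None and fnmatch.fnmatchcase(name, pattern)
```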
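The transport rule for the "/dogstatsd/v1/proxy" endpoint fits in a few lines. This is a hypothetical Python helper, not Agent code. It only encodes the default described above: UDS when dogstatsd_socket is set, UDP otherwise.

```python
from typing import Mapping


def proxy_transport(config: Mapping[str, object]) -> str:
    """Return "uds" or "udp" for the DogStatsD proxy, following the documented default."""
    # A non-empty dogstatsd_socket path means DogStatsD listens on a Unix domain
    # socket, so the proxy forwards over UDS; otherwise it falls back to UDP.
    socket_path = config.get("dogstatsd_socket") or ""
    return "uds" if socket_path else "udp"
```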
Enhancement Notes
- JMXFetch now supports ZGC Cycles and ZGC Pauses beans out of the box.
- Adds a new aws.lambda.enhanced.post_runtime_duration metric for AWS Lambda functions. This gauge measures the milliseconds elapsed from when the function returns its response to when the extensions finish. This includes activities such as sending telemetry data to a preferred destination after the function's response is returned. Note that aws.lambda.enhanced.duration is equivalent to the sum of aws.lambda.enhanced.runtime_duration and aws.lambda.enhanced.post_runtime_duration.
- Add the flare command to the Cloud Foundry cluster agent to improve the support experience.
- Add CreateContainerError and InvalidImageName to the waiting reasons for kubernetes_state.container.status_report.count.waiting in the Kubernetes State Core check.
- [netflow] Ephemeral Port Rollup
- APM: A warning is now logged when the Agent is under heavy load.
- APM: The "http.status_code" tag is now also supported as a numeric value when computing APM trace stats. If it is set as both a string and a numeric value, the numeric value takes precedence and the string value is ignored.
- APM: Add support for cgroup2 via UDS.
- A new config option, logs_config.file_wildcard_selection_mode, allows you to configure how log wildcard file matches are prioritized if the number of matches exceeds logs_config.open_files_limit. The option defaults to by_name, which is the previous behavior. The new option is by_modification_time, which prioritizes more recently modified files, but using it can result in slower performance compared to using by_name.
- Agents are now built with Go 1.18.7. This version of Go brings changes to the garbage collection runtime that may change the Agent's memory usage. In internal testing, the RSS of Agent processes showed a minor increase of a few MiB, while CPU usage remained consistent. Reducing the value of GOGC as described in the Go documentation was effective in reducing memory usage at a modest cost in CPU usage.
- KSM Core check: Add the helm_chart tag automatically from the standard Helm label helm.sh/chart.
- Helm check: Add a helm_chart tag, equivalent to the standard Helm label helm.sh/chart (see https://helm.sh/docs/chart_best_practices/labels/).
- The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.60.0. In particular, this drops support for consuming OTLP/JSON payloads of v0.15.0 or below.
- Improve CCCache performance on cache miss, significantly reducing the number of API calls to the CAPI.
- Add more flags to increase control over the CCCache, such as the refresh_on_cache_miss, sidecars_tags, and isolation_segments_tags flags under cluster_agent properties.
- Windows: Add a config option to control how often the Agent refreshes performance counters.
- Introduces an unbundle_events config option to the docker integration. When set to true, Docker events are no longer bundled together by image name and instead generate separate Datadog events.
- Introduces an unbundle_events config option to the kubernetes_apiserver integration. When set to true, Kubernetes events are no longer bundled together by InvolvedObject and instead generate separate Datadog events.
- On Windows, the Agent now uses high-resolution icons where possible. The lower-resolution icons have been resampled for better visibility.
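Because aws.lambda.enhanced.duration is the sum of runtime_duration and post_runtime_duration, any one of the three metrics can be derived from the other two. The helper below is a hypothetical illustration of that arithmetic, using made-up sample values.

```python
def post_runtime_duration_ms(duration_ms: float, runtime_duration_ms: float) -> float:
    """Derive aws.lambda.enhanced.post_runtime_duration from the other two metrics."""
    # duration = runtime_duration + post_runtime_duration, so the post-runtime part
    # is the time left after the function returned its response.
    if runtime_duration_ms > duration_ms:
        raise ValueError("runtime_duration cannot exceed duration")
    return duration_ms - runtime_duration_ms


# A 250 ms invocation whose handler returned after 180 ms spent 70 ms in extensions.
print(post_runtime_duration_ms(250.0, 180.0))  # 70.0
```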
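The precedence rule for "http.status_code" can be sketched as follows. The sketch assumes that string tags live in a span's meta map and numeric values in its metrics map, as in the Datadog span model. The function name is hypothetical, and this is not the trace-agent's stats code.

```python
from typing import Mapping, Optional


def resolve_status_code(meta: Mapping[str, str], metrics: Mapping[str, float]) -> Optional[int]:
    """Pick the HTTP status code used when computing trace stats for one span."""
    # The numeric value wins whenever it is present; the string tag is then ignored.
    if "http.status_code" in metrics:
        return int(metrics["http.status_code"])
    raw = meta.get("http.status_code")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # A non-numeric string cannot be used as a status code.
        return None
```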
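A small sketch makes the two logs_config.file_wildcard_selection_mode values easier to compare. The MatchedFile type and select_files function are hypothetical. The by_name ordering shown (reverse lexicographic) is an assumption, not a documented detail. The by_modification_time branch needs each file's modification time, which suggests why that mode can be slower.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchedFile:
    path: str
    mtime: float  # last modification time, in seconds since the epoch


def select_files(matches: List[MatchedFile], mode: str, open_files_limit: int) -> List[str]:
    """Keep at most open_files_limit wildcard matches, prioritized according to mode."""
    if mode == "by_modification_time":
        # Most recently modified files first; in practice this needs a stat() per match.
        ordered = sorted(matches, key=lambda f: f.mtime, reverse=True)
    elif mode == "by_name":
        # Assumption: reverse lexicographic order of the path.
        ordered = sorted(matches, key=lambda f: f.path, reverse=True)
    else:
        raise ValueError(f"unknown selection mode: {mode}")
    return [f.path for f in ordered[:open_files_limit]]
```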
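Go's heap target explains why lowering GOGC trades CPU for memory. According to the Go GC guide, as of Go 1.18 the target is live heap + (live heap + GC roots) * GOGC / 100. Lowering GOGC therefore shrinks the headroom between collections and makes collections more frequent. The helper below only evaluates that formula. The sizes used are illustrative, not Agent measurements.

```python
def gc_heap_target(live_heap_mib: float, gc_roots_mib: float, gogc: int) -> float:
    """Target heap size for the next Go GC cycle, per the Go GC guide (Go 1.18+)."""
    # Target heap memory = live heap + (live heap + GC roots) * GOGC / 100.
    return live_heap_mib + (live_heap_mib + gc_roots_mib) * gogc / 100


# With 40 MiB of live heap and 4 MiB of roots (stacks and globals):
print(gc_heap_target(40, 4, 100))  # 84.0 (default GOGC)
print(gc_heap_target(40, 4, 50))   # 62.0
```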
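The effect of unbundle_events on the docker integration can be pictured as a grouping step. In this hypothetical sketch, each returned group becomes one Datadog event. By default, events are grouped by image name. When unbundle_events is true, each Docker event stands alone. The event dictionaries and their "image" key are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_docker_events(events: List[dict], unbundle_events: bool) -> List[List[dict]]:
    """Group raw Docker events; each returned group maps to one Datadog event."""
    if unbundle_events:
        # One Datadog event per Docker event.
        return [[event] for event in events]
    # Default behavior: bundle events together by image name.
    bundles: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        bundles[event["image"]].append(event)
    return list(bundles.values())
```

The kubernetes_apiserver option works the same way, except that events are bundled by InvolvedObject instead of by image name.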
Known Issues
- APM: OTLP Ingest: resource attributes such as service.name are correctly picked up by spans.
Deprecation Notes
- The following Windows Agent container versions are removed: 1909, 2004, and 20H2.
Bug Fixes
- Add the device field to the MetricPayload to ensure the device tag is properly handled by the backend.
- APM: Revised support for tracer single span sampling. See datadog-agent/pull/13461.
- Fixed a problem that could trigger in the containerd collector when fetching containers from multiple namespaces.
- Fixed a crash when dogstatsd_metrics_stats_enable is true.
- Fixed a bug in Autodiscovery that prevented the Agent from correctly scheduling checks or logs configurations on newly created pods during a StatefulSet rollout.
- The included aerospike Python package is now correctly built against the embedded OpenSSL, so the Aerospike integration can be used successfully on RHEL/CentOS.
- Fix configresolver to continue parsing when a null value is found.
- Fixed an issue with CPU count on macOS.
- On ECS, the container CPU limit reported by the docker and container checks now defaults to the task limit when no CPU limit is set at the container level.
- Fix potential panic when removing a service that the log agent is currently tailing.
- On SUSE, fixes the permissions declared in the package list of the RPM package. This was causing package conflicts between the datadog-agent package and other packages with files in /usr/lib/systemd/system.
- Fixed a resource leak in the helm check.
- Fix Golang performance counter initialization errors when counters are not available during Agent or check initialization. Checks now retry counter initialization on each interval.
- [snmp] Cache snmp dynamic tags from devices
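The retry behavior described for performance counters amounts to lazy initialization. A check that fails to open a counter does not give up; it tries again on its next run. The class below is a hypothetical Python illustration of that pattern, with a fake counter opener standing in for the Windows performance counter API.

```python
from typing import Callable, Optional


class LazyCounter:
    """Open a counter on first use and retry on every run until it succeeds."""

    def __init__(self, open_counter: Callable[[], object]) -> None:
        self._open_counter = open_counter
        self._handle: Optional[object] = None

    def run(self) -> bool:
        """Called once per check interval; returns True once the counter is usable."""
        if self._handle is None:
            try:
                self._handle = self._open_counter()
            except OSError:
                # Counter not available yet (for example, during startup): retry next interval.
                return False
        return True
```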
Other Notes
- JMXFetch upgraded to 0.47.1 https://github.com/DataDog/jmxfetch/releases/0.47.1
- The logs_config.cca_in_ad feature flag now defaults to true. This selects updated codepaths in Autodiscovery and the Logs Agent. No behavior change is expected. Please report any behavior that is "fixed" by setting this flag to false.
Datadog Cluster Agent
New Features
- Experimental: The Datadog Admission Controller can inject the Python APM library into Kubernetes containers for auto-instrumentation.
- The orchestrator check is now able to discover resources to collect based on API groups available in the Kubernetes cluster.
Enhancement Notes
- The admission controller now injects variables and volume mounts to init containers in addition to regular containers.
- Chunk orchestrator payloads by size and weight
- KSM Core check: Add the helm_chart tag automatically from the standard Helm label helm.sh/chart.
- Helm check: Add a helm_chart tag, equivalent to the standard Helm label helm.sh/chart (see https://helm.sh/docs/chart_best_practices/labels/).
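Extending injection to init containers essentially means applying the same mutation to both container lists of a pod spec. The sketch below shows this for environment variables only. It represents a pod spec as a plain dictionary using the Kubernetes field names initContainers, containers, and env. The inject_env function is hypothetical and omits volume mounts, and using DD_AGENT_HOST with a literal value is a simplified example.

```python
from typing import Dict, List


def inject_env(pod_spec: dict, env_vars: List[Dict[str, str]]) -> dict:
    """Add env_vars to every init container and regular container of a pod spec."""
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key, []):
            env = container.setdefault("env", [])
            existing = {var["name"] for var in env}
            # Never override a variable the user already set on the container.
            env.extend(dict(var) for var in env_vars if var["name"] not in existing)
    return pod_spec
```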
Bug Fixes
- Fixed an edge case in the Admission Controller when mutateUnlabelled is enabled and configMode is set to socket. This combination could prevent the creation of new DaemonSet Agent pods.
- Fixed a resource leak in the helm check.