github DataDog/dd-agent 5.7.0

5.7.0 / 03-07-2016

Linux, Mac OS and Source Install only

Details

5.6.3...5.7.0

New integrations

  • Ceph
  • DNS
  • HDFS
  • MapReduce
  • StatsD
  • TCP RTT (go-metro)
  • YARN

Updated integrations

  • Apache
  • AWS
  • Cassandra
  • Consul
  • Directory
  • Docker
  • Elasticsearch
  • Go expvar
  • Gunicorn
  • HAProxy
  • HTTP
  • IIS
  • Kafka
  • Mesos
  • MongoDB
  • MySQL
  • PgBouncer
  • Postgres
  • Process
  • Redis
  • SNMP
  • SSH
  • TeamCity
  • Tomcat
  • vSphere
  • Windows Service
  • Windows Event Log
  • WMI
  • Zookeeper

Hadoop integrations (HDFS, MapReduce and YARN checks)

The Agent now includes 4 new checks to monitor Hadoop clusters:

  • 2 HDFS checks (hdfs_namenode and hdfs_datanode) that collect metrics from namenodes and datanodes, respectively, using the JMX-HTTP API
  • a MapReduce check that collects metrics on the running MapReduce jobs from the Application Master's REST API
  • a YARN check that collects metrics from YARN's ResourceManager REST API

The existing hdfs check is deprecated and will be removed in a future version of the Agent. Its metric scope is entirely covered by the new hdfs_namenode check.
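
As a rough illustration of what collecting metrics over a JMX-HTTP API involves, the sketch below parses a JSON payload and keeps the numeric attributes of each bean. It is not the code of the hdfs_namenode check. The payload shape (a top-level "beans" list), the bean name and the attribute names in the sample are assumptions made for this example.

```python
import json


def extract_numeric_attributes(jmx_json_text):
    """Return {attribute: value} for every numeric attribute of every bean.

    Assumes the payload looks like {"beans": [{"name": ..., <attr>: <value>}]}.
    """
    payload = json.loads(jmx_json_text)
    metrics = {}
    for bean in payload.get("beans", []):
        for key, value in bean.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metrics[key] = value
    return metrics


# Hypothetical sample payload, used only to exercise the function.
sample = json.dumps({
    "beans": [{
        "name": "Hadoop:service=NameNode,name=FSNamesystemState",
        "CapacityTotal": 1000,
        "CapacityUsed": 250,
        "FSState": "Operational",
    }]
})

metrics = extract_numeric_attributes(sample)
```

String attributes such as the state field are skipped here. A real check would more likely map a fixed set of attributes to metric names than forward every number it finds.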

TCP RTT measurement with go-metro

This new feature is in beta

The Datadog Agent on 64-bit Linux is now bundled with a new component (go-metro) that passively calculates TCP RTT metrics between the agent's host and external hosts, and reports them as system.net.tcp.rtt.avg, system.net.tcp.rtt.jitter and system.net.tcp.rtt through StatsD.

go-metro follows TCP streams that were active within a certain period of time and estimates the RTT between any outgoing packet carrying data and its corresponding TCP acknowledgement.
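
The matching of data packets to acknowledgements can be sketched as follows for a single stream. This is a simplified illustration of the idea, not go-metro's implementation. The class and method names are hypothetical, timestamps are passed in by the caller, and retransmissions, stream expiry and per-host bookkeeping are ignored.

```python
class RttEstimator(object):
    """Toy RTT estimator for one TCP stream (hypothetical helper)."""

    SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap at 32 bits

    def __init__(self):
        self.pending = {}   # expected ACK number -> send timestamp
        self.samples = []   # measured RTTs, in seconds

    def on_outgoing(self, seq, payload_len, timestamp):
        # Only segments that carry data are tracked.
        if payload_len > 0:
            expected_ack = (seq + payload_len) % self.SEQ_SPACE
            # Keep the first send time if the same segment is seen twice.
            self.pending.setdefault(expected_ack, timestamp)

    def on_incoming_ack(self, ack, timestamp):
        sent_at = self.pending.pop(ack, None)
        if sent_at is not None:
            self.samples.append(timestamp - sent_at)

    def average(self):
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)


estimator = RttEstimator()
estimator.on_outgoing(seq=1000, payload_len=100, timestamp=10.0)
estimator.on_outgoing(seq=1100, payload_len=0, timestamp=10.1)   # no data: ignored
estimator.on_incoming_ack(ack=1100, timestamp=10.25)
```

Here the acknowledgement number 1100 matches the first segment (sequence 1000 plus 100 bytes of data), so one sample of 0.25 seconds is recorded.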

go-metro runs in its own process. It's disabled by default and can be enabled like a regular check by configuring an /etc/dd-agent/conf.d/go-metro.yaml file and restarting the agent.

For more details on go-metro, check out the project's GitHub page.

Ceph check

The Ceph check retrieves metrics from Ceph's Administration Tool command (ceph).

The check collects metrics from mon_status, status, df detail, osd pool stats and osd perf, and sends a service check reflecting the overall health of the cluster.

See #2264

MySQL

Multiple community-contributed additions to the MySQL check have been consolidated and merged, including:

  • metrics from the performance_schema database on MySQL >= 5.6 (thanks to @ovaistariq)
  • extra metrics on the InnoDB and MyISAM engines, from the Binlog, and from the SHOW STATUS query (thanks to @ovaistariq)
  • several schema-specific metrics, including schema size, schema average query runtime and 95th percentile query execution time (thanks again to @ovaistariq)
  • metrics on the Handler (thanks to @polynomial)
  • Galera-specific performance stats (thanks to @zdannar)
  • Query Cache metrics (thanks to @leucos)
  • a mysql.replication.slave_running service check reflecting the state of the slaves (thanks to @c960657)

Most of these additional metrics are not collected by default but can be enabled in the check's YAML file. See the YAML conf example file for details.

Various bug fixes and improvements have also been implemented:

  • the Agent's connections to MySQL are handled properly to prevent stale connections
  • the replication status is implemented on both the master and the slaves. On the master this status is determined by the Binlog status and the number of slaves.
  • the system metrics of MySQL are retrieved without errors on non-Linux platforms by using the psutil library
  • the parsing of the MySQL server version is improved

Huge thanks to all our contributors for all these improvements!

See #2116 and #2242

Potential backward incompatibilities

Docker

The dockerized Agent now uses the docker hostname (provided by the Name param from docker info) as its own hostname when available. This means that for hosts running the dockerized Agent the reported hostname may change to this docker-provided hostname.

For reference, the rules followed by the Agent for its hostname resolution are described on this wiki page.

MongoDB

The collect_tcmalloc_metrics parameter in the YAML conf is replaced with the tcmalloc option under additional_metrics.
Please refer to the example YAML conf file for more info on the usage of the additional_metrics option.
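
For anyone updating many configurations, the change can be expressed as a small transformation on an instance loaded as a dictionary. The helper below is a hypothetical sketch, not part of the Agent. It assumes that additional_metrics is a list of option names and that the old parameter was a boolean.

```python
def migrate_mongo_instance(instance):
    """Return a copy of a MongoDB check instance using the new option.

    Assumes `additional_metrics` is a list of names and that the legacy
    `collect_tcmalloc_metrics` parameter was a boolean.
    """
    migrated = dict(instance)
    legacy_enabled = migrated.pop("collect_tcmalloc_metrics", False)
    additional = list(migrated.get("additional_metrics", []))
    if legacy_enabled and "tcmalloc" not in additional:
        additional.append("tcmalloc")
    if additional:
        migrated["additional_metrics"] = additional
    return migrated


old_instance = {
    "server": "mongodb://localhost:27017",
    "collect_tcmalloc_metrics": True,
}
new_instance = migrate_mongo_instance(old_instance)
```

The input dictionary is left untouched, so the original configuration can still be compared against the result before anything is written back.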

vSphere

Instead of sending all metrics as gauges, the vSphere integration now checks the types of the metrics as reported by the VMware module, and sends metrics as rates when applicable.

If you haven't enabled the all_metrics option on the check, the only affected metrics are cpu.usage, cpu.usagemhz, network.received and network.transmitted.
If the option is enabled, the additional affected metrics are listed here. The change will affect the values of these metrics.
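
To see why the values change, recall that a gauge reports each sample as-is, while a rate reports the change between two consecutive samples divided by the elapsed time. The sketch below shows that arithmetic with made-up numbers. The function name is hypothetical and the code is not taken from the vSphere check.

```python
def as_rate(previous_sample, current_sample):
    """Compute a per-second rate from two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = previous_sample, current_sample
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Two made-up samples taken 20 seconds apart.
previous = (0.0, 100.0)
current = (20.0, 300.0)

gauge_value = current[1]               # what a gauge would report
rate_value = as_rate(previous, current)  # what a rate would report
```

With these samples the gauge reads 300.0 while the rate reads 10.0 per second, so dashboards and monitors built on the affected metrics may need new thresholds.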

WMI check

The wmi_check now supports only % as the wildcard character in filters. Support for * as a wildcard character, which was undocumented, has been dropped.
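
Existing filters that relied on * can be rewritten mechanically. The helper below is a hypothetical sketch that assumes filters are loaded as a list of dictionaries mapping a property name to a value. It replaces every * in string values, so a filter that needs a literal asterisk would have to be handled by hand.

```python
def convert_wildcards(filters):
    """Return a copy of the filters with `*` replaced by `%` in string values."""
    converted = []
    for flt in filters:
        converted.append({
            prop: value.replace("*", "%") if isinstance(value, str) else value
            for prop, value in flt.items()
        })
    return converted


# Hypothetical filters, in the assumed list-of-dictionaries shape.
old_filters = [{"Name": "chrome*"}, {"ProcessId": 4}]
new_filters = convert_wildcards(old_filters)
```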

Changes

  • [FEATURE] Ceph: New check collecting metrics from Ceph clusters. See #2264
  • [FEATURE] Consul: Add SSL support. See #2034 (Thanks @diogokiss)
  • [FEATURE] DNS: New check that sends a service check reflecting the status of a hostname's resolution on a nameserver. See #2249 and #2289
  • [FEATURE] Elasticsearch: Report additional metrics related to fs, indices.segments and indices.translog. See #2143 (Thanks @bdharrington7)
  • [FEATURE] HDFS: 2 new checks (see description above). See #2235, #2260, #2274 and #2287
  • [FEATURE] Go-metro: New component that measures TCP RTT (in beta, see description above). See #2208
  • [FEATURE] Linux: Add memory metrics (slab, page tables and cached swap). See #2100 (Thanks @gphat)
  • [FEATURE] Linux: New linux_proc_extras check collecting system-wide metrics on interrupts, context switches and processes. See #2202 (Thanks @gphat)
  • [FEATURE] MapReduce: New check (see description above). See #2236
  • [FEATURE] MongoDB: Collect optional additional metrics, grouped by topic. These can be enabled with the new additional_metrics option in the check's YAML conf. Also, the underlying pymongo library has been upgraded from 2.8 to 3.2. See #2161, #2166, #2140 and #2160 (Thanks @scottbessler and @benmccann)
  • [FEATURE] MySQL: Add tag parameter for custom MySQL queries. See #2229
  • [FEATURE] MySQL: Enhance the catalog of metrics reported, and add a service check on the replication state. See #2116, #2242 and #2288 (Thanks @ovaistariq, @zdannar, @polynomial, @leucos, @Zenexer, @c960657, @nfo, @patricknelson and @scottgeary)
  • [FEATURE] Postgres: Measure user functions. See #2164
  • [FEATURE] Process: Allow configuring the path to procfs (useful when the agent is run in a container), with a newer version of psutil. See #2163 and #2134 (Thanks @sethp-jive)
  • [FEATURE] Redis: Optionally report metrics from INFO COMMANDSTATS as calls, usec and usec_per_call (prefixed with redis.command.). See #2109
  • [FEATURE] SNMP: Add support for forced SNMP data types to help with buggy devices. See #2165 (Thanks @chrissnell)
  • [FEATURE] SSH: Add Windows support. See #2072
  • [FEATURE] StatsD: New check collecting metrics and service checks using StatsD's admin interface. See #1978 and #2162 (Thanks @gphat)
  • [FEATURE] vSphere: Add SSL config options for certs. See #2180
  • [FEATURE] YARN: New check (see description above). See #2147 and #2207
  • [FEATURE] Zookeeper: Gather stats from mntr command and report zookeeper.instances.<mode> metrics as 0/1 gauge. See #2156 (Thanks @jpittis)
  • [IMPROVEMENT] Apache: Allow disabling ssl validation. See #2169
  • [IMPROVEMENT] AWS: Incorporate security-groups into tags collected from EC2. See #1951
  • [IMPROVEMENT] Cassandra: Add YAML conf for Cassandra version > 2.2. See #2142 and #2271
  • [IMPROVEMENT] Directory: Show check on Windows. See #2184 (Thanks @xkrt)
  • [IMPROVEMENT] Docker: Pass tags to events as well. See #2182
  • [IMPROVEMENT] Docker: Use the docker hostname as the agent's hostname when available. See #2145
  • [IMPROVEMENT] Elasticsearch: Apply custom tags to service checks too. See #2148
  • [IMPROVEMENT] Go expvar: Add configuration option for custom metric namespace. See #2022 (Thanks @theckman)
  • [IMPROVEMENT] Go expvar: Add counter support. See #2133 (Thanks @gphat)
  • [IMPROVEMENT] Gohai: Count number of logical processors. See gohai-22
  • [IMPROVEMENT] HAProxy: Add option to count statuses by service. See #2304 and #2314
  • [IMPROVEMENT] HTTP: Add a days_critical option to the SSL certificate expiration feature. See #2087
  • [IMPROVEMENT] HTTP: Support unicode in content-matching. See #2092
  • [IMPROVEMENT] Kafka: Compute instant rates and capture more metrics in example configuration. See #2079 (Thanks @dougbarth)
  • [IMPROVEMENT] Linux install script: Add custom provided hostname to datadog.conf. See #2225 (Thanks @lowl4tency)
  • [IMPROVEMENT] Mesos: Improve checks' performance by preventing requests from using chardet. See #2192 (Thanks @GregBowyer)
  • [IMPROVEMENT] MongoDB: Tag mongo instances by replset state. See #2244 (Thanks @rhwlo)
  • [IMPROVEMENT] SNMP: Improve performance by running instances of the check in parallel. See #2152
  • [IMPROVEMENT] SNMP: Make MIB constraint enforcement optional and improve resilience. See #2268
  • [IMPROVEMENT] TeamCity: Allow disabling ssl validation. See #2091 (Thanks @jslatts)
  • [IMPROVEMENT] Unix: Revamp source install script. See #2198 and #2199
  • [IMPROVEMENT] vSphere: Add network.received and network.transmitted to the basic metrics collected. See #1824
  • [IMPROVEMENT] vSphere: Check metric type to determine how to report (rate or gauge). See #2115
  • [IMPROVEMENT] Windows: Add uptime metric. See #2135, #2292 and #2299
  • [IMPROVEMENT] Windows WMI-based checks (wmi_check, System check, IIS, Windows Service, Windows Event Log): gracefully time out WMI queries. See #2185, #2228 and #2278
  • [IMPROVEMENT] Windows IIS, Service and Event Log checks: use the new WMI wrapper with increased performance. See #2136
  • [IMPROVEMENT] Windows packaging: Tighten permissions on datadog.conf. See #2210
  • [BUGFIX] AWS: Use proxy settings for EC2 tag collection. See #2201
  • [BUGFIX] AWS: During EC2 tags collection, log a warning when the instance is not associated with an IAM role. See #2285
  • [BUGFIX] Core: Do not log API keys. See #2146
  • [BUGFIX] Core: Fix cases of low/no disk space causing the Agent to crash when calling subprocesses. See #2223
  • [BUGFIX] Core: Make Dogstatsd recover gracefully from serialization errors. See #2176
  • [BUGFIX] Core: Set agent pid file and path from constants. See #2128 (Thanks @urosgruber)
  • [BUGFIX] Development: Fix test of platform in etcd CI setup script. See #2205 (Thanks @ojongerius)
  • [BUGFIX] Docker: Avoid event collection failure if an event has no ID param. See #2308
  • [BUGFIX] Docker: Catch exception when getting k8s labels fails. See #2200
  • [BUGFIX] Docker: Don't warn if process finishes before measuring. See #2114 (Thanks @oeuftete)
  • [BUGFIX] Docker: Remove misleading warning on excluded containers. See #2179 (Thanks @EdRow)
  • [BUGFIX] Documentation: Update link to dogstatsd guide in datadog.conf. See #2181
  • [BUGFIX] Elasticsearch: Optionally collect pending task stats. See #2250
  • [BUGFIX] Flare: Use ssl and proxy settings from datadog.conf. See #2234 (Thanks @tebriel)
  • [BUGFIX] Flare: Mention path to tar file in Windows UI. See #2084
  • [BUGFIX] FreeBSD: Use correct log file for syslog. See #2171
  • [BUGFIX] Go expvar: Add timeout for requests to get go expvar metrics. See #2183 (Thanks @gphat)
  • [BUGFIX] Gohai: Log unexpected OSError exceptions instead of re-raising them. See #2309
  • [BUGFIX] Gunicorn: Mention in YAML conf that the setproctitle module is required. See #2215
  • [BUGFIX] HTTP: Add an option to disable warnings when ssl validation is disabled. See #2193
  • [BUGFIX] HTTP: Improve log message when http code is incorrect. See #2203
  • [BUGFIX] HTTP: Rename ssl_expire to check_certificate_expiration in YAML comment. See #2086 (Thanks @MiguelMoll)
  • [BUGFIX] HTTP: Use proxy settings from datadog.conf. See #2112
  • [BUGFIX] Kubernetes: Remove unused function. See #2157
  • [BUGFIX] OpenStack: Improve docs in YAML conf. See #2094
  • [BUGFIX] OpenStack: Remove recommendation for omitting trailing slashes in YAML conf. See #2081
  • [BUGFIX] Mac OS X: Fix gohai call by passing correct PATH to supervisor. See #2206
  • [BUGFIX] Mesos slave: Allow configuring mesos master port. See #2189
  • [BUGFIX] MySQL: Fix buggy tagging in service_checks on instances configured with a Unix socket. See #2216
  • [BUGFIX] PgBouncer: Avoid raising error when there are no results for a query. See #2280 (Thanks @hjkatz)
  • [BUGFIX] SNMP: Fix bug when the requested oid is prefixed by another requested oid. See #2246 (Thanks @xkrt)
  • [BUGFIX] Tomcat: Fix bad attribute in YAML conf file. See #2153
  • [BUGFIX] Unix: Fix URL of get-pip script in source install script. See #2220 (Thanks @mooney6023)
  • [BUGFIX] Windows: Fix cases of collector getting wrongfully restarted by watchdog after one correct watchdog restart. See #2175
  • [BUGFIX] WMI check: Remove unnecessary warnings on Name property. See #2291
  • [BUGFIX] WMI check: Always add the tag_by parameter to the collected properties. See #2296
