github aws/aws-node-termination-handler v1.18.0

17 months ago

Improved logging in Queue Processor mode

v1.18.0 introduces the logFormatVersion Helm chart option, which lets you opt in to more detailed logs.

The default value is 1, which keeps logging the same way it did in prior releases (<= v1.17.3).

Setting the value to 2 gives you more detail about which AWS event triggered the cordon/drain. Previously, all of these events were bucketed under SQS_TERMINATE, which made it difficult to tell which event had actually occurred.

This option is also available as a command-line flag, --log-format-version.

What does the new logging look like?

logFormatVersion=2 modifies several Debug, Info, and Warn logs, as well as the Kubernetes events emitted by NTH. These changes improve observability into what NTH is doing when it responds to events via SQS. If your monitoring system looks for any of the specific strings in the tables below, you may need to update its configuration to match the new strings when you switch to the new log format version.

Changes to logs when starting up

  1. Remove event_type field from the Info log when starting a monitor; replace with monitor_type field, with new values. See Table 1.
  2. Remove event_type field from the Warn log when a monitor fails to start; replace with monitor_type field, with new values. See Table 1.
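For log consumers that need to handle both format versions during a rollout, the field rename above could be absorbed with a small helper. The following is a minimal sketch, assuming the startup log line has already been parsed into a dictionary; the function name, the mapping constant, and the sample records are hypothetical, and only the field names and the Table 1 values come from these notes.

```python
# Table 1 from these release notes:
# previous event_type value -> new monitor_type value.
MONITOR_TYPE_V2 = {
    "REBALANCE_RECOMMENDATION": "REBALANCE_RECOMMENDATION_MONITOR",
    "SCHEDULED_EVENT": "SCHEDULED_EVENT_MONITOR",
    "SPOT_ITN": "SPOT_ITN_MONITOR",
    "SQS_TERMINATE": "SQS_MONITOR",
}


def monitor_type(record):
    """Return the v2-style monitor type for a parsed startup record, or None."""
    if "monitor_type" in record:        # written with logFormatVersion=2
        return record["monitor_type"]
    legacy = record.get("event_type")   # written with logFormatVersion=1
    return MONITOR_TYPE_V2.get(legacy)


# Hypothetical records; real NTH log lines carry more fields than shown here.
old_record = {"event_type": "SQS_TERMINATE"}
new_record = {"monitor_type": "SQS_MONITOR"}
```

Normalizing to the new values means downstream dashboards only need to know one vocabulary, whichever version a given node is running.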

Changes to logs when processing an event

  1. New monitor field in the Info log. See Table 1.
  2. Potentially change value of kind field in the Info log, if running Queue Processor mode. See Table 2.
  3. Potentially change the "reason" field in the k8s event if running Queue Processor mode. See Table 3.
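One consequence of these changes is that, with logFormatVersion=2, the kind field alone no longer indicates that an event arrived through the queue; the new monitor field carries that information instead. A minimal sketch of a check that works under either version, assuming the Info log line has already been parsed into a dictionary (the function name and the sample records are hypothetical):

```python
def came_from_sqs(record):
    """Best-effort check that a parsed Info record describes a queue-sourced event."""
    if "monitor" in record:
        # logFormatVersion=2: the monitor field uses the Table 1 values.
        return record["monitor"] == "SQS_MONITOR"
    # logFormatVersion=1: queue events are all bucketed under SQS_TERMINATE.
    return record.get("kind") == "SQS_TERMINATE"


# Hypothetical records; real NTH log lines carry more fields than shown here.
v1_record = {"kind": "SQS_TERMINATE"}
v2_record = {"kind": "SPOT_ITN", "monitor": "SQS_MONITOR"}
```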

Changes to logs when receiving an SQS message

  1. Include the specific event type instead of SQS_TERMINATE in the Debug log if running Queue Processor mode. See Table 2.

Tables of changed values

Table 1: Monitor types

  Previous                    New
  REBALANCE_RECOMMENDATION    REBALANCE_RECOMMENDATION_MONITOR
  SCHEDULED_EVENT             SCHEDULED_EVENT_MONITOR
  SPOT_ITN                    SPOT_ITN_MONITOR
  SQS_TERMINATE               SQS_MONITOR

Table 2: Event types

  Previous                    New
  REBALANCE_RECOMMENDATION    REBALANCE_RECOMMENDATION
  SCHEDULED_EVENT             SCHEDULED_EVENT
  SPOT_ITN                    SPOT_ITN
  SQS_TERMINATE               REBALANCE_RECOMMENDATION
                              SCHEDULED_EVENT
                              SPOT_ITN
                              STATE_CHANGE
                              ASG_LIFECYCLE

Table 3: Event reasons

  Previous reason             New reason
  RebalanceRecommendation     RebalanceRecommendation
  ScheduledEvent              ScheduledEvent
  SpotInterruption            SpotInterruption
  SQSTermination              RebalanceRecommendation
                              ScheduledEvent
                              SpotInterruption
                              StateChange
                              ASGLifecycle
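If an alerting rule matches on Kubernetes event reasons, the Table 3 mapping suggests how its list of watched reasons might be widened before switching to logFormatVersion=2. This is a minimal sketch under that assumption; the function and constant names are hypothetical, and the reason strings are taken from Table 3.

```python
# Reasons that replace SQSTermination under logFormatVersion=2 (Table 3).
QUEUE_REASONS_V2 = [
    "RebalanceRecommendation",
    "ScheduledEvent",
    "SpotInterruption",
    "StateChange",
    "ASGLifecycle",
]


def expand_reasons(watched):
    """Rewrite a list of watched reasons so it still matches under version 2."""
    result = []
    for reason in watched:
        replacements = QUEUE_REASONS_V2 if reason == "SQSTermination" else [reason]
        for candidate in replacements:
            if candidate not in result:  # keep order, drop duplicates
                result.append(candidate)
    return result
```

Reasons other than SQSTermination pass through unchanged, since Table 3 lists them with identical previous and new values.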

Commits with these changes

  • feat: emit pod events on drain by @trutx in #703
  • chore: add annotations to events in SQS mode by @trutx in #715
  • fix: show actual event kinds in Queue mode by @trutx and @cjerad in #725

Other changes

  • README: Clarify distinctions between IMDS and QP modes by @snay2 in #695
  • Clarify wording about using ASG tags. Fix broken docs link. by @snay2 in #721
  • Remove bespoke Prometheus helm chart and use the latest public release instead by @snay2 in #723
  • upgrade to Go 1.19 by @cjerad and @snay2 in #726

Full Changelog: v1.17.3...v1.18.0
