github mickem/nscp 0.12.5

pre-release4 hours ago

Windows PDH overhaul, expression functions, boot.ini paths

This release lands a long-overdue stabilisation pass on the Windows PDH subsystem (multiple long-standing crashes and
counter-availability issues), adds first-class functions in detail-syntax / warn / crit expressions, and moves
path-resolver overrides from settings into boot.ini to unblock future moves of config and certificate storage.

Highlights

  • Windows PDH subsystem overhaul. Fixes #547 / #592 (service crash when PDH misbehaves on a particular machine),
    #634 (counters now retried when initially unavailable instead of staying broken until restart), and #652 / #906 (
    better English-counter fallback on non-English Windows).
  • Functions in expressions and templates (#281). format_bytes, convert_bytes, scale, composable with and/
    or/not — usable in detail-syntax, top-syntax, warn, crit, and filter. Today exposed by check_pdh;
    rolling out elsewhere.
  • check_network understands NIC teams (#625). New mode=adapter / mode=both reads
    Win32_PerfRawData_Tcpip_NetworkAdapter, which is the only source that reports the team aggregate.
  • Aliases in CheckHelpers. A new alias section under [/settings/check helpers/alias] provides the historical
    CheckExternalScripts alias mechanism without dragging in the external-scripts machinery. Preferred place for new
    aliases.
  • WEB: disable admin user option. Suppresses the built-in admin user entirely — for monitoring-only exposures
    where remote reconfiguration must be impossible even if credentials leak.
  • Plugin prepare-shutdown hook. Modules get a clean teardown phase before unload — listening sockets and pollers
    stop accepting work cleanly. Wired up in the network/scheduler modules.
  • Path overrides moved from settings to boot.ini. [/paths] in nsclient.ini is no longer consulted; a new
    [paths] section in boot.ini (and a --path KEY=VALUE CLI flag) take its place. This is a breaking change for
    the small number of users who relied on [/paths] — see Upgrade notes below.

Detailed changes

Windows PDH — stability overhaul

Long-standing instability in the PDH-based Windows performance-counter subsystem, addressed in one pass:

  • #547 / #592 — service crash when PDH misbehaves. Hardened the enumeration and lookup paths against the partial /
    inconsistent results PDH returns on certain machine states. PDH enumeration buffers were refactored to use smart
    buffers throughout, removing the manual sizing loops where the bug lived.
  • #634 — counters now retried when initially unavailable. Previously a counter that wasn't ready at boot would stay
    broken until the service was restarted; the collector now re-attempts on the normal collection cadence.
  • #652 / #906 — non-English Windows counter lookup. Improved the English-counter fallback path so checks that
    reference counters by English name keep working on localised installs.
  • Resource leak in PDH counter lookup — handle leaked on the error path of counter-name → counter-path resolution.

CheckSystem — expression functions and counter scaling

#281. The expression language now supports function calls, usable in any context that takes an expression (filter,
warn, crit) or a template (detail-syntax, top-syntax, perf-syntax). Use the %(...) placeholder form —
the legacy ${...} form cannot capture nested parentheses and cannot call functions.

Built-ins exposed by check_pdh today:

Function Purpose
format_bytes(value) Auto-scaled human bytes — 4194304 → "4MB" (1024-based)
format_bytes(value, 'MB') Fixed unit. B, K/KB, M/MB, G/GB, T/TB
convert_bytes(value, 'MB') Numeric value in the named unit — for thresholds
scale(value, divisor) Divide by an arbitrary divisor (e.g. 1 000 000 for Mbps)
# Threshold in MB, display human-friendly
check_pdh counter=memory_bytes \
  "warning=convert_bytes(value, 'MB') > 500" \
  "detail-syntax=%(alias) = %(format_bytes(value))"

# Network rates as Mbps (decimal — use scale, not convert_bytes)
check_pdh counter=bytes_per_sec \
  "detail-syntax=Speed = %(scale(value, 1000000)) Mbps"

check_pdh also exposes variable-style shortcuts (value_human, value_mb, value_gb, …) — syntactic sugar for the
corresponding format_bytes / convert_bytes calls. Reach for variables when one of the prebuilt units fits; reach for
functions when you need a custom unit, a custom divisor, or composition with other expressions.

CheckSystem — check_network NIC team support

#625. The default mode=interface reads Win32_PerfRawData_Tcpip_NetworkInterface (one row per physical adapter —
does not report team aggregates). New modes:

  • mode=adapter — reads Win32_PerfRawData_Tcpip_NetworkAdapter, which includes the team aggregate as a virtual
    interface named after the team. The team aggregate is the row with no matching Win32_NetworkAdapter MAC entry, so it
    can be selected with filter=MAC = ''.
  • mode=both — returns both sources, tagged with a new source keyword for filtering.
# Monitor a NIC team aggregate
check_network mode=adapter "warn=total > 100M" "crit=total > 500M"

# Alert only on the team adapter
check_network mode=adapter "filter=MAC = ''"

CheckHelpers — aliases

Aliases (a fixed command + fixed argument list exposed under a new name) have historically lived in
[/settings/external scripts/alias], requiring CheckExternalScripts to be loaded even when the alias only
invoked internal commands. A new section under [/settings/check helpers/alias] provides the same mechanism in
CheckHelpers, with no external-scripts dependency.

[/modules]
CheckHelpers = enabled

[/settings/check helpers/alias]
my_check_cpu = check_cpu warn=load>80 crit=load>90
my_check_process = check_process "process=$ARG1$" "crit=state != 'started'"

Both modules can coexist; each reads its own section. Last-loaded wins on name collisions — pick one as the home for
new aliases so you don't have to remember which is which.

WEBServer — disable admin user (cccc14e4)

New boolean under [/settings/WEB/server] that suppresses the built-in admin user entirely: it is not seeded on first
boot, any pre-existing admin row in [/settings/WEB/server/users] is dropped at load time, and the "no users → re-add
admin" fallback is skipped. For monitoring-only WEB exposures where remote reconfiguration must be impossible even if
credentials leak.

[/settings/WEB/server]
disable admin user = true

[/settings/WEB/server/users/readonly]
password = ...
role = monitoring

Mirrored on the install command:

nscp web install --disable-admin

Mutually exclusive with --password (the install would create no user, so a password would have nowhere to go — the
command refuses explicitly).

Service — prepare_shutdown plugin hook

Plugins now receive a prepare_shutdown callback before unload, giving them a chance to flush state, stop accepting
new work, and tear down listening sockets cleanly rather than racing the unload. Wired up in NRPEServer, NSCAServer,
NSClientServer, CheckMKServer, WEBServer, and Scheduler. The callback is optional — custom plugins built against
the older API continue to work unchanged.

Service — path overrides via boot.ini and --path CLI (fbdfe257, d2075b99)

Path-resolver tokens (module-path, certificate-path, log-path, cache-path, scripts, web-path, …) used to be
overridden via [/paths] in nsclient.ini. That doesn't work for the upcoming move of writable state out of the
install directory: the path resolver is needed before the main INI is opened, so overriding where the INI lives must
happen earlier.

The override location is now boot.ini:

; boot.ini
[settings]
common = ini://${shared-path}/nsclient.ini

[paths]
module-path = C:\Program Files\NSClient++\modules
log-path = D:\nscp\logs
cache-path = D:\nscp\cache

A --path KEY=VALUE CLI flag layers on top of boot.ini and wins — useful for build tooling and CI:

nscp service --run \
  --path module-path=C:\build\modules \
  --path log-path=C:\build\log

IcingaClient — built-in alias and container test

Adds a built-in alias for the standard Icinga submission flow and a Docker-based end-to-end test so the integration is
exercised on every build.

simpleini — NUL-termination fix for non-UTF-8 INI files

The INI loader passed an explicit length to mbstowcs, but per POSIX mbstowcs(NULL, src, n) ignores n and scans
until \0. On non-UTF-8 stores the size probe could walk past the buffer. The buffer now carries an explicit
terminator.

Upgrade notes

  • [/paths] users: if you had a [/paths] section in your nsclient.ini, copy the entries into [paths] in
    boot.ini. The settings-side section is no longer consulted. The default install does not use [/paths] and is
    unaffected.
  • Custom-plugin authors: the new prepare_shutdown callback is optional. If your module manages sockets or
    background threads, you should implement it — unload is now expected to be a last-resort teardown rather than the
    place where listeners get stopped.
  • check_pdh configs using ${...} for function calls: there are none today (the feature is new), but if you adapt
    examples from third-party docs that use ${format_bytes(...)}, rewrite to %(format_bytes(...)). The ${...} form
    stops at the first } and cannot parse nested parentheses.
  • Monitoring-only WEB deployments: flip disable admin user = true under [/settings/WEB/server] and define your
    own read-only users (or rely on allow anonymous access = true with a tightly scoped anonymous role). The built-in
    admin will not be seeded, even on first boot.

Full Changelog: 0.12.4...0.12.5

Don't miss a new nscp release

NewReleases is sending notifications on new releases.