V17.4.2 — let the backoff escalate for sustained network problems
Patch release on top of V17.4.1 addressing the V17.4.0 retest of #442 by @petermeter69.
The problem
V17.4.0's auto-restart was firing correctly (the SLIGHTLYBETTEREFATAL traces in the issue are our patched FatalError flowing through scheduleRestart), but for sustained network outages it was oscillating at the minimum 3-second cadence instead of escalating through the exponential backoff curve I'd designed. From the user's perspective: the bot kept failing every few seconds, and recovery felt unstable.
The cause
The success path of scheduleRestart was zeroing restartCount the instant getTelegramBot() returned a non-null bot:

```js
if (bot) {
    self.restartCount = 0; // <-- immediate reset
    self.status = 'connected';
    // ...
}
```

But the rebuilt bot hadn't been verified to actually work yet. For a persistent connectivity problem with errors arriving every ~5 s:
- T+0: error 1 → schedule restart, count=1, delay 3 s
- T+3: restart fires → create succeeds → count=0 ← reset too eagerly
- T+5: error 2 → schedule restart, count=1, delay 3 s ← back to minimum
- T+8: restart fires → create succeeds → count=0
- T+10: error 3 → ...
The exponential curve (3 s → 6 s → 12 s → ...) never got the chance to do its job.
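For reference, the intended curve is just a doubling delay with a 3-second floor and a 60-second ceiling. A minimal sketch of that shape (illustrative only; the constants match this note, but the node computes its delay internally and may phrase it differently):

```js
// Illustrative sketch of the backoff curve described above: the delay starts
// at 3 s, doubles per attempt, and is capped at 60 s.
function restartDelayMs(restartCount) {
  const MIN_DELAY_MS = 3000;   // first retry after 3 s
  const MAX_DELAY_MS = 60000;  // never wait longer than 60 s
  return Math.min(MIN_DELAY_MS * 2 ** (restartCount - 1), MAX_DELAY_MS);
}

// restartCount 1..8 → 3 s, 6 s, 12 s, 24 s, 48 s, 60 s, 60 s, 60 s
```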
The fix
scheduleRestart's success path now sets a 60-second restartStableTimer rather than resetting restartCount immediately. Three outcomes (a sketch of the adjusted path follows the list):
- Stable window completes (60 s with no fresh errors): `restartCount` resets to 0. Next blip starts the curve over.
- New error before the timer fires: `scheduleRestart` clears the stable timer and treats the new error as a continuation. `restartCount` keeps climbing through 6 s → 12 s → 24 s → 48 s → 60 s (capped).
- Node closed: the timer is cleared along with the other pending timers in the close handler.
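Put together, a minimal sketch of the adjusted path (the names scheduleRestart, restartCount, restartStableTimer and getTelegramBot come from this note; the constants and the rest of the structure are simplified assumptions, and the real node code carries more state and logging):

```js
const MIN_DELAY_MS = 3000;      // first retry after 3 s
const MAX_DELAY_MS = 60000;     // backoff cap
const STABLE_WINDOW_MS = 60000; // how long a restart must survive before the counter resets
const MAX_ATTEMPTS = 8;         // surrender ceiling

function scheduleRestart(self, getTelegramBot) {
  // A fresh error voids any pending stable window: treat it as a continuation.
  if (self.restartStableTimer) {
    clearTimeout(self.restartStableTimer);
    self.restartStableTimer = null;
  }

  self.restartCount += 1;
  if (self.restartCount > MAX_ATTEMPTS) {
    self.status = 'gave up';    // ceiling unchanged in this release
    return;
  }

  const delay = Math.min(MIN_DELAY_MS * 2 ** (self.restartCount - 1), MAX_DELAY_MS);
  setTimeout(() => {
    const bot = getTelegramBot();
    if (bot) {
      self.status = 'connected';
      // No immediate reset any more: only after 60 s without a fresh error
      // does the counter go back to 0.
      self.restartStableTimer = setTimeout(() => {
        self.restartCount = 0;
        self.restartStableTimer = null;
      }, STABLE_WINDOW_MS);
    }
  }, delay);
}

// The close handler clears restartStableTimer along with the node's other
// pending timers (not shown here).
```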
Net effect for sustained outages like petermeter69's: the bot now spaces attempts apart instead of hammering at minimum cadence — gives the network time to actually recover between tries.
Net effect for transient one-off blips: unchanged. Quick recovery, stable window completes, counter resets, ready for the next blip.
The 8-attempts-then-surrender ceiling is unchanged: 3 + 6 + 12 + 24 + 48 + 60 + 60 + 60 = 273 s of delays, so roughly 4.6 minutes of trying in the worst case before logging "gave up".
Test coverage
Three new mocha cases in test/nodes/bot-node-restart.test.js:
- Error inside the stable window increments count instead of resetting.
- Error after the stable window has fired starts from count=0.
- Close handler clears the pending stable timer.
208 tests pass.
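The actual cases live in the project's suite; as a hedged illustration of the first one, here is how it could look in plain mocha with sinon fake timers, driving the simplified scheduleRestart sketch from "The fix" above rather than the real node:

```js
const assert = require('assert');
const sinon = require('sinon');

describe('restart backoff (simplified model)', function () {
  let clock;
  beforeEach(() => { clock = sinon.useFakeTimers(); });
  afterEach(() => { clock.restore(); });

  it('increments the count when an error lands inside the stable window', () => {
    const self = { restartCount: 0, restartStableTimer: null, status: 'disconnected' };
    const getTelegramBot = () => ({});      // pretend the rebuild always succeeds

    scheduleRestart(self, getTelegramBot);  // error 1: count=1, restart in 3 s
    clock.tick(3000);                       // restart fires, 60 s stable window opens
    assert.strictEqual(self.restartCount, 1);

    clock.tick(5000);                       // only 5 s pass; the window is still open
    scheduleRestart(self, getTelegramBot);  // error 2 lands inside the window
    assert.strictEqual(self.restartCount, 2); // counter climbed instead of resetting
  });
});
```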
What to look at if errors persist
If you keep seeing SLIGHTLYBETTEREFATAL: AggregateError [ETIMEDOUT] on V17.4.2, the underlying TCP connection to Telegram's API is failing — that's below the plugin layer. From the comment on #411:
- Try setting `Address family` on the bot config to `4` — the `AggregateError` shape is a dual-stack fingerprint, and forcing IPv4 eliminates it when IPv6 is the broken half (a quick check of what the host resolves is sketched after this list).
- `dig +short api.telegram.org` to confirm DNS is working.
- `mtr -T -P 443 149.154.166.110` from the affected host during an incident.
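If you want to see what the affected host actually resolves for api.telegram.org, and whether it gets both IPv6 and IPv4 answers (the dual-stack situation the first point describes), a small standalone Node script is enough. This is purely a diagnostic sketch, not part of the plugin:

```js
// Diagnostic sketch: list every address the host resolves for the API
// hostname. Both IPv6 and IPv4 entries showing up means you are in the
// dual-stack case described above, where forcing IPv4 can help.
const dns = require('dns');

dns.lookup('api.telegram.org', { all: true }, (err, addresses) => {
  if (err) {
    console.error('DNS lookup failed:', err.code);
    return;
  }
  for (const { address, family } of addresses) {
    console.log(`IPv${family}: ${address}`);
  }
});
```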
Also: the `gave up restarting after fatal` log line is now actually reachable for the first time (it was effectively unreachable on V17.4.0 / V17.4.1 because the counter kept resetting). If you see that line in your log, the bot has surrendered after 8 backoff attempts — you'll need to investigate the underlying connectivity.