windkh/node-red-contrib-telegrambot V17.4.8 on GitHub

V17.4.8 — polling teardown now actually halts the recursive loop

This release is a direct response to @AtieshStaff's persistent 409 Conflict after upgrading to V17.4.5 (#440). It also addresses the root mechanism behind several earlier reports (#441 chapapagit, #411 Bobo-amg).

What was actually broken

Two distinct teardown paths — abortBot (called during scheduleRestart, control-node stop, and node-red close) and restartPolling (the polling-error inner retry) — believed they were stopping the bot's polling loop. Neither actually did.

Defect 1: stopPolling({cancel:true}), the cancel option used in abortBot since V17.3.0. The library source at node_modules/node-telegram-bot-api/src/telegramPolling.js:58-74 reads:

stop(options = {}) {
    if (!this._lastRequest) return Promise.resolve();
    const lastRequest = this._lastRequest;
    this._lastRequest = null;
    clearTimeout(this._pollingTimeout);
    if (options.cancel) {
        const reason = options.reason || 'Polling stop';
        lastRequest.cancel(reason);
        return Promise.resolve();           // ← cancel:true path
    }
    this._abort = true;                     // ← only set in cancel:false path
    return lastRequest.finally(() => { this._abort = false; });
}

The lib's _polling() is a recursive setTimeout chain whose .finally() block schedules the next iteration unless this._abort is true. With cancel:true, _abort is never set. The cancellation closes the local socket, the request errors out, the .finally() runs, sees no abort flag, and schedules another _polling(). The old polling instance keeps running indefinitely after stopPolling resolves.
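The effect can be reproduced in isolation. What follows is a minimal sketch, not the library's code: ToyPolling, poll(), settle() and iterations are hypothetical names, and the asynchronous request and setTimeout are collapsed into a synchronous settle() call so the example runs directly. Only the handling of _abort and _lastRequest follows the description above.

```javascript
// Toy model of the polling loop described above. ToyPolling, poll(),
// settle() and iterations are hypothetical; only the _abort / _lastRequest
// handling mirrors the behaviour described in the text.
class ToyPolling {
    constructor() {
        this._abort = false;
        this._lastRequest = null;
        this.iterations = 0;
    }

    // Stands in for _polling(): begin one long-poll request.
    poll() {
        this.iterations += 1;
        this._lastRequest = { cancelled: false, cancel() { this.cancelled = true; } };
    }

    // Stands in for the .finally() block that runs when a request ends,
    // whether it succeeded, failed, or was cancelled.
    settle() {
        if (this._abort) {
            this._abort = false;   // mirrors stop()'s own .finally() resetting the flag
            return false;          // loop halted, nothing rescheduled
        }
        this.poll();               // the real code does this via setTimeout
        return true;
    }

    stop(options = {}) {
        if (!this._lastRequest) return;
        const lastRequest = this._lastRequest;
        this._lastRequest = null;
        if (options.cancel) {
            lastRequest.cancel();  // request ends early, but _abort stays false
            return;
        }
        this._abort = true;        // only this path tells settle() to halt
    }
}

function iterationsAfterStop(stopOptions) {
    const p = new ToyPolling();
    p.poll();
    p.stop(stopOptions);
    p.settle();                    // the stopped request finishes
    return p.iterations;
}

console.log(iterationsAfterStop({ cancel: true }));   // 2: the loop rescheduled itself
console.log(iterationsAfterStop({ cancel: false }));  // 1: the loop halted
```

In this model the only difference between the two runs is whether _abort was set before the request settled, which is the same distinction the library's stop() makes.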

Defect 2: the statements delete telegramBot._polling; telegramBot._polling = null; in restartPolling, added in V17.4.4 as an attempt to compensate for defect 1. This clears our reference to the polling instance, but the instance itself is held alive by its own .finally() closure and keeps polling. startPolling({restart:true}) then creates a new polling instance and starts it. The result is two parallel pollers for the same token.
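Why clearing the reference stops nothing can be shown with a small model. This is a sketch under stated assumptions: timers stands in for Node's timer queue and makePoller() for a polling instance that reschedules itself; both are hypothetical and neither is library code.

```javascript
// Hypothetical stand-ins: `timers` plays the role of Node's timer queue and
// makePoller() the role of a polling instance that reschedules itself.
function simulateRestartWithoutStop() {
    const timers = [];
    let getUpdatesCalls = 0;

    function makePoller() {
        const poller = {
            run() {
                getUpdatesCalls += 1;
                timers.push(() => poller.run());  // closure keeps `poller` reachable
            },
        };
        poller.run();
        return poller;
    }

    const bot = { _polling: makePoller() };  // first poller: 1 call
    bot._polling = null;                      // drops our reference only
    bot._polling = makePoller();              // "restart": second poller, 2 calls

    // Fire every queued callback once, as the timer queue eventually would.
    timers.splice(0, timers.length).forEach((fn) => fn());

    return { getUpdatesCalls, stillScheduled: timers.length };
}

console.log(simulateRestartWithoutStop());
// { getUpdatesCalls: 4, stillScheduled: 2 }
```

Both pollers fire on the next tick and both reschedule themselves, even though the bot object only references the second one.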

Why this caused 409 Conflict for AtieshStaff

AtieshStaff runs Node-RED behind a SOCKS proxy (microsocks). When the proxy drops, polling fails, polling_error fires, and restartPolling runs every 3 s. Because of defect 2, each cycle starts a new polling instance without stopping the old one, so after a few cycles several polling instances are all firing getUpdates against Telegram. When the proxy recovers and traffic flows again, Telegram sees multiple getUpdates for the same token and rejects all but one with 409 Conflict: terminated by other getUpdates request. The conflict then persists. A full server restart was the only fix because that wiped the orphaned polling instances from the process.
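As a rough worked example of the accumulation, the sketch below counts live pollers after an outage. The 30-second outage length and the helper names livePollersAfter and conflictsPerRound are assumptions for illustration. The 3 s retry interval and the rule that all but one concurrent getUpdates is rejected come from the description above.

```javascript
// Hypothetical model: each restart cycle starts one poller; `stopsOld`
// says whether the previous poller is really halted first.
function livePollersAfter(restartCycles, stopsOld) {
    let live = 1;                          // the original poller
    for (let i = 0; i < restartCycles; i += 1) {
        if (stopsOld) live -= 1;           // old loop halted before the restart
        live += 1;                         // new polling instance started
    }
    return live;
}

// Per the description above, all but one concurrent getUpdates get a 409.
function conflictsPerRound(livePollers) {
    return Math.max(0, livePollers - 1);
}

const outageSeconds = 30;                      // assumed outage length
const cycles = Math.floor(outageSeconds / 3);  // one restartPolling per 3 s

console.log(livePollersAfter(cycles, false));                     // 11
console.log(conflictsPerRound(livePollersAfter(cycles, false)));  // 10
console.log(conflictsPerRound(livePollersAfter(cycles, true)));   // 0
```

Under these assumptions a half-minute outage leaves eleven pollers competing for one token, whereas a teardown that really halts the old loop never exceeds one.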

What V17.4.8 changes

Both abortBot and restartPolling now use the pattern that actually works:

const polling = self.telegramBot._polling;
if (polling._lastRequest && typeof polling._lastRequest.cancel === 'function') {
    polling._lastRequest.cancel('abortBot');       // 1. close local socket immediately
}
self.telegramBot.stopPolling({ cancel: false })    // 2. set _abort=true, wait for loop
    .then(setStatusDisconnected, setStatusDisconnected);

Step 1 is what V17.3.0's cancel:true was trying to achieve — fast close of the in-flight request without waiting up to pollTimeout seconds for Telegram's long-poll timeout. Step 2 is what cancel:false provides — setting _abort=true so the .finally() block honours the stop and does not schedule another iteration.

After step 2 resolves, the polling instance is genuinely halted. Only then is it safe to construct a new bot / new poll cycle, which is what scheduleRestart / restartPolling go on to do.
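The ordering constraint can be sketched as follows. haltThenRestart is a hypothetical helper, not the function shipped in bot-node.js, and fakeBot is a stub that only records the order of calls. The sketch reuses the same handler for both outcomes of stopPolling, echoing the .then(fn, fn) shape above; whether restarting after a rejection is appropriate is a design choice of this example.

```javascript
// Sketch only. haltThenRestart is a hypothetical helper; `bot` is assumed to
// expose _polling, stopPolling() and startPolling() as the text describes.
function haltThenRestart(bot) {
    const polling = bot._polling;
    if (polling && polling._lastRequest && typeof polling._lastRequest.cancel === 'function') {
        polling._lastRequest.cancel('restart');     // step 1: close the socket now
    }
    const restart = () => bot.startPolling({ restart: true });
    // Step 2: wait until the old loop has settled, and only then start a new one.
    return Promise.resolve(bot.stopPolling({ cancel: false })).then(restart, restart);
}

// Stub bot that records the order in which its methods are called.
const calls = [];
const fakeBot = {
    _polling: { _lastRequest: { cancel: () => calls.push('cancel') } },
    stopPolling: () => { calls.push('stopPolling'); return Promise.resolve(); },
    startPolling: () => { calls.push('startPolling'); return Promise.resolve(); },
};

const done = haltThenRestart(fakeBot);
console.log(calls.join(' > '));                    // cancel > stopPolling
done.then(() => console.log(calls.join(' > ')));   // cancel > stopPolling > startPolling
```

The first log line shows that startPolling has not run by the time haltThenRestart returns. It runs only after the promise from stopPolling has settled.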

The V17.4.4 _polling = null hack is removed — it was never stopping anything, only masking the symptom for the small subset of cases where the lib happened to error out on the second iteration.

Tests

232 passing (up from 228 in V17.4.7). Four new Mocha cases in test/nodes/bot-node-restart.test.js:

abortBot cancels in-flight _lastRequest before stopPolling resolves
abortBot still completes the done callback if stopPolling rejects (e.g. lib internal error)
Tolerates _lastRequest shapes without a cancel function (lib version drift)
Tolerates _polling without an in-flight _lastRequest (bot between polls)

What this fixes upstream of itself

The 409 Conflict patterns in:

#440 AtieshStaff — direct fix. The proxy-recovery 409 loop is gone because there's only ever one polling instance at a time.
#441 chapapagit — root cause was the same. V17.4.7's circuit breaker still ships and protects against future analogous shapes, but the underlying defect is fixed here.
#411 Bobo-amg — same family. Bobo-amg's command-node freeze symptom was downstream of a polling loop that kept itself alive against the operator's intent.

V17.4.4's polling_error-side 409 detection (409 Conflict — letting it clear naturally) remains useful for the transient legitimate case (e.g. server-side race during deploy), but the persistent multi-poller variant should no longer arise.

What V17.4.8 does NOT do

Doesn't fix the underlying network problems (proxy drops, etc.) — those remain operator-side concerns.
Doesn't change the auto-restart backoff curve or the V17.4.5 socket-pool rebuild — those still work as designed and are complementary.

Worth re-reading

The V17.3.0 / V17.4.4 commit comments in bot-node.js previously gave incorrect reasons for why their respective changes existed. Those have been corrected in this commit so the next person reading the file has accurate context.