Skip to content

fix(pool)!: settle in-flight task promises on every unexpected worker exit#3211

Open
jerome-benoit wants to merge 120 commits into
poolifier:masterfrom
jerome-benoit:fix/reject-in-flight-promises-on-worker-crash
Open

fix(pool)!: settle in-flight task promises on every unexpected worker exit#3211
jerome-benoit wants to merge 120 commits into
poolifier:masterfrom
jerome-benoit:fix/reject-in-flight-promises-on-worker-crash

Conversation

@jerome-benoit

@jerome-benoit jerome-benoit commented May 13, 2026

Copy link
Copy Markdown
Contributor

Summary

pool.destroy() and unexpected worker exits left in-flight task promises in promiseResponseMap permanently unsettled. On Node.js 24+ this manifests as exit code 13 (ERR_UNSETTLED_TOP_LEVEL_AWAIT) when the event loop drains.

This PR settles every in-flight and queued task promise on every voluntary termination and unexpected worker exit, across thread and cluster pools.

Breaking changes

  1. pool.execute() callers without an attached .catch now observe rejections on:
    • pool.destroy() reaching tasksFinishedTimeout while tasks are in-flight → WorkerTerminationError.
    • any unexpected worker exit (uncaught exception, signal kill, OOM, process.exit(N)) → WorkerCrashError.
  2. PoolEvents.error payload on the crash path is exclusively a typed WorkerCrashError (was: raw Error). Discriminate via error.name; the original throw is exposed via error.cause.
  3. pool.destroy() is idempotent — concurrent calls share the same in-flight promise instead of throwing.
  4. ExitHandler<Worker> widened from (exitCode: number) => void to (exitCode: number | null, signal?: NodeJS.Signals | null) => void to surface cluster 'exit' semantics (external signal kills, OOM). Under strictFunctionTypes, user handlers explicitly typed (code: number) => void no longer assign — widen the parameter to number | null and optionally accept signal.
pool.execute().catch(error => {
  // error.name === 'WorkerCrashError' | 'WorkerTerminationError'
})

Users who already await pool.execute() in a try/catch need no change.

New public API (src/index.ts)

  • WorkerCrashError{ cause, exitCode, signal, taskId, workerId }
  • WorkerTerminationError{ cause, taskId, workerId }

Discriminate via error.name === '…' (dual-package CJS/ESM safe). instanceof works only within a single bundle realm.

Implementation

Crash detection (every worker type)

  • WorkerInfo.terminating set synchronously before voluntary teardown so the once-'exit' handler skips crash detection.
  • Once-'exit' handler captures (exitCode, signal) and gates crash detection on !terminating && !destroying && !crashHandled && workerNodeKey !== -1 && abnormalExit. Both thread and cluster crashes detected uniformly.
  • Replenishment predicate (exitCode === 0 || restartWorkerOnError === true) — clean exits replenish even with restartWorkerOnError: false.
  • WorkerInfo.crashHandled write-once flag short-circuits the symmetric 'error'/'exit' re-entry race.

In-flight settlement

  • rejectInFlightTaskPromisesByRef(workerNode, workerId, errorFactory) — single helper, captures stable references before terminate() (use-after-remove safety), per-task error construction, returns the first rejection so callers route the emit.
  • try/finally around wait + reject + kill + terminate — cleanup runs even on synchronous helper throws (helper throws are surfaced via PoolEvents.error).

Event emission

  • Exactly one PoolEvents.error per worker crash, payload WorkerCrashError.
  • One PoolEvents.error per voluntary destruction with at least one in-flight task, payload WorkerTerminationError.
  • Idle voluntary teardown emits no event.
  • safeEmitPoolError skips emission when no listener is registered (Node throws on 'error' with zero listeners) and swallows listener throws so cleanup cannot be aborted.

Test coverage

22 test files, 323 passed, 5 skipped, 0 failed. Regression suite at tests/pools/crash-recovery.test.mjs:

ID Scenario
T1a / T1b Cluster SIGKILL (POSIX) / worker.kill() (Windows)
T2 Thread process.exit(N) mid-task
T3 Post-online crash (thread + cluster)
T3b Pre-ready synchronous crash (it.skip — F4 race-window)
T4 Thread uncaught throw mid-task
T5 / T5b pool.destroy() with hung in-flight task / idle worker (no spurious cross-worker rejection)
T6 In-flight crash test using new typed errors (thread + cluster)
T7 Fire-and-forget × N + destroy
T8 / T8b Idle eviction (no event) / destroyWorkerNode with in-flight
T9 / T9b crashHandled write-once + direct mutation-killer for HOLE #2
T10 Crashed worker not chosen by strategy
T11 / T11b Crash-during-destroy + direct mutation-killer for HOLE #1
T12 Concurrent pool.destroy() calls
T13 enableTasksQueue: false crash
T-I5 (a)+(b) Replenishment predicate branches

Tests use { retry: 0, timeout: 10_000 } and .catch-collection (no process.on('unhandledRejection')).

Notes

Out of crash-recovery scope: tests/pools/{thread,cluster}/fixed.test.mjs per-worker bounds (executed, stolen, stolenTasks) widened to ranges. Work-stealing-on-idle redistributes tasks non-deterministically across nodes; the total executedTasks assertion remains exact.

Validation

  • pnpm format — canonical
  • pnpm lint --max-warnings=0 — exit 0
  • pnpm build — exit 0
  • pnpm test — 22/22 files, 323 passed, 5 skipped, 0 failed
  • npx commitlint --from=upstream/master — 0 errors

Copilot AI review requested due to automatic review settings May 13, 2026 23:36
@jerome-benoit jerome-benoit requested a review from pioardi as a code owner May 13, 2026 23:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds rejection of pending task promises whose target worker has crashed, so that callers of pool.execute() no longer hang indefinitely (which on Node.js 24+ surfaces as ERR_UNSETTLED_TOP_LEVEL_AWAIT / exit code 13). The change extends the existing error once-handler in createAndSetupWorkerNode and introduces a new private helper that walks promiseResponseMap and rejects entries targeting the crashed worker node.

Changes:

  • Compute the crashed worker node key once in the error handler and reuse it for redistribution.
  • Add rejectInFlightPromises(workerNodeKey, error) to reject and clean up matching entries in promiseResponseMap.

Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
When a worker crashes (error event), tasks already dispatched to it had
their promises orphaned in promiseResponseMap — never resolved or rejected.

This caused pool.execute() callers to hang indefinitely. On Node.js 24+,
this manifests as exit code 13 (unsettled top-level await) when the event
loop drains after all workers eventually exit.

The fix iterates promiseResponseMap on worker error and rejects all
promises targeting the crashed worker node before termination.
@jerome-benoit jerome-benoit force-pushed the fix/reject-in-flight-promises-on-worker-crash branch from 23e2df6 to 21e0835 Compare May 13, 2026 23:49
…rtWorkerOnError in exit handler

- Update promiseResponseMap workerNodeKey when tasks are stolen or
  redistributed so rejectInFlightTaskPromises targets the correct worker.
- Remove readonly on PromiseResponseWrapper.workerNodeKey to allow mutation.
- Guard exit handler worker restart with restartWorkerOnError === true
  for consistency with error handler.
- Emit taskFinished and check busyEnd in rejectInFlightTaskPromises to
  unblock waitWorkerNodeEvents listeners.
- Add regression test for in-flight task promise rejection on worker crash.
Copilot AI review requested due to automatic review settings May 14, 2026 13:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

Comment thread src/pools/abstract-pool.ts
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
…tale index

handleTaskExecutionResponse now resolves the worker node key from
message.workerId via getWorkerNodeKeyByWorkerId() instead of using the
stored promiseResponse.workerNodeKey which becomes stale after
removeWorkerNode splices the array.

Also update task queue tests to use tight mathematical bounds for
per-worker executed task count under work-stealing:
- lower bound: tasksQueueOptions.concurrency (in-flight tasks cannot
  be stolen)
- upper bound: N*(M-C)+C (one worker steals all stealable tasks)
… handling

- Extract handleWorkerNodeCrash to unify crash recovery logic between
  error handler (threads) and exit handler (cluster workers).
- Detect cluster worker crashes via non-zero exit code in exit handler
  since cluster workers do not emit 'error' on uncaught exceptions.
- Use WorkerTypes.cluster enum instead of string literal.
- Add cluster crash regression test with crashWorker.cjs.
- Refactor crash pool lifecycle to beforeAll/afterAll pattern.
- Use emitter.once() for test error listeners to avoid listener leaks.
Copilot AI review requested due to automatic review settings May 14, 2026 16:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (3)

src/pools/abstract-pool.ts:1166

  • Adding this.opts.restartWorkerOnError === true to the exit handler's startMinimumNumberOfWorkers gate changes pre-existing behavior for all worker exits, not just crashes. Previously, a worker that exited for any reason (e.g., a fixed worker that exited unexpectedly while restartWorkerOnError was disabled, signal-based termination, etc.) would still trigger restoration of the minimum number of workers. With this change, when users set restartWorkerOnError: false, the pool will permanently shrink below minimumNumberOfWorkers on any worker exit, which can leave a fixed pool unable to process further tasks. Consider whether this gating should only apply to the crash-detected exit path (cluster non-zero exit code) rather than to all exits.
      if (
        this.started &&
        !this.startingMinimumNumberOfWorkers &&
        !this.destroying &&
        this.opts.restartWorkerOnError === true
      ) {
        this.startMinimumNumberOfWorkers(true)
      }

src/pools/abstract-pool.ts:2304

  • rejectInFlightTaskPromises deliberately skips tasks whose taskId is still present in the crashed worker's tasksQueue, relying on the subsequent call to redistributeQueuedTasks (in handleWorkerNodeCrash) to re-home those tasks. However, redistributeQueuedTasks aborts (break) as soon as it cannot find a destination worker (e.g., single-worker pool, or no eligible destination). In that case the skipped queued-task promises are never rejected and never re-dispatched — they will hang indefinitely, which is exactly the bug this PR is trying to eliminate for in-flight tasks. Additionally, when enableTasksQueue is true but restartWorkerOnError is false and there is no other worker to receive the tasks, the same leak occurs. Consider rejecting any queued tasks that could not be redistributed (e.g., reject the leftover entries after redistributeQueuedTasks returns).
    const queuedTaskIds = new Set<string>()
    const workerNode = this.workerNodes[workerNodeKey]
    for (const task of workerNode.tasksQueue) {
      // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
      queuedTaskIds.add(task.taskId!)
    }
    for (const [taskId, promiseResponse] of this.promiseResponseMap) {
      if (
        promiseResponse.workerNodeKey === workerNodeKey &&
        !queuedTaskIds.has(taskId)
      ) {
        const crashError = new Error(
          `Worker node crashed with error: '${error.message}'`
        )
        promiseResponse.asyncResource != null
          ? promiseResponse.asyncResource.runInAsyncScope(
            promiseResponse.reject,
            this.emitter,
            crashError
          )
          : promiseResponse.reject(crashError)
        promiseResponse.asyncResource?.emitDestroy()
        this.promiseResponseMap.delete(taskId)
        workerNode.emit('taskFinished', taskId)
      }
    }
    this.checkAndEmitTaskExecutionFinishedEvents()

src/pools/abstract-pool.ts:1893

  • handleTaskExecutionResponse now derives workerNodeKey from message.workerId rather than the stored promiseResponse.workerNodeKey. While this is more robust against stale keys, it changes semantics when getWorkerNodeKeyByWorkerId returns -1 (e.g., the worker has already been removed by the exit handler before the response message is processed): in that case afterTaskExecutionHook is silently skipped, and taskFinished/idle events are not emitted on any worker node. Previously the stored key would have been used. Please confirm that this skipped bookkeeping (task usage counters, queue draining) is intentional, or fall back to the stored workerNodeKey when the lookup fails.
      const { asyncResource, reject, resolve } = promiseResponse
      const workerNodeKey = this.getWorkerNodeKeyByWorkerId(workerId)
      const workerNode =
        workerNodeKey !== -1 ? this.workerNodes[workerNodeKey] : undefined
      if (workerError != null) {
        this.emitter?.emit(PoolEvents.taskError, workerError)
        // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
        const error = this.handleWorkerError(taskId!, workerError)
        asyncResource != null
          ? asyncResource.runInAsyncScope(reject, this.emitter, error)
          : reject(error)
      } else {
        asyncResource != null
          ? asyncResource.runInAsyncScope(resolve, this.emitter, data)
          : resolve(data as Response)
      }
      asyncResource?.emitDestroy()
      if (workerNodeKey !== -1) {
        this.afterTaskExecutionHook(workerNodeKey, message)
      }
      queueMicrotask(() => {
        workerNode?.emit('taskFinished', taskId)
        // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
        this.promiseResponseMap.delete(taskId!)
        this.checkAndEmitTaskExecutionFinishedEvents()
        if (
          workerNodeKey !== -1 &&
          this.opts.enableTasksQueue === true &&
          !this.destroying
        ) {
          if (
            !this.isWorkerNodeBusy(workerNodeKey) &&
            this.tasksQueueSize(workerNodeKey) > 0
          ) {
            // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
            this.executeTask(workerNodeKey, this.dequeueTask(workerNodeKey)!)
          }
          if (this.isWorkerNodeIdle(workerNodeKey)) {
            workerNode?.emit('idle', {
              workerNodeKey,
            })
          }

Comment thread src/pools/abstract-pool.ts Outdated
Comment thread tests/pools/cluster/fixed.test.mjs Outdated
…utable array index

workerNodeKey stored in promiseResponseMap drifts after removeWorkerNode
splices the array. Sequential crashes at different indices caused in-flight
promises to be missed. Store workerId (workerNode.info.id) as the stable
identifier and match against it in rejectInFlightTaskPromises.

Also remove dead startMinimumNumberOfWorkers call in handleWorkerNodeCrash
for static workers (no-op since crashed node is still counted).
…andling

- Hoist crashError allocation out of loop (avoid repeated stack capture)
- Add rejectRemainingQueuedTaskPromises to handle tasks left in queue
  when redistributeQueuedTasks short-circuits (pool size 1, no eligible
  destination, cannotStealTask)
- Replace flaky sleep(250) in crash tests with deterministic
  waitWorkerEvents(pool, 'exit', 1)
Copilot AI review requested due to automatic review settings May 14, 2026 16:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/pools/abstract-pool.ts:1167

  • In the cluster exit handler path, handleWorkerNodeCrash is invoked before this.removeWorkerNode(workerNode). handleWorkerNodeCrash calls redistributeQueuedTasks, which scans this.workerNodes for a destination — the crashed worker is excluded only via workerNode.info.ready (set to false at the top of handleWorkerNodeCrash). However, if restartWorkerOnError === true and workerNode.info.dynamic, a new dynamic worker is created inside handleWorkerNodeCrash before the crashed node is removed, leaving the pool transiently above its target size. This is likely benign but is a meaningful change from the previous flow (where the error handler called terminate() after restart/redistribute but before any exit-driven removal). Worth verifying behavior under the dynamic-pool + cluster crash scenario, which is not covered by the new tests (the new tests use a FixedClusterPool/FixedThreadPool with restartWorkerOnError: false).
    workerNode.registerOnceWorkerEventHandler('exit', (exitCode: number) => {
      // Cluster workers do not emit 'error' on uncaught exceptions;
      // detect crashes via non-zero exit code. Intentional signal-based kills
      // produce a null code at runtime, skipping this.
      const workerNodeKey = this.workerNodes.indexOf(workerNode)
      if (
        workerNode.info.type === WorkerTypes.cluster &&
        workerNode.info.ready &&
        workerNodeKey !== -1 &&
        !this.destroying &&
        // eslint-disable-next-line @typescript-eslint/no-unnecessary-condition
        exitCode != null &&
        exitCode !== 0
      ) {
        this.handleWorkerNodeCrash(
          workerNode,
          new Error(`Worker node exited with code ${exitCode.toString()}`)
        )
      }
      this.removeWorkerNode(workerNode)
      if (
        this.started &&
        !this.startingMinimumNumberOfWorkers &&
        !this.destroying &&
        this.opts.restartWorkerOnError === true
      ) {
        this.startMinimumNumberOfWorkers(true)
      }
    })

Comment thread src/pools/abstract-pool.ts
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread src/pools/abstract-pool.ts Outdated
Comment thread tests/pools/thread/fixed.test.mjs
Comment thread src/pools/abstract-pool.ts Outdated
- Add rejectTaskPromiseResponse() to encapsulate reject + emitDestroy sequence
- Add runInAsyncScope() to centralize async resource scope execution
- Add JSDoc to rejectRemainingQueuedTaskPromises for consistency
Copilot AI review requested due to automatic review settings May 14, 2026 17:05

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

src/pools/abstract-pool.ts:1166

  • This change introduces a behavior change beyond fixing the in-flight promise issue: previously the exit handler called startMinimumNumberOfWorkers(true) unconditionally on every worker exit, so a fixed pool whose worker process exited (for any reason) would always be replenished. Now replenishment only occurs when restartWorkerOnError === true. Users who set restartWorkerOnError: false while still expecting the pool to remain populated after non-error exits will see their pool shrink permanently. If this is intentional, it should be called out in the PR description / changelog; if not, consider gating only the crash-driven restart path and keep unconditional replenishment for clean exits.
        this.started &&
        !this.startingMinimumNumberOfWorkers &&
        !this.destroying &&
        this.opts.restartWorkerOnError === true
      ) {
        this.startMinimumNumberOfWorkers(true)
      }

src/pools/abstract-pool.ts:2290

  • promiseResponse.workerId is typed as number | undefined, and info.id may itself be undefined for a worker that has not yet been assigned an id. If handleTask were ever invoked for such a worker, the stored workerId would be undefined; then on a worker crash, crashedWorkerId = workerNode.info.id would also be undefined and the equality check promiseResponse.workerId === crashedWorkerId would match every promise whose id was never set, potentially rejecting unrelated in-flight tasks. Consider guarding rejectInFlightTaskPromises against crashedWorkerId == null (early return) and/or refusing to store an entry in promiseResponseMap without a concrete workerId.
    const crashedWorkerId = workerNode.info.id
    const crashError = new Error(
      `Worker node crashed with error: '${error.message}'`
    )
    for (const [taskId, promiseResponse] of this.promiseResponseMap) {
      if (
        promiseResponse.workerId === crashedWorkerId &&
        !queuedTaskIds.has(taskId)
      ) {
        this.rejectTaskPromiseResponse(promiseResponse, crashError)
        this.promiseResponseMap.delete(taskId)
        workerNode.emit('taskFinished', taskId)
      }
    }

Comment thread tests/pools/thread/fixed.test.mjs
Comment thread tests/pools/cluster/fixed.test.mjs
Comment thread src/pools/abstract-pool.ts Outdated
@jerome-benoit jerome-benoit changed the title fix: reject in-flight task promises when worker crashes fix: properly settle task promises and detect crashes for all worker types May 14, 2026
…ling

- Add crashHandled flag to WorkerInfo to prevent double crash handling
- Extend exit handler crash detection to thread workers (not just cluster)
- Add defensive guard to updatePromiseResponseWorkerId for invalid indices
- Decrement executing/increment failed counters on crash rejection
- Tighten stolen task upper bounds in test assertions
Strip narrative drift on five protected methods on AbstractPool:
buildWorkerCrashError, destroyWorkerNode, handleWorkerNodeCrash,
hasInFlightTaskForWorkerId, rejectInFlightTaskPromisesByRef,
safeEmitPoolError. Specifically:

- Drop redundant '@internal' tag (sibling protected methods do not
  carry it; visibility already conveys intent).
- Replace numbered list in handleWorkerNodeCrash JSDoc with a single
  declarative sentence.
- Remove parenthetical qualifiers ('stable reference',
  'captured pre-await') from @param annotations.
- Drop ALL-CAPS emphasis ('AFTER').
- Collapse blank line after first JSDoc sentence to match sibling
  density.

No behavior change.

Signed-off-by: Jérôme Benoit <[email protected]>
…ghtTaskPromises

The 'ByRef' suffix has no sibling in the codebase; the established
By-suffix convention is 'ByWorkerId' (e.g. getWorkerNodeKeyByWorkerId).
Since there is no sibling 'rejectInFlightTaskPromisesById' to
disambiguate against, the qualifier adds no value.

Touches 4 source occurrences and 5 test occurrences. The method was
introduced by this in-review PR and is not yet released.

Signed-off-by: Jérôme Benoit <[email protected]>
Replace the three-line narrative comment explaining
EventEmitter listener-order semantics with a single declarative
sentence stating the invariant. The mechanism (registration order
determines firing order) is Node-EE common knowledge and need not
be restated; only the project-specific invariant (pool once-handlers
must precede user handlers) needs to be encoded inline.

Signed-off-by: Jérôme Benoit <[email protected]>
Six existing WorkerInfo flags (backPressure, backPressureStealing,
continuousStealing, queuedTaskAbortion, stealing, stolen) use the
'<Name> flag.\nThis flag is set to `true` when [condition].' pattern.
The two new flags (crashHandled, terminating) used a divergent
'Set to `true` ...' / 'Never reset.' pattern. Aligned for naming
coherence.

Signed-off-by: Jérôme Benoit <[email protected]>
All five callers of buildWorkerCrashError are internal to AbstractPool;
no subclass overrides or extends it. Sibling factory buildTasksQueueOptions
is also private. Tighten the visibility to match.

Signed-off-by: Jérôme Benoit <[email protected]>
Pure formatting helpers with no `this` dependency live in
`src/pools/utils.ts` per existing convention (16 sibling exports).
`formatExitDetail` was the only module-scope helper in
`abstract-pool.ts` — relocated for structural consistency.

Signed-off-by: Jérôme Benoit <[email protected]>
Two call sites built nearly-identical WorkerCrashError instances for
unexpected worker exits — same options shape, message differs only by
the ' during teardown' qualifier. Extracted to a single private
factory `makeUnexpectedExitError(context, exitCode, signal, workerId)`
discriminated by a 'lifecycle' | 'teardown' tag.

Signed-off-by: Jérôme Benoit <[email protected]>
WorkerCrashErrorOptions JSDoc duplicated the field-level semantics
(taskId, exitCode, signal) already documented on the WorkerCrashError
class JSDoc and in the canonical docs/api.md discrimination contract.
Trimmed to a one-line description matching WorkerTerminationErrorOptions
for symmetry. AGENTS.md: no duplication.

Signed-off-by: Jérôme Benoit <[email protected]>
Copilot AI review requested due to automatic review settings May 19, 2026 22:55

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.

Comment thread src/pools/abstract-pool.ts
Comment thread src/pools/worker-node.ts
Three distinct sources of warnings, each fixed at root:

1. Build TS2345 narrowing failure on `teardownCause` capture.
   The `let teardownCause` mutation by the once-handlers prevents
   TypeScript from preserving the `!= null` narrow across the
   arrow-function closure in the `errorFactory` ternary.
   Capture into a `const cause` after `captureEnabled = false` so
   narrowing propagates through the closure.

2. Typedoc broken-link warnings on three JSDoc comments.
   `{@link PoolEvents.error}` (enum member, non-navigable target)
   and `{@link AbstractPool.rejectRemainingQueuedTaskPromises}`
   (private method, excluded from docs) resolved but pointed at
   undocumented symbols. Replaced with backtick literals.

3. Typedoc warning on `*Options` interfaces referenced by public
   constructor signatures but `@internal` (intentionally unexported
   for TS4063 declaration-emit compliance). Added to typedoc's
   `intentionallyNotExported` allow-list.

Quality gates after fix:
- `pnpm lint --max-warnings=0` exit 0
- `pnpm build` exit 0, zero TS warnings
- `pnpm build:prod` exit 0
- `pnpm typedoc` exit 0, zero warnings
- `pnpm test` 352 passed / 1 skipped

Signed-off-by: Jérôme Benoit <[email protected]>
The pool's once-'exit' handler at createAndSetupWorkerNode calls
workerNode.terminate() after detecting an exit. Inside doTerminate(),
a fresh once('exit') listener was registered to wait for exit — but
the event has already fired, so waitWorkerExit never resolved and
the WORKER_TERMINATION_GRACE_MS (5000 ms) timer was hit on every
crash/destroy path, retaining the WorkerNode in memory for 5s per
worker.

Capture worker exit state in a private 'exited' flag set by an
once('exit') listener registered in the WorkerNode constructor (fires
first per EventEmitter FIFO order). Fast-path doTerminate() to
immediate cleanup when the flag is set, skipping the wait/grace race
and the redundant disconnect()/terminate()/kill() calls on a worker
that is already dead.

Regression test T11e asserts the fast-path completes in < 250 ms
(20× margin vs the 5000 ms slow-path).

Signed-off-by: Jérôme Benoit <[email protected]>
Copilot AI review requested due to automatic review settings May 20, 2026 17:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated 1 comment.

Comment thread src/pools/pool.ts
Comment on lines +332 to +335
* including SIGKILL/SIGSEGV/OOM-killer). Clean exits (`exitCode === 0`)
* always replenish regardless of this option. In-flight task promises
* bound to a crashed worker always reject with `WorkerCrashError`
* regardless of this option.
The 02951ad change to include CJS artifacts (`lib/**/*.{mjs,cjs}`)
was a regression: tests load `.mjs` artifacts only — the `.cjs`
mirrors are never executed and report 0% coverage by construction,
dragging the global coverage below the 80% threshold (49.4% lines)
and breaking the CI 'Node.js 22.x on ubuntu-latest' job which runs
`pnpm test:coverage`.

Restore the narrower include from 51e25ef, which scopes coverage
measurement to the artifacts the test suite actually exercises.

Local `pnpm test:coverage` after revert: 90.57% lines, 84.64%
branches, 90.71% statements, 93.26% functions.

Signed-off-by: Jérôme Benoit <[email protected]>
Copilot AI review requested due to automatic review settings May 25, 2026 18:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 26, 2026 17:58

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 42 out of 42 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants