stream: reduce per-chunk overhead in streams and webstreams #64312

Open
anonrig wants to merge 2 commits into nodejs:main from anonrig:stream-perf-ring-buffer

@anonrig (Member) commented Jul 5, 2026

Two independent, behavior-preserving performance changes to the streams
implementations, one per commit (intended to land separately via
commit-queue-rebase):

1. stream: hoist repeated loads in readable paths

fromList() re-read state.length and each chunk's length several times per
call and re-loaded buffer[idx] around every copy, and read() loaded
state.length three times in its read(0) check. The engine cannot fold these
loads across the intervening copy/slice calls, so they are cached in locals.
This mirrors what Bun's fork of the same functions carries on top of the shared
readable-stream lineage (bun/src/js/internal/streams/readable.ts).
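
As a rough illustration of the hoisting pattern (not the actual `fromList()` code), the sketch below reads `state.length`, `state.buffer`, and each chunk's length once into locals and reuses them across the copy calls. The `takeBytes` function and the shape of `state` are hypothetical simplifications made up for this example.

```javascript
// Hypothetical, simplified stand-in for a readable buffer; this is NOT
// Node's fromList(), only a sketch of the "cache loads in locals" idea.
function takeBytes(state, n) {
  // Hoisted loads: each property is read once and kept in a local.
  const buffer = state.buffer;
  const total = state.length;
  if (n > total) n = total;

  const out = new Uint8Array(n);
  let copied = 0;
  let idx = state.bufferIndex;

  while (copied < n) {
    const chunk = buffer[idx];         // loaded once per iteration
    const chunkLength = chunk.length;  // reused after the copy below
    const want = n - copied;
    if (want >= chunkLength) {
      // Consume the whole chunk and advance to the next one.
      out.set(chunk, copied);
      copied += chunkLength;
      idx++;
    } else {
      // Consume only the front of the chunk; keep the remainder in place.
      out.set(chunk.subarray(0, want), copied);
      buffer[idx] = chunk.subarray(want);
      copied += want;
    }
  }

  // Write the updated state back once, at the end.
  state.bufferIndex = idx;
  state.length = total - n;
  return out;
}

const state = {
  buffer: [Uint8Array.of(1, 2, 3), Uint8Array.of(4, 5)],
  bufferIndex: 0,
  length: 5,
};
const taken = takeBytes(state, 4);
```

Whether such hoists pay off depends on what the engine can already prove about the intervening calls, which is why the PR backs them with benchmark captures rather than reasoning alone.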

2. stream: use ring buffer for WHATWG stream queues

The [[queue]] backing every default readable/writable controller was a plain
array of { value, size } wrappers consumed with ArrayPrototypeShift: one
wrapper allocation per buffered chunk, and element movement (or engine
re-linearization) per dequeue. The byte controller queue paid the same shift
cost for its chunk-descriptor records.

This replaces the array with a power-of-two ring buffer:

  • default controller queues store each entry as (value, size) in two
    consecutive slots — the per-chunk wrapper allocation disappears;
  • the byte controller keeps its descriptor records (mutated in place at the
    head) in single slots;
  • controllers start from (and are reset to) a shared immutable empty queue, so
    constructing a stream allocates no queue storage until a chunk is actually
    buffered;
  • enqueues measured by the internal default size algorithm (never observable
    by user code, always returns 1, cannot throw) skip the algorithm call and
    its try/catch.
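
The two-slot layout described in the list above can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: the class name `PairRingQueue` and its methods are hypothetical, and the shared empty queue, shrinking, and the byte-controller single-slot variant are left out.

```javascript
// Illustrative sketch only; names are hypothetical, not the PR's internals.
class PairRingQueue {
  constructor(initialEntries = 4) {
    // Two slots per entry. Capacity is a power of two so that wrap-around
    // is a bit mask instead of a modulo.
    let capacity = 2;
    while (capacity < initialEntries * 2) capacity *= 2;
    this.slots = new Array(capacity);
    this.mask = capacity - 1;
    this.head = 0;  // slot index of the oldest entry's value (always even)
    this.count = 0; // number of occupied slots (always even)
  }

  get size() {
    return this.count / 2;
  }

  push(value, size) {
    if (this.count === this.slots.length) this.grow();
    const tail = (this.head + this.count) & this.mask;
    // tail is even and capacity is even, so tail + 1 never wraps.
    this.slots[tail] = value;
    this.slots[tail + 1] = size;
    this.count += 2;
  }

  // Size of the oldest entry, read without allocating a wrapper object.
  peekSize() {
    return this.count === 0 ? undefined : this.slots[this.head + 1];
  }

  // Remove and return the oldest value; no elements are moved.
  shift() {
    if (this.count === 0) return undefined;
    const head = this.head;
    const value = this.slots[head];
    this.slots[head] = undefined;     // drop references so they can be freed
    this.slots[head + 1] = undefined;
    this.head = (head + 2) & this.mask;
    this.count -= 2;
    return value;
  }

  grow() {
    // Double the capacity and re-linearize so the oldest entry is at slot 0.
    const old = this.slots;
    const next = new Array(old.length * 2);
    for (let i = 0; i < this.count; i++) {
      next[i] = old[(this.head + i) & this.mask];
    }
    this.slots = next;
    this.mask = next.length - 1;
    this.head = 0;
  }
}

const q = new PairRingQueue(2); // 2 entries => 4 slots
q.push('a', 1);
q.push('b', 2);
const first = q.shift();        // frees the first two slots
q.push('c', 3);                 // wraps around into the freed slots
q.push('d', 4);                 // queue is full here, so this triggers grow()
const order = [first];
const sizes = [];
while (q.size > 0) {
  sizes.push(q.peekSize());
  order.push(q.shift());
}
```

Compared with an array of `{ value, size }` objects drained by `shift()`, a dequeue here only advances `head`, and an enqueue writes two slots instead of allocating a wrapper.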

The layout mirrors what Bun/WebKit use for the same spec structure
(WTF::Deque in WebCore's StreamQueue.h, the pure-JS ring buffer in Bun's
src/js/internal/fifo.ts).

Benchmark results

benchmark/compare.js against an unmodified-HEAD baseline binary on an
otherwise idle machine, Welch t-test (Rscript unavailable, so computed from
the CSV; happy to share the raw captures). Full webstreams suite at 30 runs
plus an independent 15-run repeat; streams set at 60 runs plus a 30-run
repeat. Improvements below are stable-sign across both captures.
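
For readers who want to reproduce the statistics without Rscript, a minimal sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom is shown below. It assumes the two inputs are arrays of per-run throughput rates (higher is better) already parsed from the CSV; the function names are hypothetical, and turning `t` and `df` into a p-value additionally needs a Student's t CDF, which is omitted here.

```javascript
// Minimal sketch: Welch's unequal-variance t-test statistic.
// Inputs are assumed to be arrays of per-run rates (e.g. ops/sec).
function mean(xs) {
  let sum = 0;
  for (const x of xs) sum += x;
  return sum / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs, m) {
  let sum = 0;
  for (const x of xs) sum += (x - m) ** 2;
  return sum / (xs.length - 1);
}

function welch(baseline, patched) {
  const mBase = mean(baseline);
  const mNew = mean(patched);
  // Squared standard error of each sample mean.
  const seBase = sampleVariance(baseline, mBase) / baseline.length;
  const seNew = sampleVariance(patched, mNew) / patched.length;

  const t = (mNew - mBase) / Math.sqrt(seBase + seNew);
  // Welch-Satterthwaite approximation of the degrees of freedom.
  const df = (seBase + seNew) ** 2 /
    (seBase ** 2 / (baseline.length - 1) + seNew ** 2 / (patched.length - 1));

  return { t, df, improvementPercent: (mNew / mBase - 1) * 100 };
}

// Toy numbers for illustration only; these are not benchmark data.
const result = welch([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]);
```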

Ring buffer (webstreams), all p < 1e-5:

| benchmark | improvement |
| --- | --- |
| webstreams/pipe-to.js (all 16 HWM configs) | +12% … +17% |
| webstreams/readable-read-buffered.js bufferSize=1 | +20% |
| webstreams/readable-read-buffered.js bufferSize=10/100 | +36-38% |
| webstreams/readable-read-buffered.js bufferSize=1000 | +49% |
| webstreams/readable-async-iterator.js | +21% |

Readable hoists (node:stream):

| benchmark | improvement (60-run capture / 30-run repeat) |
| --- | --- |
| streams/readable-unevenread.js | +0.65% (p=4.3e-4) / +0.59% (p=5.9e-3) |
| streams/pipe.js | +0.74% (p=2.0e-4) / +0.52% (p=8.8e-3) |

No statistically significant regression across the rest of the captured
streams/webstreams benchmarks (creation, js_transfer, readable-read,
boundaryread, bigread, uint8array, pipe-object-mode); two apparent full-suite
deltas (creation kind='ReadableStream.tee', readable-read type='normal')
disappear in isolated 60-run re-checks (p=0.50 / p=0.95) and are attributable
to thermal drift in the suite-ordered run.

Correctness

  • No public API, ordering, or error-behavior change; the queue is internal
    state behind kState.
  • test/parallel/test-stream*, test-whatwg-*stream*, test-webstream*, and
    test/wpt/test-streams.js all pass; full test/sequential passes.
  • New regression test test-webstreams-queue-wraparound.js drives ring
    growth/wrap-around, drain-rewind/shrink, fractional/zero size()
    accounting, BYOB partial in-place head consumption, and writable
    backpressure drain through the public API only (it also passes on the
    unpatched binary).
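
To give a flavor of what "through the public API only" means for the size accounting case, here is a small sketch (not taken from the new test file) that observes queue accounting via `controller.desiredSize` with fractional and zero chunk sizes. The `weight` property and the chosen sizes are made up for illustration; `ReadableStream` is the global available in current Node.js releases.

```javascript
// Sketch: observe [[queueTotalSize]] indirectly via desiredSize.
// start() runs synchronously during construction, so the controller is
// available right after `new ReadableStream(...)` returns.
let controller;
const stream = new ReadableStream(
  {
    start(c) {
      controller = c;
    },
  },
  {
    highWaterMark: 4,
    // Custom size algorithm returning fractional and zero sizes.
    size(chunk) {
      return chunk.weight;
    },
  },
);

// desiredSize is highWaterMark minus the total size of queued chunks.
const observed = [controller.desiredSize];
for (const weight of [0.5, 0, 1.5]) {
  controller.enqueue({ weight });
  observed.push(controller.desiredSize);
}
```

Because the check relies only on spec-defined behavior, it should give the same result on patched and unpatched binaries, which matches the PR's note that the new test also passes without the change.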

anonrig added 2 commits July 5, 2026 16:04

stream: hoist repeated loads in readable paths

fromList() re-read state.length and each chunk's length several times
per call and re-loaded buffer[idx] around every copy, and read() loaded
state.length three times in its read(0) check; the engine cannot fold
these loads across the intervening copy and slice calls. Cache them in
locals instead. Ported from Bun's fork of the same functions
(src/js/internal/streams/readable.ts fromList/read), which carries
these hoists on top of the shared readable-stream lineage.

benchmark/compare.js against the unmodified baseline (60-run capture
plus an independent 30-run repeat, Welch t-test):
streams/readable-unevenread +0.65% (p=4.3e-4) / +0.59% (p=5.9e-3) and
streams/pipe.js +0.74% (p=2.0e-4) / +0.52% (p=8.8e-3), with no
significant regression across the captured streams benchmarks.

Refs: https://github.com/oven-sh/bun/blob/main/src/js/internal/streams/readable.ts
Signed-off-by: Yagiz Nizipli <yagiz@nizipli.com>

stream: use ring buffer for WHATWG stream queues

The [[queue]] backing every default readable/writable controller was a
plain array of { value, size } wrappers consumed with
ArrayPrototypeShift, so each buffered chunk allocated a wrapper object
and each dequeue moved (or forced the engine to re-linearize) the
remaining elements; the byte controller queue paid the same shift cost
for its chunk descriptor records.

Replace the array with a power-of-two ring buffer. Default controller
queues store each entry as (value, size) in two consecutive slots, so
the per-chunk wrapper allocation disappears; the byte controller keeps
its descriptor records (they are mutated in place at the head) in
single slots. Controllers start from (and are reset to) a shared
immutable empty queue, so constructing a stream allocates no queue
storage until a chunk is actually buffered. Enqueues measured by the
internal default size algorithm (never observable by user code, always
returns 1, cannot throw) skip the algorithm call and its try/catch
entirely.

The layout mirrors what Bun/WebKit use for the same spec structure:
[[queue]] as a ring-buffer deque (WTF::Deque in Bun's
src/jsc/bindings/webcore/streams/StreamQueue.h), the pure-JS ring
buffer in Bun's src/js/internal/fifo.ts, and the trivial-size-algorithm
bypass in
src/jsc/bindings/webcore/streams/JSReadableStreamDefaultController.cpp.

benchmark/compare.js against the unmodified baseline (30-run capture
plus an independent 15-run repeat, Welch t-test, all p < 1e-5):
webstreams/pipe-to.js +12-17% across all sixteen high-water-mark
configurations, readable-read-buffered +20% (bufferSize=1) to +49%
(bufferSize=1000), readable-async-iterator +21%. No stable significant
regression across the rest of the webstreams suite: the creation.js and
readable-read.js deltas seen in the full-suite capture disappear in
isolated 60-run rechecks.

Refs: https://github.com/oven-sh/bun/blob/main/src/js/internal/fifo.ts
Signed-off-by: Yagiz Nizipli <yagiz@nizipli.com>

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/streams

@nodejs-github-bot added the needs-ci, stream, and web streams labels Jul 5, 2026
@anonrig added the request-ci label Jul 5, 2026
@github-actions (bot) removed the request-ci label Jul 5, 2026
@anonrig added the commit-queue-rebase label Jul 5, 2026