WebGPU RT: device storage-buffer limit hardcoded to 16, breaks pipelines with >1 user storage buffer #8
Summary
The new WebGPU wavefront RT pipeline cannot be used with more than ~1 user storage buffer because `additional/dom-webgpu.js` hardcodes the device's `maxStorageBuffersPerShaderStage` request to 16, even though the adapter supports far more (64 on the GPU in our test environment). The wavefront SHADE compute kernel already binds ~15 storage buffers, so any RT pipeline that declares 2+ user storage buffers via `UICustomBinding` overflows the limit and fails to build.
Details
The SHADE kernel binds (counting storage buffers only):
- `@group(1)`: `tlasEntries`, `bvhNodes`, `meshRecords`, `vertices`, `indices`, `primRemap`, `vertexAttribs`, `tlasEntryOrder`, `tlasBvhNodes`, `wfRaysA`, `wfRaysB`, `wfHits`, `wfAccum`, `wfCounters` (14) + `wfPayload` (1) = 15
- `@group(2)`: `wfIndirect` (1) = 16
- `@group(3)`: user bindings; every `UICustomBindingKind::Buffer`/`BufferReadWrite` adds one more.

So the kernel is already at the limit before a single user storage buffer.
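
To make the arithmetic above easy to re-check, here is a minimal sketch that tallies the listed storage buffers and reports how many user buffers still fit under a given device limit. The helper name `userStorageHeadroom` and the table layout are hypothetical; only the buffer names and group numbers come from the list above, taken at face value.

```javascript
// Storage buffers bound by the SHADE kernel, keyed by bind group,
// copied from the list in this issue.
const SHADE_STORAGE_BUFFERS = {
  1: ["tlasEntries", "bvhNodes", "meshRecords", "vertices", "indices",
      "primRemap", "vertexAttribs", "tlasEntryOrder", "tlasBvhNodes",
      "wfRaysA", "wfRaysB", "wfHits", "wfAccum", "wfCounters", "wfPayload"],
  2: ["wfIndirect"],
};

// Hypothetical helper: remaining storage-buffer slots after the built-in
// bindings and `userBuffers` user bindings at @group(3).
// A negative result means the pipeline layout exceeds the device limit.
function userStorageHeadroom(deviceLimit, userBuffers = 0) {
  const builtIn = Object.values(SHADE_STORAGE_BUFFERS)
    .reduce((sum, names) => sum + names.length, 0);
  return deviceLimit - builtIn - userBuffers;
}

console.log(userStorageHeadroom(16));    // 0: no room left at a limit of 16
console.log(userStorageHeadroom(16, 4)); // -4: four user buffers overflow
console.log(userStorageHeadroom(64, 4)); // 44: ample room at a limit of 64
```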
`additional/dom-webgpu.js` (~L131):
`clamp(name, want)` does `requiredLimits[name] = min(want, adapterCap)`, so the device is created with 16 even when `adapter.limits.maxStorageBuffersPerShaderStage === 64` (verified in our container).

This contradicts `WAVEFRONT-DESIGN.md`, which says Phase 7 binding packing was skipped because "target device reports 64 storage buffers/stage (≥12), so the merge is unnecessary". The adapter reports 64, but the device only gets 16 due to this clamp, leaving room for ~1 user storage buffer, not ≥12.
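
The clamp behaviour described above can be illustrated with a small standalone sketch. This is a reconstruction from the description only, not the actual contents of `dom-webgpu.js`: `makeClamp` and the mock limits object are assumptions, and a plain object stands in for `adapter.limits`.

```javascript
// Sketch of a clamp with the semantics described in this issue:
// requiredLimits[name] = min(want, adapterCap).
// `makeClamp` is a hypothetical wrapper so the example runs without a GPU.
function makeClamp(adapterLimits, requiredLimits) {
  return function clamp(name, want) {
    const adapterCap = adapterLimits[name];
    requiredLimits[name] = Math.min(want, adapterCap);
  };
}

// Mock standing in for adapter.limits (64 was observed in the test container).
const mockAdapterLimits = { maxStorageBuffersPerShaderStage: 64 };
const requiredLimits = {};
const clamp = makeClamp(mockAdapterLimits, requiredLimits);

// A hardcoded `want` of 16 wins over the larger adapter cap.
clamp("maxStorageBuffersPerShaderStage", 16);
console.log(requiredLimits.maxStorageBuffersPerShaderStage); // 16

// Asking for the adapter's own value yields the full 64.
clamp("maxStorageBuffersPerShaderStage", mockAdapterLimits.maxStorageBuffersPerShaderStage);
console.log(requiredLimits.maxStorageBuffersPerShaderStage); // 64
```

The point of the sketch is that `min` only protects against asking for too much; it never raises a request that is too small.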
`UICustomBindingKind` also has no uniform-buffer option (only `Buffer`/`BufferReadWrite`), so consumers can't relieve the pressure by moving small per-frame constants (camera, light) to uniform buffers.
Reproduction
Any RT app that registers ≥2 user storage buffers at `@group(3)`. 3DForts registers 4 (camera, light, brace-stress SoA, per-instance TLAS metadata):
The RT pipeline never builds (`rtPipelines=0`), so the canvas stays black.
Validation
Changing the two clamps from `16` to `64` (or to the adapter's reported cap) makes the device request 64, the pipeline builds, and the scene renders correctly with normal per-pass timings (GENERATE/PREP/TRACE/SHADE/RESOLVE all non-zero). No other change was needed.
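
For anyone re-checking this on another machine, a small diagnostic along the following lines may help spot limits where the device received less than the adapter offers. It is a sketch, not code from this repository: `underRequestedLimits` is a hypothetical helper, and plain objects stand in for `adapter.limits` and `device.limits` so it runs without a GPU.

```javascript
// Hypothetical diagnostic: report every named limit where the created
// device ended up with a smaller value than the adapter advertises.
// Names are passed explicitly instead of enumerating the limits objects.
function underRequestedLimits(adapterLimits, deviceLimits, names) {
  return names
    .filter((name) => deviceLimits[name] < adapterLimits[name])
    .map((name) => ({
      name,
      adapter: adapterLimits[name],
      device: deviceLimits[name],
    }));
}

// Mock values matching the situation described in this issue.
const report = underRequestedLimits(
  { maxStorageBuffersPerShaderStage: 64 },
  { maxStorageBuffersPerShaderStage: 16 },
  ["maxStorageBuffersPerShaderStage"],
);
console.log(report);
```

In a browser the same function could be called with the real `adapter.limits` and `device.limits` after `requestDevice()` resolves.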
Suggested fix
Request the adapter's reported limit instead of a hardcoded 16:
(`clamp` already mins against the adapter cap, so this is safe on devices that genuinely report only 16; those would still need the Phase 7 binding packing, or a fallback that packs the wavefront work buffers.)
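
One possible shape for this, as a sketch under stated assumptions rather than the actual patch: request the adapter's own `maxStorageBuffersPerShaderStage`, and fail early with a clear message when even that is too small for the kernel plus the user bindings. `storageBufferRequest` and `SHADE_BUILTIN_STORAGE_BUFFERS` are hypothetical names; the count of 16 is taken from the binding list in Details.

```javascript
// Built-in SHADE storage buffers, per the list in Details (assumption).
const SHADE_BUILTIN_STORAGE_BUFFERS = 16;

// Hypothetical helper: build the requiredLimits entry from the adapter's
// reported cap instead of a hardcoded 16. Throws when the adapter cannot
// fit the built-in bindings plus `userBuffers`, which is the case that
// would still need binding packing or another fallback.
function storageBufferRequest(adapterLimits, userBuffers) {
  const name = "maxStorageBuffersPerShaderStage";
  const adapterCap = adapterLimits[name];
  const needed = SHADE_BUILTIN_STORAGE_BUFFERS + userBuffers;
  if (adapterCap < needed) {
    throw new Error(
      `${name}: adapter reports ${adapterCap}, pipeline needs ${needed}`,
    );
  }
  return { [name]: adapterCap };
}

// Mock adapter limits; in a browser this would be adapter.limits.
console.log(storageBufferRequest({ maxStorageBuffersPerShaderStage: 64 }, 4));

// Browser usage (not run here):
// const device = await adapter.requestDevice({
//   requiredLimits: storageBufferRequest(adapter.limits, 4),
// });
```

Throwing at device creation is only one option; it trades a black canvas for an explicit error on adapters that genuinely report 16.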
Found while porting 3DForts to the wavefront tracer
(3DForts/3DForts#28).