WebGPU RT: device storage-buffer limit hardcoded to 16, breaks pipelines with >1 user storage buffer #8

Closed
opened 2026-05-31 23:46:23 +02:00 by catbot · 0 comments

Summary

The new WebGPU wavefront RT pipeline cannot be used with more than ~1
user storage buffer because additional/dom-webgpu.js hardcodes the
device's maxStorageBuffersPerShaderStage request to 16, even though
the adapter supports far more (64 on the GPU in our test environment). The
wavefront SHADE compute kernel already binds ~15 storage buffers, so
any RT pipeline that declares 2+ user storage buffers via UICustomBinding
overflows the limit and fails to build.

Details

The SHADE kernel binds (counting storage buffers only):

  • @group(1): tlasEntries, bvhNodes, meshRecords, vertices,
    indices, primRemap, vertexAttribs, tlasEntryOrder,
    tlasBvhNodes, wfRaysA, wfRaysB, wfHits, wfAccum, wfCounters
    (14) + wfPayload (1) = 15
  • @group(2): wfIndirect (1) = 16
  • @group(3): user bindings — every UICustomBindingKind::Buffer /
    BufferReadWrite adds one more.

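So by this count the kernel is already at the limit before a single user
storage buffer is added. (The validation error under Reproduction reports 19
bindings with 4 user buffers, which implies 15 built-in bindings in the
COMPUTE stage and would explain the room for ~1 user buffer.)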
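
The arithmetic above can be written out as a small budget check. This is a minimal sketch, not code from the repository: `storageBufferBudget` and its parameter names are hypothetical, and the built-in count of 15 is the one implied by the validation error under Reproduction (19 bindings with 4 user buffers), not a value read from the source.

```js
// Hypothetical helper: does a kernel's storage-buffer count fit in one stage?
// builtinCount: storage buffers the kernel binds itself (groups 1 and 2).
// userCount:    storage buffers added at @group(3).
// stageLimit:   the device's maxStorageBuffersPerShaderStage.
function storageBufferBudget(builtinCount, userCount, stageLimit) {
  const total = builtinCount + userCount;
  return {
    total,
    fits: total <= stageLimit,
    userSlotsLeft: Math.max(0, stageLimit - builtinCount),
  };
}

// Numbers from the issue: 4 user buffers (3DForts), device limit 16 today,
// adapter cap 64. The built-in count of 15 is an assumption (19 - 4).
const budgetAt16 = storageBufferBudget(15, 4, 16);
const budgetAt64 = storageBufferBudget(15, 4, 64);
console.log(budgetAt16); // { total: 19, fits: false, userSlotsLeft: 1 }
console.log(budgetAt64); // { total: 19, fits: true, userSlotsLeft: 49 }
```

With the built-in count taken as 16 instead, `userSlotsLeft` at a limit of 16 drops to 0; either way, 4 user buffers do not fit.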

additional/dom-webgpu.js (~L131):

clamp("maxStorageBuffersPerShaderStage", 16);
clamp("maxStorageBuffersInPipelineLayout", 16);

clamp(name, want) sets requiredLimits[name] = min(want, adapterCap), so
the device is created with 16 even when
adapter.limits.maxStorageBuffersPerShaderStage === 64 (verified in our container).
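
To make the effect of the clamp concrete, here is a minimal sketch of a helper with the behavior described above. It is an assumption about what `clamp` does, reconstructed from this description only; `makeClamp` and the plain-object limits are hypothetical stand-ins, not the code in `additional/dom-webgpu.js`.

```js
// Hypothetical reconstruction of the clamp helper: request `want`, but never
// more than the adapter reports for that limit.
function makeClamp(adapterCaps, requiredLimits) {
  return function clamp(name, want) {
    const adapterCap = adapterCaps[name];
    requiredLimits[name] = Math.min(want, adapterCap);
  };
}

// Stand-in for adapter.limits on the test GPU described in the issue.
const capsOnTestGpu = { maxStorageBuffersPerShaderStage: 64 };
const requestedToday = {};
makeClamp(capsOnTestGpu, requestedToday)("maxStorageBuffersPerShaderStage", 16);

// Stand-in for an adapter that genuinely reports only 16.
const capsOnSmallGpu = { maxStorageBuffersPerShaderStage: 16 };
const requestedOnSmallGpu = {};
makeClamp(capsOnSmallGpu, requestedOnSmallGpu)("maxStorageBuffersPerShaderStage", 64);

// The hardcoded 16 wins on the 64-capable adapter, and the adapter cap wins
// when the request exceeds it.
console.log(requestedToday.maxStorageBuffersPerShaderStage);      // 16
console.log(requestedOnSmallGpu.maxStorageBuffersPerShaderStage); // 16
```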

This contradicts WAVEFRONT-DESIGN.md, which says Phase 7 binding packing
was skipped because "target device reports 64 storage buffers/stage (≥12),
so the merge is unnecessary". The adapter reports 64, but the device
only gets 16 due to this clamp, leaving room for ~1 user storage buffer —
not ≥12.

UICustomBindingKind also has no uniform-buffer option (only Buffer /
BufferReadWrite), so consumers can't relieve the pressure by moving
small per-frame constants (camera, light) to uniform buffers.

Reproduction

Run any RT app that registers ≥2 user storage buffers at @group(3). 3DForts
registers 4 (camera, light, brace-stress SoA, per-instance TLAS metadata)
and fails with:

[crafter-wgpu] uncaptured error: Too many bindings of type StorageBuffers
  in Stage ShaderStages(COMPUTE), limit is 16, count was 19. Check the
  limit `max_storage_buffers_per_shader_stage` passed to
  `Adapter::request_device`
[crafter-wgpu] uncaptured error: PipelineLayout with '' label is invalid
[crafter-wgpu] uncaptured error: In a set_pipeline command, caused by:
  ComputePipeline with '' label is invalid

The RT pipeline never builds (rtPipelines=0), so the canvas stays black.

Validation

Changing the two clamps from 16 to 64 (or to the adapter's reported
cap) makes the device request 64, the pipeline builds, and the scene
renders correctly with normal per-pass timings (GENERATE/PREP/TRACE/SHADE/
RESOLVE all non-zero). No other change was needed.

Suggested fix

Request the adapter's reported limit instead of a hardcoded 16:

clamp("maxStorageBuffersPerShaderStage",
      adapterLimits.maxStorageBuffersPerShaderStage || 16);
clamp("maxStorageBuffersInPipelineLayout",
      adapterLimits.maxStorageBuffersInPipelineLayout || 16);

(clamp already mins against the adapter cap, so this is safe on devices
that genuinely report only 16 — those would still need the Phase 7 binding
packing, or a fallback that packs the wavefront work buffers.)
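
A minimal sketch of the suggested fix end to end, under stated assumptions: `limitsFromAdapter` and `requestDeviceWithAdapterLimits` are hypothetical names, not functions in `additional/dom-webgpu.js`, and skipping limit names the adapter does not report is a choice made for this sketch rather than the `|| 16` fallback shown above. `adapter.limits` and `adapter.requestDevice({ requiredLimits })` are the standard WebGPU calls.

```js
// Hypothetical helper: build requiredLimits from what the adapter reports.
// Names the adapter does not report are skipped rather than guessed.
function limitsFromAdapter(reportedLimits, names) {
  const requiredLimits = {};
  for (const name of names) {
    const cap = reportedLimits[name];
    if (typeof cap === "number") {
      requiredLimits[name] = cap;
    }
  }
  return requiredLimits;
}

// Hypothetical wrapper around the standard WebGPU device request.
async function requestDeviceWithAdapterLimits(adapter, names) {
  const requiredLimits = limitsFromAdapter(adapter.limits, names);
  return adapter.requestDevice({ requiredLimits });
}

// Limit names as they appear in the issue.
const storageLimitNames = [
  "maxStorageBuffersPerShaderStage",
  "maxStorageBuffersInPipelineLayout",
];

// Plain-object stand-in for adapter.limits, so the sketch runs without a GPU.
const exampleRequiredLimits = limitsFromAdapter(
  { maxStorageBuffersPerShaderStage: 64 },
  storageLimitNames
);
console.log(exampleRequiredLimits); // { maxStorageBuffersPerShaderStage: 64 }
```

On an adapter that genuinely reports 16, this still yields 16, so the packing fallback mentioned above remains necessary there.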

Found while porting 3DForts to the wavefront tracer
(3DForts/3DForts#28).

catbot 2026-05-31 23:58:19 +02:00

Reference
Catcrafts/Crafter.Graphics#8