fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16

dom-webgpu.js capped maxStorageBuffersPerShaderStage at 16 even when the adapter reports far more (64 in our test env). The wavefront SHADE kernel already binds ~16 storage buffers before any user binding, so any RT pipeline declaring 2+ user storage buffers at @group(3) overflowed the limit and failed to build with "Too many bindings of type StorageBuffers".

Request the adapter's reported maxStorageBuffersPerShaderStage / maxStorageBuffersInPipelineLayout instead of a fixed 16. `clamp` already mins against the adapter cap, so baseline-only devices still get a valid request, and the `|| 16` fallback + the `typeof cap === "number"` guard handle limit names a browser doesn't expose (Firefox returns null for maxStorageBuffersInPipelineLayout).

Verified in-browser: a 17-storage-buffer compute pipeline fails with the exact reported error on a device clamped to 16, and builds cleanly on a device requesting the adapter's 64. RTStress renders correctly.

Resolves #8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 21:55:42 +00:00 · 2026-05-31 21:55:42 +00:00 · 23780d83a8
commit 23780d83a8
parent 26a41ac528
2 changed files with 21 additions and 12 deletions
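
The limit-request logic described in the commit message can be sketched in isolation as follows. This is a minimal reconstruction, not the file's actual code: `buildRequiredLimits` is a hypothetical wrapper, the body of `clamp` is inferred from the commit message and the visible diff context, and the two sample adapter objects are made-up values chosen to exercise the 64-buffer case and the baseline/null case.

```javascript
// Hypothetical helper wrapping the clamp pattern from dom-webgpu.js.
// The clamp body is an assumption reconstructed from the commit message.
function buildRequiredLimits(adapterLimits) {
    const requiredLimits = {};
    const clamp = (name, want) => {
        const cap = adapterLimits[name];
        // Skip limit names the browser does not expose as a number
        // (typeof null is "object", typeof undefined is "undefined").
        if (typeof cap === "number") {
            requiredLimits[name] = Math.min(want, cap);
        }
    };
    // Ask for the adapter's own value; fall back to 16 if it is missing.
    clamp("maxStorageBuffersPerShaderStage",
          adapterLimits.maxStorageBuffersPerShaderStage || 16);
    clamp("maxStorageBuffersInPipelineLayout",
          adapterLimits.maxStorageBuffersInPipelineLayout || 16);
    clamp("maxStorageTexturesPerShaderStage", 8);
    return requiredLimits;
}

// Illustrative adapters (values are assumptions, not measurements).
const wide = buildRequiredLimits({
    maxStorageBuffersPerShaderStage: 64,
    maxStorageBuffersInPipelineLayout: 64,
    maxStorageTexturesPerShaderStage: 8,
});
const baseline = buildRequiredLimits({
    maxStorageBuffersPerShaderStage: 8,
    maxStorageBuffersInPipelineLayout: null,
    maxStorageTexturesPerShaderStage: 4,
});
console.log(wide, baseline);
```

In this sketch the wide adapter ends up requesting 64 for both storage-buffer limits, while the baseline adapter requests 8 per stage and omits the pipeline-layout limit entirely because its reported value is not a number. The resulting object is the kind of value that would be passed as `requiredLimits` to `adapter.requestDevice`.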
--- a/WAVEFRONT-DESIGN.md
+++ b/WAVEFRONT-DESIGN.md
@@ -66,7 +66,11 @@ maxDepth=1 (primary only). Sponza maxDepth=2 (primary + shadow).
 - [x] megakernel dead path removed (RT pipeline builds only wavefront)
 - [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage
      buffers/stage (≥12), so the merge is unnecessary (issue makes it
-      conditional on <12).
+      conditional on <12). NOTE: this only holds because dom-webgpu.js now
+      requests the adapter's reported maxStorageBuffersPerShaderStage at
+      device creation (was hardcoded to 16, which left room for ~1 user
+      storage buffer and broke RT pipelines with ≥2). Devices that genuinely
+      report <12 storage buffers/stage still need this packing.
 ### Measured (this container's GPU, via timestamp-query; NOT a 4090)
 Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2):
--- a/additional/dom-webgpu.js
+++ b/additional/dom-webgpu.js
@@ -111,15 +111,20 @@ if (!adapter) {
    throw initError;
 }
 // Ask for everything the adapter is willing to give us, up to the values
-// the RT pipeline actually needs. The megakernel prelude declares 7
-// storage buffers at group(1) (tlasEntries / bvhNodes / meshRecords /
-// vertices / indices / primRemap / vertexAttribs); user pipelines like
-// 3DForts add more at group(2), and the WebGPU baseline of 8 isn't
-// enough. Adapters routinely report 10+ — clamp our request to whatever
-// the adapter actually supports so the call doesn't reject on baseline-
-// only devices. Same pattern for storage textures (we use 1 output image
-// per dispatch but headroom is cheap) and for the global storage-buffer
-// pool which is the per-pipeline count's parent budget.
+// the RT pipeline actually needs. The wavefront SHADE kernel alone binds
+// ~16 storage buffers (14 RT/work buffers + wfPayload at group(1),
+// wfIndirect at group(2)) BEFORE a single user binding — and user
+// pipelines like 3DForts add several more at group(3) (camera, light,
+// brace-stress SoA, per-instance TLAS metadata). A hardcoded request of
+// 16 leaves room for ~1 user storage buffer and overflows the moment a
+// pipeline declares 2+, failing the build with "Too many bindings of
+// type StorageBuffers". So request whatever the adapter actually
+// supports (the GPUs we target report 64) rather than a fixed 16;
+// `clamp` already mins against the adapter cap, so baseline-only devices
+// (reporting just 8) still get a valid — if tight — request. Same
+// headroom-is-cheap pattern for storage textures (1 output image per
+// dispatch) and for the pipeline-layout pool that parents the per-stage
+// count.
 const adapterLimits = adapter.limits || {};
 const requiredLimits = {};
 const clamp = (name, want) => {
@@ -128,8 +133,8 @@ const clamp = (name, want) => {
        requiredLimits[name] = Math.min(want, cap);
    }
 };
-clamp("maxStorageBuffersPerShaderStage", 16);
+clamp("maxStorageBuffersPerShaderStage", adapterLimits.maxStorageBuffersPerShaderStage || 16);
-clamp("maxStorageBuffersInPipelineLayout", 16);
+clamp("maxStorageBuffersInPipelineLayout", adapterLimits.maxStorageBuffersInPipelineLayout || 16);
 clamp("maxStorageTexturesPerShaderStage", 8);
 // The TLAS BVH build runs one workgroup of up to N threads in shared
 // memory (bitonic sort over morton codes + sweep-tree refit). Need the