fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16
dom-webgpu.js capped maxStorageBuffersPerShaderStage at 16 even when the adapter reports far more (64 in our test env). The wavefront SHADE kernel already binds ~16 storage buffers before any user binding, so any RT pipeline declaring 2+ user storage buffers at @group(3) overflowed the limit and failed to build with "Too many bindings of type StorageBuffers". Request the adapter's reported maxStorageBuffersPerShaderStage / maxStorageBuffersInPipelineLayout instead of a fixed 16. `clamp` already mins against the adapter cap, so baseline-only devices still get a valid request, and the `|| 16` fallback + the `typeof cap === "number"` guard handle limit names a browser doesn't expose (Firefox returns null for maxStorageBuffersInPipelineLayout). Verified in-browser: a 17-storage-buffer compute pipeline fails with the exact reported error on a device clamped to 16, and builds cleanly on a device requesting the adapter's 64. RTStress renders correctly. Resolves #8 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
26a41ac528
commit
23780d83a8
2 changed files with 21 additions and 12 deletions
|
|
@ -66,7 +66,11 @@ maxDepth=1 (primary only). Sponza maxDepth=2 (primary + shadow).
|
||||||
- [x] megakernel dead path removed (RT pipeline builds only wavefront)
|
- [x] megakernel dead path removed (RT pipeline builds only wavefront)
|
||||||
- [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage
|
- [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage
|
||||||
buffers/stage (≥12), so the merge is unnecessary (issue makes it
|
buffers/stage (≥12), so the merge is unnecessary (issue makes it
|
||||||
conditional on <12).
|
conditional on <12). NOTE: this only holds because dom-webgpu.js now
|
||||||
|
requests the adapter's reported maxStorageBuffersPerShaderStage at
|
||||||
|
device creation (was hardcoded to 16, which left room for ~1 user
|
||||||
|
storage buffer and broke RT pipelines with ≥2). Devices that genuinely
|
||||||
|
report <12 storage buffers/stage still need this packing.
|
||||||
|
|
||||||
### Measured (this container's GPU, via timestamp-query; NOT a 4090)
|
### Measured (this container's GPU, via timestamp-query; NOT a 4090)
|
||||||
Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2):
|
Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2):
|
||||||
|
|
|
||||||
|
|
@ -111,15 +111,20 @@ if (!adapter) {
|
||||||
throw initError;
|
throw initError;
|
||||||
}
|
}
|
||||||
// Ask for everything the adapter is willing to give us, up to the values
|
// Ask for everything the adapter is willing to give us, up to the values
|
||||||
// the RT pipeline actually needs. The megakernel prelude declares 7
|
// the RT pipeline actually needs. The wavefront SHADE kernel alone binds
|
||||||
// storage buffers at group(1) (tlasEntries / bvhNodes / meshRecords /
|
// ~16 storage buffers (14 RT/work buffers + wfPayload at group(1),
|
||||||
// vertices / indices / primRemap / vertexAttribs); user pipelines like
|
// wfIndirect at group(2)) BEFORE a single user binding — and user
|
||||||
// 3DForts add more at group(2), and the WebGPU baseline of 8 isn't
|
// pipelines like 3DForts add several more at group(3) (camera, light,
|
||||||
// enough. Adapters routinely report 10+ — clamp our request to whatever
|
// brace-stress SoA, per-instance TLAS metadata). A hardcoded request of
|
||||||
// the adapter actually supports so the call doesn't reject on baseline-
|
// 16 leaves room for ~1 user storage buffer and overflows the moment a
|
||||||
// only devices. Same pattern for storage textures (we use 1 output image
|
// pipeline declares 2+, failing the build with "Too many bindings of
|
||||||
// per dispatch but headroom is cheap) and for the global storage-buffer
|
// type StorageBuffers". So request whatever the adapter actually
|
||||||
// pool which is the per-pipeline count's parent budget.
|
// supports (the GPUs we target report 64) rather than a fixed 16;
|
||||||
|
// `clamp` already mins against the adapter cap, so baseline-only devices
|
||||||
|
// (reporting just 8) still get a valid — if tight — request. Same
|
||||||
|
// headroom-is-cheap pattern for storage textures (1 output image per
|
||||||
|
// dispatch) and for the pipeline-layout pool that parents the per-stage
|
||||||
|
// count.
|
||||||
const adapterLimits = adapter.limits || {};
|
const adapterLimits = adapter.limits || {};
|
||||||
const requiredLimits = {};
|
const requiredLimits = {};
|
||||||
const clamp = (name, want) => {
|
const clamp = (name, want) => {
|
||||||
|
|
@ -128,8 +133,8 @@ const clamp = (name, want) => {
|
||||||
requiredLimits[name] = Math.min(want, cap);
|
requiredLimits[name] = Math.min(want, cap);
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
clamp("maxStorageBuffersPerShaderStage", 16);
|
clamp("maxStorageBuffersPerShaderStage", adapterLimits.maxStorageBuffersPerShaderStage || 16);
|
||||||
clamp("maxStorageBuffersInPipelineLayout", 16);
|
clamp("maxStorageBuffersInPipelineLayout", adapterLimits.maxStorageBuffersInPipelineLayout || 16);
|
||||||
clamp("maxStorageTexturesPerShaderStage", 8);
|
clamp("maxStorageTexturesPerShaderStage", 8);
|
||||||
// The TLAS BVH build runs one workgroup of up to N threads in shared
|
// The TLAS BVH build runs one workgroup of up to N threads in shared
|
||||||
// memory (bitonic sort over morton codes + sweep-tree refit). Need the
|
// memory (bitonic sort over morton codes + sweep-tree refit). Need the
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue