WebGPU wavefront: 1-D indirect TRACE/SHADE dispatch overflows maxComputeWorkgroupsPerDimension at ~4K → black screen #11

Closed
opened 2026-06-01 12:55:23 +02:00 by catbot · 0 comments
Member

Summary

The WebGPU wavefront ray tracer renders a black screen (only non-RT passes such as a UI overlay survive) once the render surface reaches roughly 4K (≈ 8.3 M pixels). 1080p renders correctly. Root cause: the TRACE/SHADE stages are dispatched as a 1‑D indirect dispatch of ceil(W·H/64) workgroups, which overflows maxComputeWorkgroupsPerDimension (65535 on Dawn/Firefox). Per the WebGPU spec, an indirect dispatch whose workgroup count exceeds the per‑dimension limit is silently skipped — no validation error — so the world is never traced and the accumulator stays black.

Found while triaging 3DForts (Catcrafts/3DForts) issue #35 "Browser rendering goes black at high resolutions". 3DForts consumes Crafter.Graphics' wavefront WebGPU path and has no control over the dispatch shape, so the fix belongs here.

Version: master @ afb9e320e128abc2ce1e10a9a46db34a3a493f1f.

Reproduction

  1. Run any wavefront-RT WebGPU app (e.g. the Sponza example, or 3DForts browser sandbox) in a browser tab sized to a 4K display (canvas width·height ≈ 3840×2160).
  2. The RT output is black; only non-RT passes are visible.
  3. Shrink to 1080p and it renders correctly.

ceil(3840·2160 / 64) = 129600 workgroups; ceil(1920·1080 / 64) = 32400. The threshold is 65535·64 = 4 194 240 rays ≈ a 2560×1640 surface.

Empirical confirmation

Standalone WebGPU probe on the same Firefox + GPU, dispatching a trivial @workgroup_size(64) atomic-increment kernel via dispatchWorkgroupsIndirect:

indirect [x,1,1] workgroups threads that executed
4K (129600) 129600 0 (silently dropped, no error)
at cap (65535) 65535 4 194 240 (ran fully)
1080p (32400) 32400 ran fully

adapter.limits.maxComputeWorkgroupsPerDimension === 65535. Note the device cap is 65535 — it cannot be raised by requesting more in requiredLimits, so simply bumping the requested limit does not help; the dispatch must be reshaped.

Root cause (file references)

additional/dom-webgpu.js_wfPrep() publishes a 1‑D indirect dispatch:

fn _wfPrep() {
    let n = _wfCurCount();
    wfIndirect[0] = (n + 63u) / 64u;   // <-- can exceed 65535
    wfIndirect[1] = 1u;
    wfIndirect[2] = 1u;
    ...
}

implementations/Crafter.Graphics-PipelineRTWebGPU.cpp — the TRACE/SHADE entry points read the ray index straight from gid.x, so they assume a 1‑D grid:

wgsl += "fn wfTrace(@builtin(global_invocation_id) gid: vec3<u32>) { _wfTrace(gid.x); }\n";
...
wgsl += "fn wfShade(@builtin(global_invocation_id) gid: vec3<u32>) { _wfShade(gid.x); }\n";

The clamp("maxComputeWorkgroupsPerDimension", 65535) in dom-webgpu.js (its comment says it requests the adapter max "for headroom", but the device cap is 65535) cannot save this — 2‑D reshaping is required.

GENERATE/RESOLVE are unaffected: they use a 2‑D (tileX, tileY) screen-tile dispatch.

Proposed fix

Spread the TRACE/SHADE workgroups across a 2‑D grid in _wfPrep, and reconstruct the linear ray index from (global_invocation_id, num_workgroups) in the entry points.

_wfPrep() in dom-webgpu.js:

fn _wfPrep() {
    let n  = _wfCurCount();
    let wg = (n + 63u) / 64u;
    // maxComputeWorkgroupsPerDimension is 65535 on Dawn/Firefox; a 1-D
    // dispatch of ceil(W·H/64) overflows it past ~4.19M rays (≈2560×1640)
    // and WebGPU silently drops the indirect dispatch. Spread across a 2-D
    // grid; wfTrace/wfShade rebuild the linear index from num_workgroups.
    let MAXDIM = 65535u;
    let gx = min(wg, MAXDIM);
    let gy = (wg + MAXDIM - 1u) / MAXDIM;   // = 1 when wg <= MAXDIM
    wfIndirect[0] = gx;
    wfIndirect[1] = gy;
    wfIndirect[2] = 1u;
    if (wfParams.curIsA == 1u) { atomicStore(&wfCounters[1], 0u); }
    else                       { atomicStore(&wfCounters[0], 0u); }
}

Entry points in Crafter.Graphics-PipelineRTWebGPU.cpp (the 64u is the @workgroup_size(64) x-extent):

wgsl += "fn wfTrace(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) nwg: vec3<u32>) { _wfTrace(gid.y * nwg.x * 64u + gid.x); }\n";
...
wgsl += "fn wfShade(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) nwg: vec3<u32>) { _wfShade(gid.y * nwg.x * 64u + gid.x); }\n";

gid.x ∈ [0, gx·64), gid.y ∈ [0, gy), so i = gid.y·(nwg.x·64) + gid.x is a contiguous bijection onto [0, gx·gy·64); the existing if (i >= _wfCurCount()) { return; } guard in _wfTrace/_wfShade absorbs the overshoot. No other stage needs changes.

The per-pixel work buffers in ensureWavefrontBuffers scale with W·H. At exactly 4K (dpr 1) the largest, wf.payload = 2·cap·64, is ≈ 1012 MiB — just under the 1 GiB (1<<30) requested for maxBufferSize/maxStorageBufferBindingSize, so it still allocates and the failure is purely the workgroup-dimension drop. But at 4K with devicePixelRatio > 1, or 5K+, payload/hits cross 1 GiB and creation will fail with a real validation error. Consider raising those requested limits toward the adapter max and/or capping render scale.

## Summary The WebGPU wavefront ray tracer renders a **black screen** (only non-RT passes such as a UI overlay survive) once the render surface reaches roughly **4K** (≈ 8.3 M pixels). 1080p renders correctly. Root cause: the `TRACE`/`SHADE` stages are dispatched as a **1‑D indirect** dispatch of `ceil(W·H/64)` workgroups, which overflows `maxComputeWorkgroupsPerDimension` (65535 on Dawn/Firefox). Per the WebGPU spec, an indirect dispatch whose workgroup count exceeds the per‑dimension limit is **silently skipped** — no validation error — so the world is never traced and the accumulator stays black. Found while triaging 3DForts (`Catcrafts`/`3DForts`) issue #35 "Browser rendering goes black at high resolutions". 3DForts consumes Crafter.Graphics' wavefront WebGPU path and has no control over the dispatch shape, so the fix belongs here. Version: `master` @ `afb9e320e128abc2ce1e10a9a46db34a3a493f1f`. ## Reproduction 1. Run any wavefront-RT WebGPU app (e.g. the Sponza example, or 3DForts browser sandbox) in a browser tab sized to a 4K display (canvas `width·height ≈ 3840×2160`). 2. The RT output is black; only non-RT passes are visible. 3. Shrink to 1080p and it renders correctly. `ceil(3840·2160 / 64) = 129600` workgroups; `ceil(1920·1080 / 64) = 32400`. The threshold is `65535·64 = 4 194 240` rays ≈ a 2560×1640 surface. ## Empirical confirmation Standalone WebGPU probe on the same Firefox + GPU, dispatching a trivial `@workgroup_size(64)` atomic-increment kernel via `dispatchWorkgroupsIndirect`: | indirect `[x,1,1]` | workgroups | threads that executed | |---|---|---| | 4K (`129600`) | 129600 | **0** (silently dropped, no error) | | at cap (`65535`) | 65535 | 4 194 240 (ran fully) | | 1080p (`32400`) | 32400 | ran fully | `adapter.limits.maxComputeWorkgroupsPerDimension === 65535`. Note the device cap **is** 65535 — it cannot be raised by requesting more in `requiredLimits`, so simply bumping the requested limit does not help; the dispatch must be reshaped. ## Root cause (file references) **`additional/dom-webgpu.js`** — `_wfPrep()` publishes a 1‑D indirect dispatch: ```wgsl fn _wfPrep() { let n = _wfCurCount(); wfIndirect[0] = (n + 63u) / 64u; // <-- can exceed 65535 wfIndirect[1] = 1u; wfIndirect[2] = 1u; ... } ``` **`implementations/Crafter.Graphics-PipelineRTWebGPU.cpp`** — the `TRACE`/`SHADE` entry points read the ray index straight from `gid.x`, so they assume a 1‑D grid: ```cpp wgsl += "fn wfTrace(@builtin(global_invocation_id) gid: vec3<u32>) { _wfTrace(gid.x); }\n"; ... wgsl += "fn wfShade(@builtin(global_invocation_id) gid: vec3<u32>) { _wfShade(gid.x); }\n"; ``` The `clamp("maxComputeWorkgroupsPerDimension", 65535)` in `dom-webgpu.js` (its comment says it requests the adapter max "for headroom", but the device cap is 65535) cannot save this — 2‑D reshaping is required. `GENERATE`/`RESOLVE` are unaffected: they use a 2‑D `(tileX, tileY)` screen-tile dispatch. ## Proposed fix Spread the `TRACE`/`SHADE` workgroups across a 2‑D grid in `_wfPrep`, and reconstruct the linear ray index from `(global_invocation_id, num_workgroups)` in the entry points. `_wfPrep()` in `dom-webgpu.js`: ```wgsl fn _wfPrep() { let n = _wfCurCount(); let wg = (n + 63u) / 64u; // maxComputeWorkgroupsPerDimension is 65535 on Dawn/Firefox; a 1-D // dispatch of ceil(W·H/64) overflows it past ~4.19M rays (≈2560×1640) // and WebGPU silently drops the indirect dispatch. Spread across a 2-D // grid; wfTrace/wfShade rebuild the linear index from num_workgroups. let MAXDIM = 65535u; let gx = min(wg, MAXDIM); let gy = (wg + MAXDIM - 1u) / MAXDIM; // = 1 when wg <= MAXDIM wfIndirect[0] = gx; wfIndirect[1] = gy; wfIndirect[2] = 1u; if (wfParams.curIsA == 1u) { atomicStore(&wfCounters[1], 0u); } else { atomicStore(&wfCounters[0], 0u); } } ``` Entry points in `Crafter.Graphics-PipelineRTWebGPU.cpp` (the `64u` is the `@workgroup_size(64)` x-extent): ```cpp wgsl += "fn wfTrace(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) nwg: vec3<u32>) { _wfTrace(gid.y * nwg.x * 64u + gid.x); }\n"; ... wgsl += "fn wfShade(@builtin(global_invocation_id) gid: vec3<u32>, @builtin(num_workgroups) nwg: vec3<u32>) { _wfShade(gid.y * nwg.x * 64u + gid.x); }\n"; ``` `gid.x ∈ [0, gx·64)`, `gid.y ∈ [0, gy)`, so `i = gid.y·(nwg.x·64) + gid.x` is a contiguous bijection onto `[0, gx·gy·64)`; the existing `if (i >= _wfCurCount()) { return; }` guard in `_wfTrace`/`_wfShade` absorbs the overshoot. No other stage needs changes. ## Secondary, related scaling limit (not the cause of the 4K black screen, but worth flagging) The per-pixel work buffers in `ensureWavefrontBuffers` scale with `W·H`. At exactly 4K (dpr 1) the largest, `wf.payload = 2·cap·64`, is ≈ 1012 MiB — just under the 1 GiB (`1<<30`) requested for `maxBufferSize`/`maxStorageBufferBindingSize`, so it still allocates and the failure is purely the workgroup-dimension drop. But at 4K with `devicePixelRatio > 1`, or 5K+, `payload`/`hits` cross 1 GiB and creation will fail with a real validation error. Consider raising those requested limits toward the adapter max and/or capping render scale.
catbot 2026-06-01 13:10:06 +02:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
Catcrafts/Crafter.Graphics#11
No description provided.