Catcrafts/Crafter.Graphics

Author	SHA1	Message	Date
catbot	1e749818ef	fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays. A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on Dawn/Firefox) once the surface exceeds 65535 x 64 = 4,194,240 rays (about 2560x1640). Per the WebGPU spec such an indirect dispatch is silently dropped, with no validation error, so at 4K the world is never traced and the accumulator stays black while the non-RT passes still render. _wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535, y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the linear ray index from (global_invocation_id, num_workgroups). The existing `i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE already use a 2-D tile dispatch and are unchanged. Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M rays, 93716 workgroups, well over the 65535 cap): it renders the full cube grid where master shows a black screen. Resolves #11. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 11:09:15 +00:00
catbot	4e42d663a6	WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE). Replace the megakernel @compute entry with five wavefront kernels sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop: GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE. - TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only). - PREP publishes dispatchWorkgroupsIndirect args from the live ray count; the indirect-args buffer lives in its own bind group so it is never bound read-write in the same dispatch that consumes it as INDIRECT. - New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate, plus an optional user Resolve stage (tonemap hook; identity by default). - Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary between passes within one submit). - Payload-typed wfPayload binding emitted in the codegen region after the user's struct Payload; the payload travels with each ray (2*W*H slots). - Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past the 128 MiB baseline. VulkanTriangle is ported to the new API and renders output bit-identical to the megakernel baseline at maxDepth=1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 16:24:41 +00:00
Jorijn van der Graaf	b5d0f52da0	webgpu sponza	2026-05-19 00:27:09 +02:00
Jorijn van der Graaf	5553ded476	webgpu triangle	2026-05-18 18:43:30 +02:00

4 commits
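The 2-D reshape described in commit 1e749818ef can be modelled on the CPU as below. This is a minimal Python sketch under stated assumptions, not the project's WGSL: `dispatch_dims`, `ray_index` and `simulate` are hypothetical names, the 64-thread workgroup is inferred from `ceil(W*H/64)`, and the workgroup layout is assumed to be (64, 1, 1). It shows that clamping x to 65535 and setting y = ceil(wg/65535) keeps every dimension under the limit, and that rebuilding i = gid.y * (num_workgroups.x * 64) + gid.x, together with the `i >= count` guard, visits each ray exactly once.

```python
import math
from collections import Counter

WG_SIZE = 64          # threads per workgroup (assumed, from ceil(W*H/64))
MAX_PER_DIM = 65535   # maxComputeWorkgroupsPerDimension quoted in the commit


def dispatch_dims(ray_count, max_per_dim=MAX_PER_DIM):
    """Hypothetical 2-D indirect-dispatch args: x clamped, y = ceil(wg / max)."""
    wg = math.ceil(ray_count / WG_SIZE)
    x = min(wg, max_per_dim)
    y = math.ceil(wg / max_per_dim)
    return (x, y, 1)


def ray_index(gid, num_wg):
    """Linear ray index from global_invocation_id and num_workgroups.

    One grid row holds num_wg.x workgroups of WG_SIZE threads each.
    """
    return gid[1] * num_wg[0] * WG_SIZE + gid[0]


def simulate(ray_count, max_per_dim=MAX_PER_DIM):
    """Enumerate every invocation and count how often each ray is processed.

    Invocations with i >= ray_count are the grid overshoot; they are skipped,
    mirroring the `i >= _wfCurCount()` guard.
    """
    num_wg = dispatch_dims(ray_count, max_per_dim)
    hits = Counter()
    skipped = 0
    for gy in range(num_wg[1]):                   # workgroup height is 1
        for gx in range(num_wg[0] * WG_SIZE):
            i = ray_index((gx, gy, 0), num_wg)
            if i >= ray_count:
                skipped += 1
                continue
            hits[i] += 1
    return hits, skipped


rays = 3449 * 1739
print(rays, math.ceil(rays / WG_SIZE))  # 5997811 93716
print(dispatch_dims(rays))              # (65535, 2, 1)

# Tiny artificial limit so the whole grid can be enumerated quickly.
hits, skipped = simulate(1000, max_per_dim=5)
print(len(hits), skipped)               # 1000 280
```

With the real 65535 limit the same index formula applies; only the row width (65535 x 64 threads) changes, which is why a 1-row grid for surfaces under 4,194,240 rays behaves exactly like the old 1-D dispatch.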
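The bounce loop introduced in commit 4e42d663a6 (GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE) can be approximated by a toy CPU simulation. Everything here is hypothetical: the `Ray` record stands in for the user's `Payload`, the scene in `trace` is a placeholder, and one `WfParams` slot per bounce at a 256-byte stride (the WebGPU default `minUniformBufferOffsetAlignment`) is an assumed layout for the uniform ring. What it illustrates is that all maxDepth passes are recorded up front, PREP derives each pass's dispatch size from the live ray count, and SHADE appends continuation rays to the other half of a ping-pong buffer selected by `cur_is_a`.

```python
import math
from dataclasses import dataclass

WG_SIZE = 64          # threads per workgroup (assumed)
MAX_PER_DIM = 65535   # maxComputeWorkgroupsPerDimension default
UNIFORM_ALIGN = 256   # WebGPU default minUniformBufferOffsetAlignment


@dataclass
class Ray:
    pixel: int          # accumulator slot this ray contributes to
    throughput: float   # stand-in for the user Payload carried with the ray


def prep(count):
    """PREP: live ray count -> indirect dispatch args (2-D, x clamped)."""
    wg = math.ceil(count / WG_SIZE)
    return (min(wg, MAX_PER_DIM), math.ceil(wg / MAX_PER_DIM), 1)


def trace(ray):
    """TRACE: traversal only, no user code. Placeholder scene: even pixels hit."""
    return ray.pixel % 2 == 0


def shade(ray, hit, out_rays, accum):
    """SHADE: hypothetical user code using rtAccumulate / rtEmitRay semantics."""
    if not hit:
        accum[ray.pixel] += ray.throughput       # miss: add sky contribution
        return
    accum[ray.pixel] += ray.throughput * 0.5     # hit: add half
    # rtEmitRay: on the GPU this would be an atomic append into the other buffer
    out_rays.append(Ray(ray.pixel, ray.throughput * 0.5))


def render(width, height, max_depth, resolve=lambda c: c):
    n = width * height
    accum = [0.0] * n
    # GENERATE: one primary ray per pixel into buffer A
    buffers = {True: [Ray(p, 1.0) for p in range(n)], False: []}
    cur_is_a = True
    params_ring = []  # (dynamic offset, bounce, curIsA) for each recorded pass
    for bounce in range(max_depth):
        params_ring.append((bounce * UNIFORM_ALIGN, bounce, cur_is_a))
        cur, nxt = buffers[cur_is_a], buffers[not cur_is_a]
        nxt.clear()
        x, y, _ = prep(len(cur))
        # TRACE and SHADE are separate dispatches on the GPU; fused here for brevity.
        for i in range(x * y * WG_SIZE):
            if i >= len(cur):                    # the _wfCurCount() guard
                continue
            shade(cur[i], trace(cur[i]), nxt, accum)
        cur_is_a = not cur_is_a
    # RESOLVE: user tonemap hook, identity by default
    return [resolve(c) for c in accum], params_ring


image, ring = render(4, 2, max_depth=3)
print(image)  # [0.875, 1.0, 0.875, 1.0, 0.875, 1.0, 0.875, 1.0]
print(ring)   # [(0, 0, True), (256, 1, False), (512, 2, True)]
```

Because a zero ray count yields (0, 0, 1) dispatch args, a bounce with no surviving rays simply does no work, so the CPU never has to read the count back to end the loop early.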
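The reason commit 4e42d663a6 raises device limits can be checked with simple arithmetic. The sketch compares a W*H-sized work buffer against the WebGPU default limits (`maxStorageBufferBindingSize` = 128 MiB, `maxBufferSize` = 256 MiB); the 32-byte per-ray stride and the helper names are hypothetical, not the project's actual ray layout. In WebGPU, higher limits are requested through `requiredLimits` when calling `GPUAdapter.requestDevice()`, and only up to what the adapter reports.

```python
# WebGPU default limits; an adapter may expose larger values.
DEFAULT_MAX_STORAGE_BINDING = 134_217_728   # 128 MiB
DEFAULT_MAX_BUFFER_SIZE = 268_435_456       # 256 MiB


def work_buffer_bytes(width, height, stride_bytes, slots_per_pixel=1):
    """Size of a W*H-sized work buffer (payloads would use 2 slots per pixel)."""
    return width * height * slots_per_pixel * stride_bytes


def needs_raised_limits(width, height, stride_bytes, slots_per_pixel=1):
    """Report whether one buffer exceeds the default binding / buffer limits."""
    size = work_buffer_bytes(width, height, stride_bytes, slots_per_pixel)
    return {
        "bytes": size,
        "over_binding_limit": size > DEFAULT_MAX_STORAGE_BINDING,
        "over_buffer_limit": size > DEFAULT_MAX_BUFFER_SIZE,
    }


print(needs_raised_limits(3840, 2160, 32))
# {'bytes': 265420800, 'over_binding_limit': True, 'over_buffer_limit': False}
```

With that assumed stride, a 4K ray buffer already exceeds the default 128 MiB binding limit while still fitting the default 256 MiB buffer size.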