WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE)

Replace the megakernel @compute entry with five wavefront kernels sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop:

    GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE

- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count; the indirect-args buffer lives in its own bind group so it is never bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate, plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past the 128MB baseline.

VulkanTriangle ported to the new API and renders bit-identical to the megakernel baseline at maxDepth=1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
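
For readers who want the pass ordering spelled out, here is a minimal host-side sketch of the schedule described above. It is not the code in this commit: `Stage`, `PassRecord`, `buildSchedule`, and `workgroupsFor` are hypothetical names, the per-bounce flip of `curIsA` is an assumed reading of the ping-pong ray buffers, and the 256-byte stride assumes the default WebGPU `minUniformBufferOffsetAlignment` with a `WfParams` that fits in one slot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not the library's actual declarations.
enum class Stage { Generate, Prep, Trace, Shade, Resolve };

struct PassRecord {
    Stage         stage;
    std::uint32_t bounce;        // bounce index visible to this pass
    bool          curIsA;        // assumed: which ray-buffer half is "current"
    std::uint32_t dynamicOffset; // byte offset of this pass's WfParams slot
};

// Assumed slot stride: the default minUniformBufferOffsetAlignment in WebGPU.
constexpr std::uint32_t kUniformSlotStride = 256;

// Builds GENERATE; (PREP; TRACE; SHADE) x maxDepth; RESOLVE, giving every
// pass its own slot in the uniform ring so curIsA/bounce can differ per pass.
std::vector<PassRecord> buildSchedule(std::uint32_t maxDepth) {
    std::vector<PassRecord> passes;
    auto push = [&passes](Stage stage, std::uint32_t bounce, bool curIsA) {
        const auto offset =
            static_cast<std::uint32_t>(passes.size()) * kUniformSlotStride;
        passes.push_back({stage, bounce, curIsA, offset});
    };

    push(Stage::Generate, 0, true);
    for (std::uint32_t b = 0; b < maxDepth; ++b) {
        const bool curIsA = (b % 2 == 0); // assumed ping-pong per bounce
        push(Stage::Prep,  b, curIsA);
        push(Stage::Trace, b, curIsA);
        push(Stage::Shade, b, curIsA);
    }
    push(Stage::Resolve, maxDepth, maxDepth % 2 == 0);
    return passes;
}

// Workgroup count a PREP-like step would publish for a 1D indirect dispatch:
// the live ray count divided by the workgroup size, rounded up.
inline std::uint32_t workgroupsFor(std::uint32_t liveRays, std::uint32_t wgSize) {
    assert(wgSize != 0);
    return (liveRays + wgSize - 1) / wgSize;
}
```

Under these assumptions the schedule has 3 * maxDepth + 2 passes, so the uniform ring needs that many slots per submit. In the commit itself the workgroup count is computed on the GPU by PREP; `workgroupsFor` only mirrors that arithmetic on the host.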
2026-05-31 16:24:41 +00:00
commit 4e42d663a6
parent e0d72f57f2
9 changed files with 755 additions and 101 deletions
--- a/interfaces/Crafter.Graphics-RTPass.cppm
+++ b/interfaces/Crafter.Graphics-RTPass.cppm
@@ -72,6 +72,11 @@ export namespace Crafter {
        // 0 means "no user bindings".
        const void*       handlesPtr   = nullptr;
        std::uint32_t     handlesCount = 0;
+        // Wavefront bounce budget: number of (TRACE; SHADE) iterations.
+        // 1 = primary rays only; 2 = primary + one continuation/shadow
+        // bounce; etc. The library unrolls GENERATE; (PREP; TRACE; SHADE)
+        // ×maxDepth; RESOLVE.
+        std::uint32_t     maxDepth     = 1;

        RTPass(PipelineRTWebGPU* p) : pipeline(p) {}

@@ -88,7 +93,8 @@ export namespace Crafter {
                static_cast<std::int32_t>(gx),
                static_cast<std::int32_t>(gy),
                handlesPtr,
-                static_cast<std::int32_t>(handlesCount));
+                static_cast<std::int32_t>(handlesCount),
+                static_cast<std::int32_t>(maxDepth));
        }
    };
 }
--- a/interfaces/Crafter.Graphics-ShaderBindingTableWebGPU.cppm
+++ b/interfaces/Crafter.Graphics-ShaderBindingTableWebGPU.cppm
@@ -18,6 +18,11 @@ export namespace Crafter {
        Miss       = 1,
        ClosestHit = 2,
        AnyHit     = 3,
+        // Wavefront RESOLVE-stage tonemap/output hook. Optional: if no
+        // Resolve shader is registered, RESOLVE writes the linear accum
+        // buffer through unchanged. Signature:
+        //   fn <entryFn>(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32>
+        Resolve    = 4,
    };

    // One WGSL shader source + the function name PipelineRTWebGPU should
--- a/interfaces/Crafter.Graphics-WebGPU.cppm
+++ b/interfaces/Crafter.Graphics-WebGPU.cppm
@@ -201,7 +201,8 @@ namespace Crafter::WebGPU {
                                   std::uint32_t tlasBufHandle,
                                   std::int32_t  instanceCount,
                                   std::int32_t  gx, std::int32_t gy,
-                                   const void* handlesPtr, std::int32_t handlesCount);
+                                   const void* handlesPtr, std::int32_t handlesCount,
+                                   std::int32_t maxDepth);

    // GPU TLAS-build dispatch. Two sequential compute passes:
    //   1. tlasBuildMain — per-instance world AABB + identity permutation