WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE)

Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:

  GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE
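
To make the pass ordering concrete, here is a minimal sketch that expands the diagram above into a flat list of kernel names for a given maxDepth. The helper name buildWavefrontSchedule is hypothetical and does not appear in this commit; it only illustrates the order, not how the library actually encodes its passes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of the commit): lists the kernel order
// GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE.
std::vector<std::string> buildWavefrontSchedule(std::uint32_t maxDepth) {
    std::vector<std::string> passes;
    passes.reserve(2 + 3 * static_cast<std::size_t>(maxDepth));
    passes.push_back("GENERATE");
    for (std::uint32_t bounce = 0; bounce < maxDepth; ++bounce) {
        passes.push_back("PREP");   // publish indirect args for this bounce
        passes.push_back("TRACE");  // traversal only, no user code
        passes.push_back("SHADE");  // user hit/miss shading, may emit rays
    }
    passes.push_back("RESOLVE");
    return passes;
}
```

Under this reading, maxDepth=1 yields five passes and each additional bounce adds three.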

- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
  the indirect-args buffer lives in its own bind group so it is never
  bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
  plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
  between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
  user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize /
  maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit
  past the 128MB baseline.
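
The PREP bullet above amounts to a ceiling division of the live ray count by the workgroup size. The sketch below is a CPU-side mirror of that arithmetic, assuming a 1D dispatch; the names IndirectArgs and prepIndirectArgs and the workgroup size of 64 used in the usage note are assumptions for illustration, not values taken from this commit. The three-u32 layout is what dispatchWorkgroupsIndirect reads.

```cpp
#include <cstdint>

// Layout consumed by dispatchWorkgroupsIndirect: three consecutive u32
// workgroup counts (x, y, z).
struct IndirectArgs {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Hypothetical CPU mirror of what PREP might compute on the GPU.
// workgroupSize is an assumed parameter and must be non-zero.
constexpr IndirectArgs prepIndirectArgs(std::uint32_t liveRays,
                                        std::uint32_t workgroupSize) {
    // Ceiling division written to avoid overflow in liveRays + size - 1.
    const std::uint32_t groups =
        liveRays / workgroupSize + (liveRays % workgroupSize != 0 ? 1u : 0u);
    return IndirectArgs{groups, 1u, 1u};
}
```

With an assumed workgroup size of 64, a 1920x1080 frame of live rays maps to 32400 workgroups, and a bounce with no live rays maps to a zero-sized dispatch.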

VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.
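
For the dynamic-offset uniform ring mentioned in the list above, each pass needs its own slot at an offset that respects the device's minUniformBufferOffsetAlignment (256 bytes by default in WebGPU). The following is a rough sketch of that offset arithmetic; alignUp, ringOffset and ringSlots are hypothetical names, and one slot per pass is an assumption rather than something this commit states.

```cpp
#include <cstdint>

// Round value up to the next multiple of alignment.
// alignment must be a power of two.
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical: byte offset of the params slot for pass passIndex.
constexpr std::uint64_t ringOffset(std::uint32_t passIndex,
                                   std::uint64_t paramsSize,
                                   std::uint64_t offsetAlignment) {
    return static_cast<std::uint64_t>(passIndex) *
           alignUp(paramsSize, offsetAlignment);
}

// Assumption: GENERATE, RESOLVE and each PREP/TRACE/SHADE get one slot.
constexpr std::uint32_t ringSlots(std::uint32_t maxDepth) {
    return 2u + 3u * maxDepth;
}
```

For a 16-byte params struct and 256-byte alignment, the fourth pass (index 3) would bind at byte offset 768.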

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot 2026-05-31 16:24:41 +00:00
commit 4e42d663a6
9 changed files with 755 additions and 101 deletions


@ -72,6 +72,11 @@ export namespace Crafter {
// 0 means "no user bindings".
const void* handlesPtr = nullptr;
std::uint32_t handlesCount = 0;
// Wavefront bounce budget: number of (PREP; TRACE; SHADE) iterations.
// 1 = primary rays only; 2 = primary + one continuation/shadow
// bounce; etc. The library unrolls GENERATE; (PREP; TRACE; SHADE)
// ×maxDepth; RESOLVE.
std::uint32_t maxDepth = 1;
RTPass(PipelineRTWebGPU* p) : pipeline(p) {}
@ -88,7 +93,8 @@ export namespace Crafter {
static_cast<std::int32_t>(gx),
static_cast<std::int32_t>(gy),
handlesPtr,
static_cast<std::int32_t>(handlesCount));
static_cast<std::int32_t>(handlesCount),
static_cast<std::int32_t>(maxDepth));
}
};
}


@ -18,6 +18,11 @@ export namespace Crafter {
Miss = 1,
ClosestHit = 2,
AnyHit = 3,
// Wavefront RESOLVE-stage tonemap/output hook. Optional: if no
// Resolve shader is registered, RESOLVE writes the linear accum
// buffer through unchanged. Signature:
// fn <entryFn>(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32>
Resolve = 4,
};
// One WGSL shader source + the function name PipelineRTWebGPU should


@ -201,7 +201,8 @@ namespace Crafter::WebGPU {
std::uint32_t tlasBufHandle,
std::int32_t instanceCount,
std::int32_t gx, std::int32_t gy,
const void* handlesPtr, std::int32_t handlesCount);
const void* handlesPtr, std::int32_t handlesCount,
std::int32_t maxDepth);
// GPU TLAS-build dispatch. Two sequential compute passes:
// 1. tlasBuildMain — per-instance world AABB + identity permutation