WebGPU RT: wavefront/streaming tracer (replaces megakernel) #4
Reference
Catcrafts/Crafter.Graphics!4
Replaces the WebGPU megakernel software ray tracer with a wavefront /
streaming tracer, end-to-end, in one PR (the remaining phases from #1; #2
landed only the TLAS bitonic sort). Refs #1.
Pipeline
The single @compute megakernel is gone. The RT pipeline now compiles five kernels sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop:
rtEmitPrimaryRay.
dispatchWorkgroupsIndirect args from the live ray count; zeroes the next emit counter. The indirect-args buffer lives in its own bind group so it is never bound read-write in the same dispatch that consumes it as INDIRECT (a Dawn usage-scope rule proven out with a standalone emit→prep→indirect round-trip first).
_rtwTraverseTlas/_rtwTraverseBlas only. This is the occupancy win: TRACE's register footprint is the traversal loop alone, with no SBT/shading code inlined.
rtAccumulate radiance and rtEmitRay continuation/shadow rays into the next buffer.
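The indirect-args step described above turns a live ray count into the three workgroup counts that dispatchWorkgroupsIndirect reads. The following is a minimal CPU-side sketch of that arithmetic only, not the PR's kernel. The workgroup size of 64 and the names `kWorkgroupSize`, `IndirectArgs`, and `makeIndirectArgs` are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical workgroup size; the PR does not state the real value.
constexpr std::uint32_t kWorkgroupSize = 64;

// Layout read by dispatchWorkgroupsIndirect: three consecutive u32
// workgroup counts (x, y, z).
struct IndirectArgs {
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Round the live ray count up to whole workgroups. Zero live rays yields
// a zero-sized dispatch, which does no work.
constexpr IndirectArgs makeIndirectArgs(std::uint32_t liveRays) {
    const std::uint32_t groups =
        liveRays / kWorkgroupSize + (liveRays % kWorkgroupSize != 0 ? 1u : 0u);
    return IndirectArgs{groups, 1u, 1u};
}
```

Because the counts are derived on the GPU from the previous pass's emit counter, the host can record the whole bounce loop up front without reading the ray count back.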
Breaking API change
raygen emits a primary ray instead of calling traceRay; closesthit/miss run in SHADE and emit/accumulate. New API: rtEmitPrimaryRay, rtEmitRay, rtAccumulate, and an optional WebGPURTStage::Resolve tonemap hook. The Payload-typed wfPayload storage binding is emitted in the codegen region after the user's struct Payload; payload travels with each ray (2·W·H slots, double-buffered). User bindings move to @group(3) (0..2 reserved for WfParams / data heaps / indirect args).
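To get a feel for what a 2·W·H-slot payload store costs, the sketch below computes its size and compares it with WebGPU's default maxStorageBufferBindingSize of 134217728 bytes (128 MiB). It reads "2·W·H slots" as the total slot count, which is one possible interpretation. The 64-byte payload and the names `kPayloadBytes`, `payloadSlots`, `payloadStoreBytes`, and `needsRaisedLimit` are assumptions; the real size depends on the user's struct Payload.

```cpp
#include <cstdint>

// Hypothetical payload size in bytes; depends on the user's struct Payload.
constexpr std::uint64_t kPayloadBytes = 64;

// WebGPU's default maxStorageBufferBindingSize (128 MiB).
constexpr std::uint64_t kDefaultMaxBindingBytes = 134217728;

// Total payload slots, reading the PR's "2*W*H slots" as the whole store.
constexpr std::uint64_t payloadSlots(std::uint64_t width, std::uint64_t height) {
    return 2 * width * height;
}

// Size of the payload store in bytes under the assumed payload size.
constexpr std::uint64_t payloadStoreBytes(std::uint64_t width, std::uint64_t height) {
    return payloadSlots(width, height) * kPayloadBytes;
}

// True when the store cannot be bound under the default limit.
constexpr bool needsRaisedLimit(std::uint64_t width, std::uint64_t height) {
    return payloadStoreBytes(width, height) > kDefaultMaxBindingBytes;
}
```

Under these assumptions a 1920×1080 target needs 265,420,800 bytes, well past the default binding limit, while a 640×480 target fits.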
Also in this PR
(examples/RTStress/): the standing many-instance benchmark, an N³ cube grid (kGrid knob, 512 → 8000), primary + shadow rays.
(_rtAabbT entry-t; push the farther child only when it hits, re-cull on pop).
log2(next_pow2(instances)) instead of a fixed 14 (the bitonic sort is untouched; sentinels already sink to the tail).
maxBufferSize, maxStorageBufferBindingSize (payload store ≈245 MB at 1080p, over the 128 MB baseline), maxComputeWorkgroupsPerDimension, and the timestamp-query feature.
buffers/stage (≥12), so the merge is unnecessary (issue gates it on <12).
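The log2(next_pow2(instances)) expression in the list above can be evaluated with a short loop. This is a sketch of the arithmetic only; the name `tlasDepth` is hypothetical, and the assumption is that the value is the depth of a complete binary tree over the padded instance count.

```cpp
#include <cstdint>

// log2(next_pow2(instances)): double a padded leaf count until it covers
// the instance count, counting the doublings.
constexpr std::uint32_t tlasDepth(std::uint32_t instances) {
    std::uint32_t depth = 0;
    std::uint64_t padded = 1;  // 64-bit so the shift cannot overflow
    while (padded < instances) {
        padded <<= 1;
        ++depth;
    }
    return depth;
}
```

For the RTStress grid sizes this gives 9 at 512 instances, 12 at 4096, and 13 at 8000, compared with the fixed 14 (16384 padded leaves) used before. A single instance gives 0.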
Validation
All three render correctly through the wavefront pipeline (validated in
Firefox/Dawn WebGPU):
maxDepth=1 (also exercises the single-instance nPadded=1 degenerate tree).
shadows, Reinhard+gamma resolve.
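For readers unfamiliar with the "Reinhard+gamma resolve" named above, here is a minimal per-channel sketch of that kind of tonemap. It is not the PR's Resolve hook. The gamma of 2.2 and the names `reinhard` and `resolveChannel` are assumptions.

```cpp
#include <cmath>

// Reinhard operator: maps radiance in [0, inf) into [0, 1).
inline float reinhard(float c) {
    return c / (1.0f + c);
}

// Tonemap one channel, then apply display gamma.
// Hypothetical gamma of 2.2; the PR does not state the value used.
inline float resolveChannel(float radiance) {
    const float clamped = radiance < 0.0f ? 0.0f : radiance;
    return std::pow(reinhard(clamped), 1.0f / 2.2f);
}
```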
Per-pass GPU time (timestamp-query, 1920×995, primary+shadow)
8× the instances costs only ~16% more TRACE — the spatial TLAS + ordered
descent scale sub-linearly.
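The ordered descent credited here (visit the nearer child first, keep the farther child only when it hits, re-cull on pop) can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the PR's WGSL: the node layout, the names `aabbEntryT` and `traverseOrdered`, and the shortcut of treating a leaf's box entry distance as its hit distance are all hypothetical.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

struct Aabb { float lo[3]; float hi[3]; };
struct Ray { float origin[3]; float invDir[3]; };

// Hypothetical node layout: a leaf has left < 0 and carries `prim`.
struct Node { Aabb box; int left; int right; int prim; };

constexpr float kMiss = std::numeric_limits<float>::infinity();

// Slab test: returns the entry distance, or kMiss when the box is missed
// or starts beyond tMax.
inline float aabbEntryT(const Ray& r, const Aabb& b, float tMax) {
    float t0 = 0.0f;
    float t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        float tn = (b.lo[a] - r.origin[a]) * r.invDir[a];
        float tf = (b.hi[a] - r.origin[a]) * r.invDir[a];
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
    }
    return t0 <= t1 ? t0 : kMiss;
}

struct StackEntry { int node; float t; };

// Returns the prim of the nearest leaf along the ray, or -1 on a miss.
// `leavesTested` reports how many leaves survived culling.
inline int traverseOrdered(const std::vector<Node>& nodes, const Ray& ray,
                           int& leavesTested) {
    leavesTested = 0;
    if (nodes.empty()) return -1;
    float tMax = kMiss;
    int best = -1;
    const float rootT = aabbEntryT(ray, nodes[0].box, tMax);
    if (rootT == kMiss) return -1;
    std::vector<StackEntry> stack{{0, rootT}};
    while (!stack.empty()) {
        const StackEntry e = stack.back();
        stack.pop_back();
        if (e.t >= tMax) continue;  // re-cull on pop: a closer hit was found
        const Node& n = nodes[e.node];
        if (n.left < 0) {
            // Simplification: the leaf's box entry stands in for a real
            // primitive intersection.
            ++leavesTested;
            tMax = e.t;
            best = n.prim;
            continue;
        }
        float tNear = aabbEntryT(ray, nodes[n.left].box, tMax);
        float tFar = aabbEntryT(ray, nodes[n.right].box, tMax);
        int nearIdx = n.left;
        int farIdx = n.right;
        if (tFar < tNear) {
            std::swap(tNear, tFar);
            std::swap(nearIdx, farIdx);
        }
        if (tFar != kMiss) stack.push_back({farIdx, tFar});     // only when it hits
        if (tNear != kMiss) stack.push_back({nearIdx, tNear});  // popped next
    }
    return best;
}
```

Once the nearer subtree produces a hit, tMax shrinks and the farther child is discarded when it is popped, so its subtree is never walked.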
See WAVEFRONT-DESIGN.md for the full design.
Screenshots
Sponza — atrium through the wavefront pipeline (textures, two-sided shading, sun + ambient, shadows, Reinhard+gamma resolve):
RTStress at 512 instances (left, default kGrid=8) and 4096 (right, kGrid=16), primary + shadow:
VulkanTriangle, bit-identical to the pre-rewrite megakernel:
Resolves #3