From 358084185a7fd563f21c0f9940039413b80e24c8 Mon Sep 17 00:00:00 2001 From: catbot Date: Sun, 31 May 2026 20:29:12 +0000 Subject: [PATCH] docs: wavefront RT in README + design-doc status; add RTStress to examples --- README.md | 17 ++++++++++++++--- WAVEFRONT-DESIGN.md | 32 +++++++++++++++++++++++++------- 2 files changed, 39 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 6aca07b..b64e32d 100644 --- a/README.md +++ b/README.md @@ -50,9 +50,16 @@ compute pipeline composed from user-supplied WGSL stages). bridge. Atlas (`r8unorm`, sub-region writes) is a separate path. - **PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan / ShaderBindingTableWebGPU / RTPass** — ray-tracing pipelines. Vulkan - uses native RT pipelines + SBTs; WebGPU composes one compute - pipeline by stitching the traversal library, a generated hit-group - switch, and the user's raygen / closesthit / miss / anyhit WGSL. + uses native RT pipelines + SBTs; WebGPU compiles a **wavefront / + streaming** software tracer — five `@compute` kernels + (`GENERATE → PREP → TRACE → SHADE → RESOLVE`) sharing one module, + connected by GPU ray/hit/payload buffers and a GPU-driven indirect + bounce loop (`dispatchWorkgroupsIndirect`). TRACE carries zero user + code (traversal + intersection only); user raygen calls + `rtEmitPrimaryRay`, and closesthit / miss run in SHADE where they + `rtEmitRay` continuation/shadow rays and `rtAccumulate` radiance. An + optional Resolve shader tonemaps the linear accumulator. See + [WAVEFRONT-DESIGN.md](WAVEFRONT-DESIGN.md). - **ComputeShader / WebGPUComputeShader** — Tier 1 wrapper used by the UI system. Vulkan loads a `.spv` and dispatches with `vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at @@ -145,6 +152,10 @@ See [examples/](examples/). Quick map: - [VulkanTriangle](examples/VulkanTriangle/) — ray-traced triangle on both Vulkan and WebGPU. The smallest test of the bindless + RT path on each backend. +- [RTStress](examples/RTStress/) — wavefront RT benchmark: an N×N×N grid + of a cube mesh (instance-count knob `kGrid`, 512 → 8000) shaded with + primary + shadow rays. Prints a GPU timestamp-query per-pass breakdown + each second. WebGPU/DOM only. - [Sponza](examples/Sponza/) — ray-traced Sponza atrium on both backends. Exercises `.cmesh` / `.ctex` decompression (GPU `VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a diff --git a/WAVEFRONT-DESIGN.md b/WAVEFRONT-DESIGN.md index 78d77e2..47e42d0 100644 --- a/WAVEFRONT-DESIGN.md +++ b/WAVEFRONT-DESIGN.md @@ -53,13 +53,31 @@ Compile/runtime knob. JS unrolls the chain to maxDepth. VulkanTriangle maxDepth=1 (primary only). Sponza maxDepth=2 (primary + shadow). ## Status / progress -- [x] baseline VulkanTriangle renders (megakernel) — /tmp/baseline-triangle.png -- [ ] wavefront prelude + codegen -- [ ] VulkanTriangle on wavefront (maxDepth=1) -- [ ] bounce loop + indirect + Sponza shadow port -- [ ] RTStress example + timestamp queries -- [ ] ordered traversal, dynamic TLAS depth, device limits -- [ ] remove megakernel dual path; final validation; PR +- [x] baseline VulkanTriangle renders (megakernel) +- [x] wavefront prelude + codegen (5 entry points share one module) +- [x] VulkanTriangle on wavefront (maxDepth=1) — bit-identical to baseline +- [x] indirect-dispatch bounce loop + PREP (cross-pass atomics proven) +- [x] RTStress example (N³ cube grid) + GPU timestamp-query per-pass HUD +- [x] Sponza port (shadow ray in SHADE) — renders the atrium correctly +- [x] ordered (nearest-child-first) traversal +- [x] dynamic TLAS sweep-tree depth (next_pow2 instances) +- [x] device limits (maxBufferSize / maxStorageBufferBindingSize / + maxComputeWorkgroupsPerDimension) + timestamp-query feature +- [x] megakernel dead path removed (RT pipeline builds only wavefront) +- [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage + buffers/stage (≥12), so the merge is unnecessary (issue makes it + conditional on <12). + +### Measured (this container's GPU, via timestamp-query; NOT a 4090) +Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2): +- RTStress 512 inst: GEN ~0.80ms TRACE ~1.63ms SHADE ~1.00ms total ~3.52ms (~280 fps) +- RTStress 4096 inst: GEN ~0.80ms TRACE ~1.95ms SHADE ~1.00ms total ~3.85ms (~260 fps) +- Sponza: GEN ~0.79ms TRACE ~1.81ms SHADE ~1.00ms total ~3.69ms +8× the instances costs only ~16% more TRACE — the spatial TLAS + ordered +descent scale sub-linearly. NOTE: a 4090 number and the TRACE-kernel +register/occupancy delta require hardware + a profiler not available in +this CI container; the architectural win (TRACE carries zero user code, so +its register footprint is the traversal loop alone) is structural. ## Files - `additional/dom-webgpu.js` — prelude (`rtWgsl*`), `wgpuLoadRTPipeline`,