docs: wavefront RT in README + design-doc status; add RTStress to examples

parent afc0292fab · commit 358084185a
2 changed files with 39 additions and 10 deletions

README.md (17 changes)

```diff
@@ -50,9 +50,16 @@ compute pipeline composed from user-supplied WGSL stages).
 bridge. Atlas (`r8unorm`, sub-region writes) is a separate path.
 - **PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan /
 ShaderBindingTableWebGPU / RTPass** — ray-tracing pipelines. Vulkan
-uses native RT pipelines + SBTs; WebGPU composes one compute
-pipeline by stitching the traversal library, a generated hit-group
-switch, and the user's raygen / closesthit / miss / anyhit WGSL.
+uses native RT pipelines + SBTs; WebGPU compiles a **wavefront /
+streaming** software tracer — five `@compute` kernels
+(`GENERATE → PREP → TRACE → SHADE → RESOLVE`) sharing one module,
+connected by GPU ray/hit/payload buffers and a GPU-driven indirect
+bounce loop (`dispatchWorkgroupsIndirect`). TRACE carries zero user
+code (traversal + intersection only); user raygen calls
+`rtEmitPrimaryRay`, and closesthit / miss run in SHADE, where they call
+`rtEmitRay` for continuation/shadow rays and `rtAccumulate` for radiance. An
+optional Resolve shader tonemaps the linear accumulator. See
+[WAVEFRONT-DESIGN.md](WAVEFRONT-DESIGN.md).
 - **ComputeShader / WebGPUComputeShader** — Tier 1 wrapper used by the
 UI system. Vulkan loads a `.spv` and dispatches with
 `vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at
```
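
The kernel chain in this hunk can be modelled on the CPU to show how work moves between passes. The TypeScript below is a hypothetical sketch, not the engine's implementation: `Ray`, `Hit`, `prepIndirectArgs`, `simulateWavefront`, and the callback shapes stand in for the GPU ray/hit/payload buffers and the WGSL kernels. GENERATE queues one primary ray per pixel (what user raygen does through `rtEmitPrimaryRay`), PREP turns the live ray count into `dispatchWorkgroupsIndirect`-style `[x, y, z]` workgroup counts, TRACE produces one hit record per ray, and SHADE may queue follow-up rays (the role of `rtEmitRay`) and add radiance (the role of `rtAccumulate`).

```typescript
// Hypothetical CPU model of the wavefront bounce loop; not the engine's WGSL.
interface Ray {
  pixel: number; // index into the linear accumulator
  depth: number; // 0 = primary ray, 1 = first continuation/shadow ray, ...
}

interface Hit {
  ray: Ray;
  hit: boolean; // stand-in for TRACE's closest-hit record
}

type IndirectArgs = [number, number, number];

// PREP: turn the live ray count into dispatchWorkgroupsIndirect-style [x, y, z] counts.
function prepIndirectArgs(rayCount: number, workgroupSize: number): IndirectArgs {
  return [Math.ceil(rayCount / workgroupSize), 1, 1];
}

function simulateWavefront(
  width: number,
  height: number,
  maxDepth: number,
  trace: (ray: Ray) => boolean, // TRACE: traversal + intersection only, no user code
  shade: (
    hit: Hit,
    emitRay: (ray: Ray) => void, // plays the role of rtEmitRay
    accumulate: (pixel: number, radiance: number) => void, // plays the role of rtAccumulate
  ) => void,
  workgroupSize = 64,
): { accumulator: Float32Array; dispatches: IndirectArgs[] } {
  const accumulator = new Float32Array(width * height);
  const dispatches: IndirectArgs[] = [];

  // GENERATE: raygen queues one primary ray per pixel.
  let queue: Ray[] = [];
  for (let p = 0; p < width * height; p++) queue.push({ pixel: p, depth: 0 });

  // Bounce loop: at most maxDepth TRACE passes.
  for (let bounce = 0; bounce < maxDepth && queue.length > 0; bounce++) {
    dispatches.push(prepIndirectArgs(queue.length, workgroupSize)); // PREP
    const hits: Hit[] = queue.map((ray) => ({ ray, hit: trace(ray) })); // TRACE
    const next: Ray[] = [];
    for (const h of hits) {
      // SHADE: closesthit/miss logic may queue follow-up rays and add radiance.
      shade(
        h,
        (ray) => next.push(ray),
        (pixel, radiance) => {
          accumulator[pixel] += radiance;
        },
      );
    }
    queue = next; // the next bounce consumes the rays SHADE queued
  }
  return { accumulator, dispatches };
}
```

With `maxDepth = 2`, a SHADE callback that emits one shadow ray per primary hit and accumulates radiance when that shadow ray misses reproduces the primary + shadow pattern used by Sponza; with `maxDepth = 1` the shadow rays are queued but never traced. The early exit on an empty queue is a CPU convenience: a chain unrolled on the host would instead issue indirect dispatches whose workgroup count PREP has set to zero.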
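
The README says only that the optional Resolve shader tonemaps the linear accumulator; it does not name the operator. As an illustration, the sketch below assumes a Reinhard curve `x / (1 + x)`, a plain 2.2 display gamma, and an 8-bit output that is not already sRGB-encoded. `resolvePixel` and its `sampleCount` parameter (for an accumulator that holds a running sum of samples) are hypothetical; a real Resolve pass would apply the same math per texel in WGSL.

```typescript
// Hypothetical Resolve step: linear radiance -> 8-bit display value.
// Assumes Reinhard tonemapping and a plain 2.2 gamma (not the exact sRGB curve).
function resolvePixel(linear: number, sampleCount = 1): number {
  // Average the accumulated samples and clamp negative radiance to zero.
  const mean = Math.max(0, linear) / Math.max(1, sampleCount);
  const mapped = mean / (1 + mean); // Reinhard: compresses [0, inf) into [0, 1)
  const encoded = Math.pow(mapped, 1 / 2.2); // display gamma
  return Math.round(encoded * 255);
}
```

Reinhard only approaches 1.0, so very bright pixels round up to 255 rather than clipping; swapping in another curve (ACES, for instance) changes only the `mapped` line.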

```diff
@@ -145,6 +152,10 @@ See [examples/](examples/). Quick map:
 - [VulkanTriangle](examples/VulkanTriangle/) — ray-traced triangle on
 both Vulkan and WebGPU. The smallest test of the bindless + RT path
 on each backend.
+- [RTStress](examples/RTStress/) — wavefront RT benchmark: an N×N×N grid
+of a cube mesh (instance-count knob `kGrid`, 512 → 8000) shaded with
+primary + shadow rays. Prints a GPU timestamp-query per-pass breakdown
+each second. WebGPU/DOM only.
 - [Sponza](examples/Sponza/) — ray-traced Sponza atrium on both
 backends. Exercises `.cmesh` / `.ctex` decompression (GPU
 `VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a
```
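
RTStress's instance counts are perfect cubes: 512 = 8³ and 8000 = 20³, and the 4096-instance measurement later in this commit is 16³. One way to lay out such an N×N×N grid of cube instances, centred on the origin, is sketched here; `gridInstanceOffsets`, `gridSideFor`, and `spacing` are hypothetical, and the README does not say whether `kGrid` holds the side length N or the total count.

```typescript
type Vec3 = [number, number, number];

// Hypothetical helper: translations for an n×n×n grid of cube instances centred on the origin.
function gridInstanceOffsets(n: number, spacing: number): Vec3[] {
  const offsets: Vec3[] = [];
  const half = (n - 1) / 2; // shift so the grid straddles the origin
  for (let z = 0; z < n; z++) {
    for (let y = 0; y < n; y++) {
      for (let x = 0; x < n; x++) {
        offsets.push([(x - half) * spacing, (y - half) * spacing, (z - half) * spacing]);
      }
    }
  }
  return offsets; // n * n * n entries, one per instance
}

// Smallest side length n with n³ >= instanceCount (512 -> 8, 4096 -> 16, 8000 -> 20).
function gridSideFor(instanceCount: number): number {
  let n = 1;
  while (n * n * n < instanceCount) n++;
  return n;
}
```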
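
The per-pass breakdown relies on WebGPU timestamp queries: the device must be created with the `"timestamp-query"` feature, compute passes record begin/end times through the `timestampWrites` member of their descriptor, and `resolveQuerySet` writes the results to a buffer as 64-bit nanosecond values that are read back through a `BigUint64Array`. The TypeScript below sketches only the CPU-side bookkeeping, with hypothetical names throughout: it assumes begin/end pairs per pass already converted to plain numbers, and averages them over one-second windows to match the once-per-second print.

```typescript
// CPU-side bookkeeping for a timestamp-query HUD (hypothetical names).
// Assumed layout: ticksNs[2*i] and ticksNs[2*i + 1] are the begin/end timestamps of
// pass i in nanoseconds, already converted from the BigUint64Array read-back.
function passBreakdownMs(names: readonly string[], ticksNs: readonly number[]): Record<string, number> {
  if (ticksNs.length < names.length * 2) throw new Error("need one begin/end pair per pass");
  const out: Record<string, number> = {};
  for (let i = 0; i < names.length; i++) {
    out[names[i]] = (ticksNs[2 * i + 1] - ticksNs[2 * i]) / 1e6; // ns -> ms
  }
  return out;
}

// Averages per-pass times over roughly one-second windows, like a once-per-second HUD print.
class PerSecondAverager {
  private sums: Record<string, number> = {};
  private frames = 0;
  private windowStartMs: number | null = null;

  // Returns the window's per-pass averages once >= 1000 ms have elapsed, otherwise null.
  add(nowMs: number, frame: Record<string, number>): Record<string, number> | null {
    const start = this.windowStartMs === null ? nowMs : this.windowStartMs;
    this.windowStartMs = start;
    for (const pass in frame) this.sums[pass] = (this.sums[pass] || 0) + frame[pass];
    this.frames++;
    if (nowMs - start < 1000) return null;
    const averages: Record<string, number> = {};
    for (const pass in this.sums) averages[pass] = this.sums[pass] / this.frames;
    this.sums = {};
    this.frames = 0;
    this.windowStartMs = nowMs; // start the next window
    return averages;
  }
}
```

Subtracting begin from end while the values are still `bigint`, and converting only the difference, avoids losing precision when absolute timestamps exceed 2^53 ns.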

```diff
@@ -53,13 +53,31 @@ Compile/runtime knob. JS unrolls the chain to maxDepth. VulkanTriangle
 maxDepth=1 (primary only). Sponza maxDepth=2 (primary + shadow).
 
 ## Status / progress
-- [x] baseline VulkanTriangle renders (megakernel) — /tmp/baseline-triangle.png
-- [ ] wavefront prelude + codegen
-- [ ] VulkanTriangle on wavefront (maxDepth=1)
-- [ ] bounce loop + indirect + Sponza shadow port
-- [ ] RTStress example + timestamp queries
-- [ ] ordered traversal, dynamic TLAS depth, device limits
-- [ ] remove megakernel dual path; final validation; PR
+- [x] baseline VulkanTriangle renders (megakernel)
+- [x] wavefront prelude + codegen (5 entry points share one module)
+- [x] VulkanTriangle on wavefront (maxDepth=1) — bit-identical to baseline
+- [x] indirect-dispatch bounce loop + PREP (cross-pass atomics proven)
+- [x] RTStress example (N³ cube grid) + GPU timestamp-query per-pass HUD
+- [x] Sponza port (shadow ray in SHADE) — renders the atrium correctly
+- [x] ordered (nearest-child-first) traversal
+- [x] dynamic TLAS sweep-tree depth (next_pow2 instances)
+- [x] device limits (maxBufferSize / maxStorageBufferBindingSize /
+maxComputeWorkgroupsPerDimension) + timestamp-query feature
+- [x] megakernel dead path removed (RT pipeline builds only wavefront)
+- [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage
+buffers/stage (≥12), so the merge is unnecessary (issue makes it
+conditional on <12).
+
+### Measured (this container's GPU, via timestamp-query; NOT a 4090)
+Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2):
+- RTStress 512 inst: GEN ~0.80ms TRACE ~1.63ms SHADE ~1.00ms total ~3.52ms (~280 fps)
+- RTStress 4096 inst: GEN ~0.80ms TRACE ~1.95ms SHADE ~1.00ms total ~3.85ms (~260 fps)
+- Sponza: GEN ~0.79ms TRACE ~1.81ms SHADE ~1.00ms total ~3.69ms
+8× the instances costs only ~20% more TRACE time (1.63 → 1.95 ms); the spatial TLAS + ordered
+descent scale sub-linearly. NOTE: a 4090 number and the TRACE-kernel
+register/occupancy delta require hardware + a profiler not available in
+this CI container; the architectural win (TRACE carries zero user code, so
+its register footprint is the traversal loop alone) is structural.
 
 ## Files
 - `additional/dom-webgpu.js` — prelude (`rtWgsl*`), `wgpuLoadRTPipeline`,
```
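
Ordered (nearest-child-first) traversal tests both children of an interior node and descends into the nearer one first, so a close hit is found early and farther subtrees can be culled against it. The TypeScript sketch below shows the idea on a toy binary BVH and is not the engine's WGSL traversal: `BvhNode`, `rayAabb`, and `closestHit` are hypothetical, leaf boxes double as the primitives, and the ray direction is assumed to have no zero components so the slab test never multiplies zero by infinity.

```typescript
type V3 = [number, number, number];
interface Aabb { min: V3; max: V3; }
// Interior nodes carry left/right child indices; leaves carry a primitive id.
interface BvhNode { box: Aabb; left?: number; right?: number; prim?: number; }
interface StackEntry { node: number; tEntry: number; }

// Slab test: distance at which the ray enters the box, or Infinity on a miss.
// Assumes every direction component is non-zero, so invDir is finite.
function rayAabb(origin: V3, invDir: V3, box: Aabb, tMax: number): number {
  let t0 = 0;
  let t1 = tMax;
  for (let a = 0; a < 3; a++) {
    let tNear = (box.min[a] - origin[a]) * invDir[a];
    let tFar = (box.max[a] - origin[a]) * invDir[a];
    if (tNear > tFar) { const swap = tNear; tNear = tFar; tFar = swap; }
    t0 = Math.max(t0, tNear);
    t1 = Math.min(t1, tFar);
    if (t0 > t1) return Infinity;
  }
  return t0;
}

// Nearest-child-first traversal; returns the closest primitive, its distance, and nodes visited.
function closestHit(nodes: readonly BvhNode[], origin: V3, dir: V3): { prim: number; t: number; visited: number } {
  const invDir: V3 = [1 / dir[0], 1 / dir[1], 1 / dir[2]];
  let bestPrim = -1;
  let bestT = Infinity;
  let visited = 0;
  const stack: StackEntry[] = [{ node: 0, tEntry: 0 }]; // root is always visited
  while (stack.length > 0) {
    const entry = stack.pop() as StackEntry;
    if (entry.tEntry >= bestT) continue; // a closer hit was found after this entry was pushed
    const node = nodes[entry.node];
    visited++;
    if (node.prim !== undefined) {
      // Toy model: the leaf box itself is the primitive.
      const t = rayAabb(origin, invDir, node.box, bestT);
      if (t < bestT) { bestT = t; bestPrim = node.prim; }
      continue;
    }
    if (node.left === undefined || node.right === undefined) continue; // malformed node
    const tl = rayAabb(origin, invDir, nodes[node.left].box, bestT);
    const tr = rayAabb(origin, invDir, nodes[node.right].box, bestT);
    const leftFirst = tl <= tr;
    const near: StackEntry = leftFirst ? { node: node.left, tEntry: tl } : { node: node.right, tEntry: tr };
    const far: StackEntry = leftFirst ? { node: node.right, tEntry: tr } : { node: node.left, tEntry: tl };
    // Push the far child first so the near child is popped, and therefore visited, first.
    if (far.tEntry < bestT) stack.push(far);
    if (near.tEntry < bestT) stack.push(near);
  }
  return { prim: bestPrim, t: bestT, visited };
}
```

Each stack entry keeps the distance at which its box was entered, so a subtree pushed before a closer hit was found is dropped when popped instead of being visited. This is the kind of early rejection the measurements credit, together with the spatial TLAS, for TRACE time growing far more slowly than instance count.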
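
The status list now derives the TLAS sweep-tree depth from the instance count rounded up to a power of two. Assuming that means a complete binary tree with `next_pow2(n)` leaves, its depth is `log2(next_pow2(n))`, which a generated traversal loop or fixed-size stack would need to know. `nextPow2` and `tlasDepth` are hypothetical helpers that illustrate the arithmetic; they are not code from this commit.

```typescript
// Smallest power of two >= n (returns 1 for n <= 1).
function nextPow2(n: number): number {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// Depth of a complete binary tree whose leaf count is nextPow2(instanceCount).
function tlasDepth(instanceCount: number): number {
  let depth = 0;
  for (let leaves = nextPow2(instanceCount); leaves > 1; leaves /= 2) depth++;
  return depth;
}
```

Under that assumption, going from 512 to 4096 instances (8×) deepens the tree from 9 to 12 levels, and 8000 instances pad to 8192 leaves, or 13 levels.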
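
The device-limit work touches the three WebGPU limits named in the status list, and the skipped binding-packing phase hinges on a fourth, `maxStorageBuffersPerShaderStage` (WebGPU's default for it is 8, below the 12 the issue requires). The sketch below shows plausible uses: spilling a large 1-D workgroup count into a 2-D dispatch when it exceeds `maxComputeWorkgroupsPerDimension` (default 65535), bounding how many rays fit in one storage binding, and deciding whether packing is needed. `RtLimits` mirrors fields of WebGPU's `GPUSupportedLimits` so the snippet runs without a browser; `dispatchShape`, `maxRaysPerBinding`, `needsBindingPacking`, and the 32-byte ray record are hypothetical.

```typescript
// Subset of WebGPU's GPUSupportedLimits, declared locally so the sketch runs without a browser.
interface RtLimits {
  maxComputeWorkgroupsPerDimension: number;
  maxStorageBuffersPerShaderStage: number;
  maxStorageBufferBindingSize: number;
  maxBufferSize: number;
}

// Fit a 1-D workgroup count under the per-dimension cap by spilling into y.
// x * y may exceed `workgroups`, so the kernel must bounds-check its flattened index.
function dispatchShape(workgroups: number, limits: RtLimits): [number, number, number] {
  const cap = limits.maxComputeWorkgroupsPerDimension;
  if (workgroups <= cap) return [workgroups, 1, 1];
  const y = Math.ceil(workgroups / cap);
  if (y > cap) throw new Error("workgroup count too large even for a 2-D dispatch");
  return [Math.ceil(workgroups / y), y, 1];
}

// How many fixed-size ray records fit in one storage-buffer binding.
function maxRaysPerBinding(limits: RtLimits, bytesPerRay: number): number {
  const usable = Math.min(limits.maxStorageBufferBindingSize, limits.maxBufferSize);
  return Math.floor(usable / bytesPerRay);
}

// Packing is only needed when a stage gets fewer storage buffers than the kernels bind.
function needsBindingPacking(limits: RtLimits, requiredPerStage = 12): boolean {
  return limits.maxStorageBuffersPerShaderStage < requiredPerStage;
}
```

Under WebGPU's default limits, a 1920×995 frame (1,910,400 primary rays) fits in one binding at the assumed 32 bytes per ray, and with a hypothetical 64-wide workgroup it needs 29,850 workgroups, well under the 65,535 cap.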
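
The headline figures in the Measured block follow from its per-pass times: eight times the instances raises TRACE time by a factor of about 1.2, and each frame rate is the reciprocal of the total GPU time (the Sponza rate is derived here rather than quoted in the source).

$$
\frac{4096}{512} = 8, \qquad
\frac{t_{\mathrm{TRACE}}(4096)}{t_{\mathrm{TRACE}}(512)} = \frac{1.95\,\mathrm{ms}}{1.63\,\mathrm{ms}} \approx 1.196
$$

$$
\mathrm{fps} = \frac{1000\,\mathrm{ms/s}}{t_{\mathrm{total}}}: \qquad
\frac{1000}{3.52} \approx 284, \qquad
\frac{1000}{3.85} \approx 260, \qquad
\frac{1000}{3.69} \approx 271
$$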