fix(webgpu-rt): dynamic rayQuery TLAS leaf-start so picks hit for realistic instance counts (#25) #26

Merged
catbot merged 2 commits from claude/issue-25 into master 2026-06-04 15:33:55 +02:00

Summary

Fixes the WebGPU software ray-query shim's TLAS traversal, which used a
compile-time-constant leaf-start (TLAS_BVH_LEAVES_START = 16384 - 1)
while the actual TLAS sweep tree is built at depth
log2(next_pow2(instanceCount)). For any scene with fewer than 8193
instances the padded leaf count is at most 8192, so no node index
ever reached the hardcoded leaf start of 16383: every node looked
internal, the descent walked into zeroed out-of-tree AABBs, and _rqTraverseTlas
reported a permanent miss. This broke every rayQuery=true compute shader
on the WebGPU backend (builder picking, splash queries, …).
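
The index arithmetic behind this failure can be checked on the CPU. The following is a minimal sketch, not the shim's code: it assumes the implicit complete-binary-tree layout implied by the fix (internal nodes first, leaves starting at index nPadded - 1), and all function names are hypothetical.

```python
# Value of TLAS_BVH_LEAVES_START before the fix.
HARDCODED_LEAVES_START = 16384 - 1


def next_pow2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1 (stand-in for wfNextPow2)."""
    return 1 << (n - 1).bit_length()


def node_count(instance_count: int) -> int:
    """Nodes in a complete binary tree with next_pow2(instance_count) leaves."""
    return 2 * next_pow2(instance_count) - 1


def constant_can_see_a_leaf(instance_count: int) -> bool:
    """True if some in-tree node index reaches the hardcoded leaf start."""
    last_index = node_count(instance_count) - 1
    return last_index >= HARDCODED_LEAVES_START


for count in (1, 512, 8192, 8193, 16384):
    print(count, next_pow2(count), constant_can_see_a_leaf(count))
```

Under these assumptions the largest tree in the broken regime (8192 padded leaves) has 16383 nodes, so its last index is 16382, one short of the constant. From 8193 instances upward the padded count is 16384 and the constant happens to equal the true leaf start.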

Fix

The shim now derives its leaf-start the same way the megakernel
_rtwTraverseTlas does — from a per-build dynamic value:

  • New RqTlasMeta.nPadded uniform at @group(1) @binding(10), written
    each wgpuBuildTLAS from wfNextPow2(instanceCount).
  • Bound by both rayQuery dispatch paths (wgpuDispatchCustom,
    wgpuDispatchCompute) and declared in both rayQuery BGLs.
  • _rqTraverseTlas computes leavesStart = rqTlasMeta.nPadded - 1u
    instead of the constant.
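
The effect of switching to a dynamic leaf start can be illustrated with a small CPU model. This is a hedged sketch rather than the shim's WGSL: it uses 1D intervals in place of AABBs, represents padded or missing nodes as None, assumes children of node i live at 2i+1 and 2i+2, and every name in it is hypothetical.

```python
def next_pow2(n):
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def merge(a, b):
    """Union of two intervals; None stands for an empty (padded) node."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def build_tree(intervals):
    """Build an implicit complete binary tree; leaves start at n_padded - 1."""
    n_padded = next_pow2(len(intervals))
    leaves_start = n_padded - 1
    nodes = [None] * (2 * n_padded - 1)
    for i, interval in enumerate(intervals):
        nodes[leaves_start + i] = interval
    for i in range(leaves_start - 1, -1, -1):
        nodes[i] = merge(nodes[2 * i + 1], nodes[2 * i + 2])
    return nodes, n_padded


def traverse(nodes, leaves_start, x):
    """Return indices of leaves whose interval contains x."""
    hits = []
    stack = [0]
    while stack:
        node = stack.pop()
        # Out-of-tree or empty nodes can never be hit.
        if node >= len(nodes) or nodes[node] is None:
            continue
        lo, hi = nodes[node]
        if not (lo <= x <= hi):
            continue
        if node >= leaves_start:
            hits.append(node - leaves_start)
        else:
            stack.extend((2 * node + 1, 2 * node + 2))
    return hits


nodes, n_padded = build_tree([(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)])
print(traverse(nodes, n_padded - 1, 2.5))  # dynamic leaf start
print(traverse(nodes, 16384 - 1, 2.5))     # old hardcoded leaf start
```

In this model the dynamic leaf start finds leaf 1, while the hardcoded value treats the real leaf as an internal node, descends to indices outside the tree, and returns no hits.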

Test

Adds examples/RayQueryPick: an 8³ = 512-instance TLAS (squarely in the
broken < 8193 regime) that shoots one analytically-determined ray
through a rayQuery=true PlainComputeShader and checks the read-back
committed hit. Verified in Firefox against a real WebGPU adapter:

[RayQueryPick] result: hit=1 customIndex=484 prim=6 t=40.75
[RayQueryPick] PASS — rayQuery TLAS traversal hit the expected instance

hit=1, customIndex=484, and t=40.75 all match the analytic answer
(ray from x=50 down the row, hitting the +X face of the ix=7 cube at
x=9.25). Before the fix this read back a permanent miss. crafter-build test is green.
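
The expected hit distance follows from a one-axis ray-plane intersection. A minimal sketch, assuming the ray direction is the unit vector along -X (the description only says the ray travels "down the row") and using a hypothetical helper name:

```python
def ray_plane_t(origin_x: float, dir_x: float, plane_x: float) -> float:
    """Distance t at which origin_x + t * dir_x reaches plane_x."""
    return (plane_x - origin_x) / dir_x


# Ray starts at x=50, travels toward -X, and meets the +X face at x=9.25.
t = ray_plane_t(50.0, -1.0, 9.25)
print(t)  # 40.75
```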

Screenshots

The example also renders the 512-cube grid through the wavefront RT
pipeline:

[Screenshot "result": https://forgejo.catcrafts.net/attachments/894b99b5-bb44-4400-8dfb-919f35ce937b]

Resolves #25

The software rayQuery shim's _rqTraverseTlas detected BVH leaves with a
compile-time constant TLAS_BVH_LEAVES_START = 16384 - 1, while the actual
TLAS sweep tree is built at depth log2(next_pow2(instanceCount)). For any
scene with fewer than 8193 instances the padded leaf count is far below
16384, so no node index ever reached 16383: every node looked internal,
the descent walked into zeroed out-of-tree AABBs, and the pick reported a
permanent miss. This broke every rayQuery=true compute shader (builder
picking, splash queries) on the WebGPU backend.

Pass the per-build padded leaf count to the shim the same way the
megakernel _rtwTraverseTlas reads wfParams.tlasNPadded: a small uniform
(RqTlasMeta.nPadded) at @group(1) @binding(10), written each wgpuBuildTLAS
from wfNextPow2(instanceCount), and bound by both rayQuery dispatch paths.
_rqTraverseTlas now computes leavesStart = nPadded - 1 dynamically.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds an 8^3 = 512-instance TLAS pick test that shoots one analytically
determined ray through a rayQuery=true PlainComputeShader and checks the
read-back committed hit (customIndex 484, t 40.75). 512 instances sit in
the < 8193 regime that the hardcoded 16384-leaf start used to miss, so the
example fails fast if the shim regresses. Verified in Firefox/WebGPU:
"[RayQueryPick] PASS".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot merged commit 2b266262ee into master 2026-06-04 15:33:55 +02:00
catbot deleted branch claude/issue-25 2026-06-04 15:33:55 +02:00
Reference
Catcrafts/Crafter.Graphics!26