docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7) #10

Merged
catbot merged 1 commit from claude/issue-7 into master 2026-06-01 00:22:52 +02:00
3 changed files with 59 additions and 6 deletions


@@ -24,6 +24,15 @@ Vulkan ray tracing is hardware (`VK_KHR_ray_tracing_pipeline`); WebGPU
 ray tracing is a library-built software path (BVH + traceRay in a
 compute pipeline composed from user-supplied WGSL stages).

+> **Native RT status:** reading an acceleration structure through
+> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on
+> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new
+> descriptor-heap acceleration-structure path, not an engine bug. The
+> engine setup (build, descriptors, SBT) is correct and validation-clean,
+> and images/buffers through the same heap work. See
+> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
+> for the full investigation. WebGPU RT is unaffected.
+
 ## What's in here

 - **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas


@@ -28,11 +28,49 @@ cd examples/VulkanTriangle
 crafter-build -r
 ```

-You should see a 1280×720 window with a triangle filling roughly the
-centre.
+On a working driver you should see a 1280×720 window with a triangle
+filling roughly the centre. **On the current NVIDIA driver the native
+build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs —
+see below.**

-## Notes
+## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`)

-`raygen.glsl`'s `traceRayEXT` call is currently commented out — the
-example exercises the dispatch and `imageStore` paths only. Uncomment
-it to actually trace into the BLAS.
+On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with
+`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the
+acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read
+(address `~0xffff…`) plus instruction-pointer faults inside the
+ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash
+disappear (the dispatch + `imageStore` path renders a solid colour fine).
+
+This was investigated thoroughly and traced to the **acceleration-structure
+read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
+
+- The BLAS/TLAS build is correct and finishes before rendering
+(`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
+has an identity transform, `mask = 0xFF`, and the correct BLAS device
+address.
+- The AS descriptor is written correctly — `vkWriteResourceDescriptorsEXT`
+stores the TLAS device address at the expected heap byte offset (verified
+by dumping the raw heap bytes after the write).
+- The Khronos validation layers (1.4.350, current) report **zero** errors
+for the whole frame, including the SBT regions handed to
+`vkCmdTraceRaysKHR`.
+- Storage images and buffers bound through the **same** descriptor heap
+work — with `traceRayEXT` removed, the raygen shader's `imageStore`
+renders correctly, so the heap binding / image path is sound.
+- Both the ray-tracing pipeline (`traceRayEXT`) **and** inline ray query
+(`rayQueryEXT`, which uses no shader binding table) fault identically
+when they read the acceleration structure from the heap. That isolates
+the fault to the AS-via-heap read, not the SBT or the RT pipeline.
+- The fault reproduces even with the AS descriptor written at heap byte 0
+and read at shader index 0 (no descriptor offset/stride ambiguity), and
+is unaffected by the `pAddressRange` size.
+- `VK_EXT_descriptor_heap` is brand new; on this machine NVIDIA is the only
+implementation that advertises it (llvmpipe does not), so there is no
+second conformant implementation to cross-check against.
+
+**Conclusion:** this is a driver-side fault in NVIDIA's
+`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It
+should be reported to NVIDIA. The `traceRayEXT` call is intentionally left
+in `raygen.glsl` so this stays a faithful one-file reproducer; the example
+will start rendering the triangle again once a fixed driver ships.
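
The raw-heap check mentioned in the investigation (dumping the heap bytes after the descriptor write and looking for the TLAS device address at the expected offset) could be sketched roughly as below. This is a Vulkan-free illustration only: `ReadU64At` and `HeapHoldsAddress` are hypothetical helpers, not engine functions, and the sketch assumes the address sits in the first 8 bytes of the descriptor in host byte order, which is an assumption about the driver's descriptor layout rather than something the extension guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: read a 64-bit value from a heap byte dump.
// Returns nullopt when the 8-byte read would run past the end of the dump.
std::optional<std::uint64_t> ReadU64At(const std::vector<std::uint8_t>& heap,
                                       std::size_t offset) {
    if (offset > heap.size() || heap.size() - offset < sizeof(std::uint64_t)) {
        return std::nullopt;
    }
    std::uint64_t value = 0;
    // memcpy avoids unaligned-access and aliasing problems; the value is
    // interpreted in host byte order.
    std::memcpy(&value, heap.data() + offset, sizeof(value));
    return value;
}

// Hypothetical helper: true when the dump holds `expected` at `offset`.
// Assumption: the descriptor begins with the raw device address.
bool HeapHoldsAddress(const std::vector<std::uint8_t>& heap,
                      std::size_t offset,
                      std::uint64_t expected) {
    const auto value = ReadU64At(heap, offset);
    return value.has_value() && *value == expected;
}
```

In use, the dump would be a copy of the mapped heap memory taken after the descriptor write, and `expected` would be the TLAS device address the engine already queried when building the acceleration structure.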


@@ -201,6 +201,12 @@ int main() {
 RTPass rtPass(&pipeline);
 window.passes.push_back(&rtPass);

+// NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the
+// first time the raygen shader reads the acceleration structure out of
+// the VK_EXT_descriptor_heap. The build, descriptors and SBT are all
+// correct and validation-clean; it is a driver-side fault in the
+// descriptor-heap acceleration-structure path. See README.md
+// ("Native status — known driver fault") for the full investigation.
 window.Render();
 window.StartSync();
 }
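
For reference, the TLAS-instance check described in the README (identity transform, `mask = 0xFF`, correct BLAS device address) can be expressed as a small predicate. The sketch below is an illustration, not the engine's code: `InstanceMirror`, `IsIdentity` and `InstanceLooksSane` are hypothetical names, and the struct only mirrors the field layout of `VkAccelerationStructureInstanceKHR` so that the example compiles without the Vulkan headers.

```cpp
#include <cstdint>

// Hypothetical stand-in mirroring the field layout of
// VkAccelerationStructureInstanceKHR (3x4 transform, two packed 24/8-bit
// words, then the 64-bit acceleration-structure reference).
struct InstanceMirror {
    float transform[3][4];                         // row-major 3x4 matrix
    std::uint32_t instanceCustomIndex : 24;
    std::uint32_t mask : 8;                        // visibility mask
    std::uint32_t sbtRecordOffset : 24;
    std::uint32_t flags : 8;
    std::uint64_t accelerationStructureReference;  // BLAS device address
};
static_assert(sizeof(InstanceMirror) == 64, "instance records are 64 bytes");

// True when the 3x4 matrix is the identity transform with no translation.
bool IsIdentity(const float (&m)[3][4]) {
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 4; ++col) {
            const float expected = (row == col) ? 1.0f : 0.0f;
            if (m[row][col] != expected) return false;
        }
    }
    return true;
}

// Checks the three properties the investigation lists for the built instance.
bool InstanceLooksSane(const InstanceMirror& inst, std::uint64_t blasAddress) {
    return IsIdentity(inst.transform) &&
           inst.mask == 0xFF &&
           inst.accelerationStructureReference == blasAddress;
}
```

A check like this would run on the host-side instance data before it is uploaded for the TLAS build; it says nothing about what the driver does with that data afterwards.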