docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7) #10

Merged
catbot merged 1 commit from claude/issue-7 into master 2026-06-01 00:22:52 +02:00
3 changed files with 59 additions and 6 deletions
Showing only changes of commit 464cb66063

docs(vulkan-rt): record native descriptor-heap AS read as a driver fault

Investigated the VK_ERROR_DEVICE_LOST on the native VulkanTriangle (#7).
Verified the engine side is correct and validation-clean: the BLAS/TLAS
build finishes before render (FinishInit waits), the built instance is
well-formed (identity transform, mask=0xFF, correct BLAS ref), and
vkWriteResourceDescriptorsEXT stores the TLAS device address at the
expected heap offset (confirmed by dumping the heap bytes). Khronos
validation 1.4.350 reports zero errors.
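
For illustration, the instance check described above can be sketched as follows. This is a minimal sketch, not the engine's code: `InstanceMirror` is a hypothetical stand-in that mirrors the field layout of `VkAccelerationStructureInstanceKHR` so the snippet compiles without the Vulkan headers, and `isIdentity` / `instanceLooksWellFormed` are illustrative names.

```cpp
#include <cstdint>

// Hypothetical stand-in mirroring the field layout of
// VkAccelerationStructureInstanceKHR (assumption: used only so this
// sketch builds without Vulkan headers).
struct InstanceMirror {
    float transform[3][4];                         // row-major 3x4 matrix
    std::uint32_t instanceCustomIndex : 24;
    std::uint32_t mask : 8;
    std::uint32_t sbtRecordOffset : 24;
    std::uint32_t flags : 8;
    std::uint64_t accelerationStructureReference;  // BLAS device address
};
static_assert(sizeof(InstanceMirror) == 64, "instance record should be 64 bytes");

// True when the 3x4 matrix is the identity (no rotation, scale or translation).
constexpr bool isIdentity(const float (&m)[3][4]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            if (m[r][c] != (r == c ? 1.0f : 0.0f)) return false;
    return true;
}

// Checks the three properties listed in the commit message:
// identity transform, mask 0xFF, and the expected BLAS reference.
constexpr bool instanceLooksWellFormed(const InstanceMirror& inst,
                                       std::uint64_t expectedBlasAddress) {
    return isIdentity(inst.transform)
        && inst.mask == 0xFF
        && inst.accelerationStructureReference == expectedBlasAddress;
}
```

A check like this only confirms the CPU-side instance record; it says nothing about how the driver consumes it.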

The fault is isolated to reading the acceleration structure through
VK_EXT_descriptor_heap:
- images/buffers via the same heap render fine (trace disabled -> the
  raygen imageStore path renders a full gradient);
- both traceRayEXT and inline rayQueryEXT (no SBT) fault identically on
  the AS read;
- reproduces with the AS descriptor at heap byte 0 / shader index 0 (no
  offset/stride ambiguity) and regardless of pAddressRange size.
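
The heap-dump check mentioned earlier can be sketched in the same spirit. The sketch assumes the dumped descriptor starts with the raw 64-bit TLAS device address stored little-endian, which is an assumption about this particular dump rather than something the extension guarantees; `readLe64` and `heapHoldsAddress` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

// Reads 8 bytes starting at `offset` as a little-endian 64-bit value.
// The caller must ensure offset + 8 does not exceed the buffer size.
constexpr std::uint64_t readLe64(const std::uint8_t* bytes, std::size_t offset) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value |= static_cast<std::uint64_t>(bytes[offset + i]) << (8 * i);
    return value;
}

// Returns true when the dumped heap bytes hold `expected` at `offset`.
// Assumption: the descriptor begins with the raw device address.
constexpr bool heapHoldsAddress(const std::uint8_t* heap, std::size_t heapSize,
                                std::size_t offset, std::uint64_t expected) {
    if (offset > heapSize || heapSize - offset < 8) return false;  // bounds check
    return readLe64(heap, offset) == expected;
}
```

With the descriptor at heap byte 0, the call reduces to `heapHoldsAddress(dump, size, 0, tlasAddress)`, which removes any offset or stride arithmetic from the comparison.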

NVIDIA 610.43.02 is the only descriptor_heap implementation available
(llvmpipe lacks the extension), so there is no second implementation to
cross-check. Conclusion: driver-side fault in NVIDIA's brand-new
VK_EXT_descriptor_heap acceleration-structure path; should be reported to
NVIDIA. The traceRayEXT call is left active so the example stays a
faithful reproducer. Documented in both READMEs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot 2026-05-31 22:21:57 +00:00


@@ -24,6 +24,15 @@ Vulkan ray tracing is hardware (`VK_KHR_ray_tracing_pipeline`); WebGPU
 ray tracing is a library-built software path (BVH + traceRay in a
 compute pipeline composed from user-supplied WGSL stages).
+> **Native RT status:** reading an acceleration structure through
+> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on
+> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new
+> descriptor-heap acceleration-structure path, not an engine bug. The
+> engine setup (build, descriptors, SBT) is correct and validation-clean,
+> and images/buffers through the same heap work. See
+> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
+> for the full investigation. WebGPU RT is unaffected.
 ## What's in here
 - **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas


@@ -28,11 +28,49 @@ cd examples/VulkanTriangle
 crafter-build -r
 ```
-You should see a 1280×720 window with a triangle filling roughly the
-centre.
+On a working driver you should see a 1280×720 window with a triangle
+filling roughly the centre. **On the current NVIDIA driver the native
+build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs —
+see below.**
-## Notes
+## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`)
-`raygen.glsl`'s `traceRayEXT` call is currently commented out — the
-example exercises the dispatch and `imageStore` paths only. Uncomment
-it to actually trace into the BLAS.
+On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with
+`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the
+acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read
+(address `~0xffff…`) plus instruction-pointer faults inside the
+ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash
+disappear (the dispatch + `imageStore` path renders a solid colour fine).
+This was investigated thoroughly and traced to the **acceleration-structure
+read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
+- The BLAS/TLAS build is correct and finishes before rendering
+(`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
+has an identity transform, `mask = 0xFF`, and the correct BLAS device
+address.
+- The AS descriptor is written correctly — `vkWriteResourceDescriptorsEXT`
+stores the TLAS device address at the expected heap byte offset (verified
+by dumping the raw heap bytes after the write).
+- The Khronos validation layers (1.4.350, current) report **zero** errors
+for the whole frame, including the SBT regions handed to
+`vkCmdTraceRaysKHR`.
+- Storage images and buffers bound through the **same** descriptor heap
+work — with `traceRayEXT` removed, the raygen shader's `imageStore`
+renders correctly, so the heap binding / image path is sound.
+- Both the ray-tracing pipeline (`traceRayEXT`) **and** inline ray query
+(`rayQueryEXT`, which uses no shader binding table) fault identically
+when they read the acceleration structure from the heap. That isolates
+the fault to the AS-via-heap read, not the SBT or the RT pipeline.
+- The fault reproduces even with the AS descriptor written at heap byte 0
+and read at shader index 0 (no descriptor offset/stride ambiguity), and
+is unaffected by the `pAddressRange` size.
+- `VK_EXT_descriptor_heap` is brand new; on this machine NVIDIA is the only
+implementation that advertises it (llvmpipe does not), so there is no
+second conformant implementation to cross-check against.
+**Conclusion:** this is a driver-side fault in NVIDIA's
+`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It
+should be reported to NVIDIA. The `traceRayEXT` call is intentionally left
+in `raygen.glsl` so this stays a faithful one-file reproducer; the example
+will start rendering the triangle again once a fixed driver ships.


@@ -201,6 +201,12 @@ int main() {
 RTPass rtPass(&pipeline);
 window.passes.push_back(&rtPass);
+// NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the
+// first time the raygen shader reads the acceleration structure out of
+// the VK_EXT_descriptor_heap. The build, descriptors and SBT are all
+// correct and validation-clean; it is a driver-side fault in the
+// descriptor-heap acceleration-structure path. See README.md
+// ("Native status — known driver fault") for the full investigation.
 window.Render();
 window.StartSync();
 }