diff --git a/README.md b/README.md index b64e32d..8778c52 100644 --- a/README.md +++ b/README.md @@ -24,6 +24,15 @@ Vulkan ray tracing is hardware (`VK_KHR_ray_tracing_pipeline`); WebGPU ray tracing is a library-built software path (BVH + traceRay in a compute pipeline composed from user-supplied WGSL stages). +> **Native RT status:** reading an acceleration structure through +> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on +> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new +> descriptor-heap acceleration-structure path, not an engine bug. The +> engine setup (build, descriptors, SBT) is correct and validation-clean, +> and images/buffers through the same heap work. See +> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md) +> for the full investigation. WebGPU RT is unaffected. + ## What's in here - **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas diff --git a/examples/VulkanTriangle/README.md b/examples/VulkanTriangle/README.md index a287c5b..2c0ee64 100644 --- a/examples/VulkanTriangle/README.md +++ b/examples/VulkanTriangle/README.md @@ -28,11 +28,49 @@ cd examples/VulkanTriangle crafter-build -r ``` -You should see a 1280×720 window with a triangle filling roughly the -centre. +On a working driver you should see a 1280×720 window with a triangle +filling roughly the centre. **On the current NVIDIA driver the native +build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs — +see below.** -## Notes +## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`) -`raygen.glsl`'s `traceRayEXT` call is currently commented out — the -example exercises the dispatch and `imageStore` paths only. Uncomment -it to actually trace into the BLAS. +On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with +`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the +acceleration structure. 
`VK_EXT_device_fault` reports an invalid GPU read +(address `~0xffff…`) plus instruction-pointer faults inside the +ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash +disappear (the dispatch + `imageStore` path renders a solid colour fine). + +This was investigated thoroughly and traced to the **acceleration-structure +read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup: + +- The BLAS/TLAS build is correct and finishes before rendering + (`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance + has an identity transform, `mask = 0xFF`, and the correct BLAS device + address. +- The AS descriptor is written correctly — `vkWriteResourceDescriptorsEXT` + stores the TLAS device address at the expected heap byte offset (verified + by dumping the raw heap bytes after the write). +- The Khronos validation layers (1.4.350, current) report **zero** errors + for the whole frame, including the SBT regions handed to + `vkCmdTraceRaysKHR`. +- Storage images and buffers bound through the **same** descriptor heap + work — with `traceRayEXT` removed, the raygen shader's `imageStore` + renders correctly, so the heap binding / image path is sound. +- Both the ray-tracing pipeline (`traceRayEXT`) **and** inline ray query + (`rayQueryEXT`, which uses no shader binding table) fault identically + when they read the acceleration structure from the heap. That isolates + the fault to the AS-via-heap read, not the SBT or the RT pipeline. +- The fault reproduces even with the AS descriptor written at heap byte 0 + and read at shader index 0 (no descriptor offset/stride ambiguity), and + is unaffected by the `pAddressRange` size. +- `VK_EXT_descriptor_heap` is brand new; on this machine NVIDIA is the only + implementation that advertises it (llvmpipe does not), so there is no + second conformant implementation to cross-check against. 
+ +**Conclusion:** this is a driver-side fault in NVIDIA's +`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It +should be reported to NVIDIA. The `traceRayEXT` call is intentionally left +in `raygen.glsl` so this stays a faithful one-file reproducer; the example +will start rendering the triangle again once a fixed driver ships. diff --git a/examples/VulkanTriangle/main.cpp b/examples/VulkanTriangle/main.cpp index 3fcd287..29c3e8f 100644 --- a/examples/VulkanTriangle/main.cpp +++ b/examples/VulkanTriangle/main.cpp @@ -201,6 +201,12 @@ int main() { RTPass rtPass(&pipeline); window.passes.push_back(&rtPass); + // NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the + // first time the raygen shader reads the acceleration structure out of + // the VK_EXT_descriptor_heap. The build, descriptors and SBT are all + // correct and validation-clean; it is a driver-side fault in the + // descriptor-heap acceleration-structure path. See README.md + // ("Native status — known driver fault") for the full investigation. window.Render(); window.StartSync(); }
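
The investigation above mentions verifying the AS descriptor write by dumping the raw heap bytes and checking that the TLAS device address sits at the expected byte offset. A minimal sketch of that check follows. It assumes the address is stored as a 64-bit little-endian value at the start of the descriptor slot, which is an assumption made for illustration, since the real descriptor layout is implementation-defined. `ReadU64LE` and `HeapHoldsAddress` are hypothetical helper names and are not part of the engine.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: read a 64-bit little-endian value out of a byte dump.
// Returns std::nullopt when fewer than 8 bytes remain at `offset`.
std::optional<std::uint64_t> ReadU64LE(const std::vector<std::uint8_t>& bytes,
                                       std::size_t offset) {
    if (offset > bytes.size() || bytes.size() - offset < 8) {
        return std::nullopt;
    }
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Byte i contributes bits [8*i, 8*i + 7] of the result.
        value |= static_cast<std::uint64_t>(bytes[offset + i]) << (8 * i);
    }
    return value;
}

// Hypothetical helper: true when the dump holds `expectedAddress` at
// `byteOffset`. Assumes the descriptor begins with the raw device address;
// a driver is free to use a different encoding.
bool HeapHoldsAddress(const std::vector<std::uint8_t>& heapDump,
                      std::size_t byteOffset,
                      std::uint64_t expectedAddress) {
    const std::optional<std::uint64_t> stored = ReadU64LE(heapDump, byteOffset);
    return stored.has_value() && *stored == expectedAddress;
}
```

In use, the caller would copy the mapped heap range into a `std::vector<std::uint8_t>` after the descriptor write and pass the slot's byte offset together with the TLAS device address. A `false` result only means the bytes differ from this assumed layout, so on its own it does not prove that the write was wrong.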