fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)

Reading an acceleration structure through VK_EXT_descriptor_heap aborts
with VK_ERROR_DEVICE_LOST on NVIDIA driver 610.43.02. This is a driver
fault in the brand-new extension, isolated in #7: the engine setup is
correct and validation-clean, images and buffers read through the same
heap work, and both traceRayEXT and inline rayQuery fault identically on
the AS read.

An acceleration structure can equally be reached by its device address via
OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never
touches the faulting heap path. glslang has no GLSL spelling for that
conversion, so VulkanShader rewrites the compiled SPIR-V at module-load
time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS
device address from a synthesized push-constant block, followed by an
OpConvertUToAccelerationStructureKHR on that address. RTPass pushes the
active frame's TLAS address into that push
constant. User GLSL and example code are unchanged; acceleration structures
still bind into the heap normally.
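A minimal C++ sketch of a rewrite along these lines, working directly on the SPIR-V word stream. It assumes the push-constant block, its 64-bit integer type, the pointer type, the member-index constant, and the Int64 capability were already emitted by an earlier step (their ids arrive through the hypothetical `PushAddressIds` struct). Unlike the real pass, it treats every load of the acceleration-structure type as a heap load. Opcode numbers come from the SPIR-V specification; `rewriteAccelStructLoads` and `PushAddressIds` are illustrative names, not VulkanShader's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Opcode numbers from the SPIR-V specification.
constexpr uint16_t kOpLoad = 61;
constexpr uint16_t kOpAccessChain = 65;
constexpr uint16_t kOpTypeAccelerationStructureKHR = 5341;
constexpr uint16_t kOpConvertUToAccelerationStructureKHR = 4447;
constexpr uint32_t kSpirvMagic = 0x07230203u;

// Hypothetical: ids of declarations assumed to be emitted by an earlier step.
struct PushAddressIds {
    uint32_t ptrPushConstantU64; // OpTypePointer PushConstant %u64
    uint32_t u64;                // OpTypeInt 64 0 (requires Int64 / shaderInt64)
    uint32_t pushConstantVar;    // OpVariable of the synthesized push-constant block
    uint32_t memberZero;         // OpConstant %u32 0, index of the TLAS address member
};

// First word of every instruction: word count in the high 16 bits, opcode in the low 16.
constexpr uint32_t header(uint32_t wordCount, uint32_t opcode) {
    return (wordCount << 16) | opcode;
}

// Replaces each `%r = OpLoad %accelStruct %ptr` with
//   %chain = OpAccessChain %ptr_pc_u64 %pcVar %zero
//   %addr  = OpLoad %u64 %chain
//   %r     = OpConvertUToAccelerationStructureKHR %accelStruct %addr
// Reusing the original result id %r keeps every later use of the loaded AS valid.
std::vector<uint32_t> rewriteAccelStructLoads(const std::vector<uint32_t>& in,
                                              const PushAddressIds& ids) {
    if (in.size() < 5 || in[0] != kSpirvMagic)
        throw std::runtime_error("not a SPIR-V module");
    std::vector<uint32_t> out(in.begin(), in.begin() + 5); // 5-word module header
    uint32_t bound = in[3];  // id bound; new ids are allocated from here
    uint32_t asType = 0;     // result id of OpTypeAccelerationStructureKHR, once seen

    for (std::size_t i = 5; i < in.size();) {
        const uint32_t wordCount = in[i] >> 16;
        const uint32_t opcode = in[i] & 0xFFFFu;
        if (wordCount == 0 || i + wordCount > in.size())
            throw std::runtime_error("malformed instruction");

        // Types are declared before use, so the AS type id is known before any load.
        if (opcode == kOpTypeAccelerationStructureKHR && wordCount >= 2) asType = in[i + 1];

        if (opcode == kOpLoad && wordCount >= 4 && asType != 0 && in[i + 1] == asType) {
            const uint32_t resultId = in[i + 2];
            const uint32_t chainId = bound++;
            const uint32_t addrId = bound++;
            out.insert(out.end(), {header(5, kOpAccessChain), ids.ptrPushConstantU64,
                                   chainId, ids.pushConstantVar, ids.memberZero});
            out.insert(out.end(), {header(4, kOpLoad), ids.u64, addrId, chainId});
            out.insert(out.end(), {header(4, kOpConvertUToAccelerationStructureKHR),
                                   asType, resultId, addrId});
        } else {
            out.insert(out.end(), in.begin() + i, in.begin() + i + wordCount);
        }
        i += wordCount;
    }
    out[3] = bound; // publish the enlarged id bound
    return out;
}
```

Keeping the original result id is the design choice that makes the pass local: nothing downstream of the load (traceRayEXT, rayQuery initialization) needs to be touched.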

The workaround is gated on Device::workaroundDescriptorHeapAS (true only on
the NVIDIA proprietary driver) and confined to one fenced block in
Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64
feature toggle — delete those once a fixed NVIDIA driver ships and the heap
AS read becomes the direct path again.
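The gate can be sketched roughly as follows. The driver check reads `driverID` from `VkPhysicalDeviceDriverProperties` (core in Vulkan 1.2). The constant 4 is the value of `VK_DRIVER_ID_NVIDIA_PROPRIETARY`, mirrored here so the sketch compiles without the Vulkan headers. `WorkaroundFlags` and `chooseWorkarounds` are hypothetical names, not the engine's `Device` API. Tying `shaderInt64` to the same flag reflects that the rewritten SPIR-V loads the TLAS address as a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors VK_DRIVER_ID_NVIDIA_PROPRIETARY from VkDriverId (Vulkan 1.2).
constexpr uint32_t kDriverIdNvidiaProprietary = 4;

// Hypothetical summary of what a Device might decide at creation time.
struct WorkaroundFlags {
    bool workaroundDescriptorHeapAS = false; // rewrite heap AS loads in SPIR-V
    bool enableShaderInt64 = false;          // needed for the 64-bit address load
};

// driverId: VkPhysicalDeviceDriverProperties::driverID of the selected GPU.
WorkaroundFlags chooseWorkarounds(uint32_t driverId) {
    WorkaroundFlags flags;
    // Only the NVIDIA proprietary driver takes the workaround path; every other
    // driver keeps reading acceleration structures directly through the heap.
    flags.workaroundDescriptorHeapAS = (driverId == kDriverIdNvidiaProprietary);
    // The rewrite declares OpTypeInt 64, so the shaderInt64 feature must be
    // enabled whenever the workaround is active (a real device setup would also
    // confirm the GPU reports support for it).
    flags.enableShaderInt64 = flags.workaroundDescriptorHeapAS;
    return flags;
}
```

Keeping both decisions behind one flag mirrors the removal plan above: once a fixed driver ships, deleting the flag drops the rewrite and the extra feature requirement together.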

Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090),
validation-layer-clean, no device loss. The SPIR-V rewrite was independently
validated with spirv-val on both the VulkanTriangle and Sponza raygen
modules.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot 2026-06-03 01:59:54 +00:00
commit 950059c86e
7 changed files with 270 additions and 30 deletions

@@ -29,11 +29,17 @@ geometry, closest-hit / miss / any-hit / intersection shaders — see
shaded through an intersection shader with an any-hit cut-out.
> **Native RT status:** reading an acceleration structure through
> `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA
> driver `610.43.02` — a driver-side fault in the brand-new descriptor-heap
> acceleration-structure path, not an engine bug (the setup is correct and
> validation-clean; images/buffers through the same heap work). The engine
> **works around it transparently** (issue #15): on the NVIDIA driver only,
> `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a
> TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which
> reads no descriptor), and `RTPass` supplies the address as push data.
> Shaders and example code are unchanged, and the workaround is a single
> fenced block gated on `Device::workaroundDescriptorHeapAS`, removable
> once a fixed driver ships. See
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
> for the full investigation. WebGPU RT is unaffected.