fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15) #16

Merged
catbot merged 1 commit from claude/issue-15 into master 2026-06-03 04:00:38 +02:00

Resolves #15.

Problem

Reading an acceleration structure (AS) through VK_EXT_descriptor_heap aborts with VK_ERROR_DEVICE_LOST on NVIDIA driver 610.43.02; this is the driver fault isolated in #7. The engine side is correct and validation-clean: images and buffers read through the same heap work, and both traceRayEXT and inline rayQueryEXT fault identically on the AS read.

Workaround

An acceleration structure can equally be reached by its device address via OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never touches the faulting heap path. glslang has no GLSL spelling for that conversion, so the engine rewrites the compiled SPIR-V at module-load time:

  • VulkanShader (one fenced block in Crafter.Graphics-ShaderVulkan.cppm): every OpLoad of an accelerationStructureEXT out of the heap becomes a load of the TLAS device address from a synthesized push-constant block, followed by OpConvertUToAccelerationStructureKHR. Shaders with no acceleration structure are left untouched.
  • RTPass pushes the active frame's TLAS device address into that push constant.
  • Gated on Device::workaroundDescriptorHeapAS — true only on the NVIDIA proprietary driver. shaderInt64 is enabled on the same condition.
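
As an illustration of the first step of such a rewrite, here is a minimal sketch of how a module could be scanned for the loads that need patching. It is not the code in Crafter.Graphics-ShaderVulkan.cppm; the function name `findAccelStructLoads` and its return shape are hypothetical. It relies only on the SPIR-V binary layout (five header words, then instructions whose first word packs the word count and opcode) and on the fact that type declarations precede function bodies.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Values taken from the SPIR-V specification.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;
constexpr std::uint32_t kOpLoad = 61;
constexpr std::uint32_t kOpTypeAccelerationStructureKHR = 5341;

// Hypothetical helper: returns the word offset of every OpLoad whose result
// type is an acceleration-structure type. An empty result means the module
// has no acceleration-structure loads and can be left untouched.
std::vector<std::size_t> findAccelStructLoads(const std::vector<std::uint32_t>& words) {
    std::vector<std::size_t> loads;
    if (words.size() < 5 || words[0] != kSpirvMagic) return loads;

    std::unordered_set<std::uint32_t> accelTypeIds;
    std::size_t i = 5;  // skip the five-word module header
    while (i < words.size()) {
        const std::uint32_t wordCount = words[i] >> 16;
        const std::uint32_t opcode = words[i] & 0xFFFFu;
        if (wordCount == 0 || i + wordCount > words.size()) break;  // malformed stream

        if (opcode == kOpTypeAccelerationStructureKHR && wordCount >= 2) {
            accelTypeIds.insert(words[i + 1]);  // result id of the type
        } else if (opcode == kOpLoad && wordCount >= 4) {
            // OpLoad operands: result type, result id, pointer, [memory access]
            if (accelTypeIds.count(words[i + 1]) != 0) loads.push_back(i);
        }
        i += wordCount;
    }
    return loads;
}
```

A real rewrite would additionally have to declare the push-constant block, the 64-bit integer type, and the required capabilities, then replace each reported OpLoad with the address load plus OpConvertUToAccelerationStructureKHR; none of that is shown here.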

User GLSL (raygen.glsl) and example code are unchanged; acceleration structures still bind into the heap normally. On every other driver the workaround is inert. It's removable wholesale (one block + the RTPass push + the flag + the feature toggle) once a fixed NVIDIA driver ships.
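
The gating can be pictured roughly as follows. This is a sketch under assumptions, not the engine's Device code: `DriverId` is a local stand-in for Vulkan's `VkDriverId` (the numeric values mirror the Vulkan headers), and `WorkaroundConfig` and `selectWorkarounds` are hypothetical names. In a real device setup the driver ID would come from `VkPhysicalDeviceDriverProperties::driverID`.

```cpp
#include <cstdint>

// Local stand-in for VkDriverId so the sketch compiles without Vulkan headers.
enum class DriverId : std::uint32_t {
    AmdProprietary = 1,
    AmdOpenSource = 2,
    MesaRadv = 3,
    NvidiaProprietary = 4,
};

// Hypothetical bundle of the two switches the description ties together.
struct WorkaroundConfig {
    bool workaroundDescriptorHeapAS = false;  // rewrite SPIR-V and push the TLAS address
    bool enableShaderInt64 = false;           // needed to hold the 64-bit device address
};

// Both switches turn on only for the NVIDIA proprietary driver; every other
// driver keeps the direct descriptor-heap read.
constexpr WorkaroundConfig selectWorkarounds(DriverId id) {
    const bool nvidia = (id == DriverId::NvidiaProprietary);
    return WorkaroundConfig{nvidia, nvidia};
}
```

Deriving both flags from one condition keeps the workaround easy to delete: removing the single check disables the rewrite and the extra feature request together.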

Verification

  • VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090), validation-layer-clean, no device loss (the validation layers run spirv-val on the rewritten module at vkCreateShaderModule and report no errors).
  • The SPIR-V rewrite was independently validated with spirv-val on both the VulkanTriangle (SPIR-V 1.4) and Sponza raygen modules.
  • Non-RT paths (HelloUI compute/UI) and the engine link are unregressed.

There are no automated tests in the repo (crafter-build test reports none); verification was by exercising the app directly.

Screenshots

VulkanTriangle ray-traced on native NVIDIA via the SPIR-V workaround

🤖 Generated with Claude Code

Reading an acceleration structure through VK_EXT_descriptor_heap aborts
with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02. This is a driver fault in a
brand-new extension, isolated in #7 (engine setup is correct and
validation-clean; images/buffers through the same heap work, and both
traceRayEXT and inline rayQueryEXT fault identically on the AS read).

An acceleration structure can equally be reached by its device address via
OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never
touches the faulting heap path. glslang has no GLSL spelling for that
conversion, so VulkanShader rewrites the compiled SPIR-V at module-load
time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS
device address from a synthesized push-constant block followed by the
convert. RTPass pushes the active frame's TLAS address into that push
constant. User GLSL and example code are unchanged; acceleration structures
still bind into the heap normally.

The workaround is gated on Device::workaroundDescriptorHeapAS (true only on
the NVIDIA proprietary driver) and confined to one fenced block in
Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64
feature toggle — delete those once a fixed NVIDIA driver ships and the heap
AS read becomes the direct path again.

Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090),
validation-layer-clean, no device loss. The SPIR-V rewrite was independently
validated with spirv-val on both the VulkanTriangle and Sponza raygen
modules.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot merged commit f24107264d into master 2026-06-03 04:00:38 +02:00
catbot deleted branch claude/issue-15 2026-06-03 04:00:38 +02:00
Reference
Catcrafts/Crafter.Graphics!16