Merge pull request 'fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)' (#16) from claude/issue-15 into master

catbot 2026-06-03 04:00:38 +02:00
commit f24107264d
7 changed files with 270 additions and 30 deletions

View file

@@ -29,11 +29,17 @@ geometry, closest-hit / miss / any-hit / intersection shaders — see
 shaded through an intersection shader with an any-hit cut-out.
 
 > **Native RT status:** reading an acceleration structure through
-> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on
-> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new
-> descriptor-heap acceleration-structure path, not an engine bug. The
-> engine setup (build, descriptors, SBT) is correct and validation-clean,
-> and images/buffers through the same heap work. See
+> `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA
+> driver `610.43.02` — a driver-side fault in the brand-new descriptor-heap
+> acceleration-structure path, not an engine bug (the setup is correct and
+> validation-clean; images/buffers through the same heap work). The engine
+> **works around it transparently** (issue #15): on the NVIDIA driver only,
+> `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a
+> TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which
+> reads no descriptor), and `RTPass` supplies the address as push data.
+> Shaders and example code are unchanged, and it's a single fenced block
+> gated on `Device::workaroundDescriptorHeapAS`, removable once a fixed
+> driver ships. See
 > [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
 > for the full investigation. WebGPU RT is unaffected.

View file

@@ -28,22 +28,36 @@ cd examples/VulkanTriangle
 crafter-build -r
 ```
 
-On a working driver you should see a 1280×720 window with a triangle
-filling roughly the centre. **On the current NVIDIA driver the native
-build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs —
-see below.**
+You should see a 1280×720 window with an RGB-barycentric triangle filling
+roughly the centre. On the NVIDIA driver this works through an engine-side
+workaround for a driver fault — see below.
 
-## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`)
+## Native status — NVIDIA driver fault, worked around
 
-On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with
-`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the
-acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read
-(address `~0xffff…`) plus instruction-pointer faults inside the
-ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash
-disappear (the dispatch + `imageStore` path renders a solid colour fine).
+On NVIDIA driver `610.43.02` (Vulkan 1.4) reading the acceleration
+structure through `VK_EXT_descriptor_heap` aborts the device with
+`VK_ERROR_DEVICE_LOST` on the first frame. This is a **driver-side fault**
+in the brand-new descriptor-heap acceleration-structure path, not an engine
+bug (full investigation in #7, summarised below).
 
-This was investigated thoroughly and traced to the **acceleration-structure
-read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
+**The engine works around it transparently** (issue #15). On the NVIDIA
+proprietary driver only, `VulkanShader` rewrites the compiled SPIR-V at
+module-load time so that every `OpLoad` of an `accelerationStructureEXT`
+out of the heap becomes a load of the TLAS *device address* (from a
+synthesized push-constant block) followed by
+`OpConvertUToAccelerationStructureKHR` — which reads no descriptor and so
+never touches the faulting path. `RTPass` feeds the active frame's TLAS
+address in as push data. `raygen.glsl` and the example code are unchanged;
+acceleration structures still bind into the heap normally. On every other
+driver the workaround is inert. It's gated on
+`Device::workaroundDescriptorHeapAS` and confined to one fenced block in
+`interfaces/Crafter.Graphics-ShaderVulkan.cppm` so it can be deleted wholesale
+once a fixed NVIDIA driver ships.
+
+### The underlying fault (#7)
+
+The fault was traced to the **acceleration-structure read through
+`VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
 
 - The BLAS/TLAS build is correct and finishes before rendering
   (`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
@@ -70,7 +84,7 @@ read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
   second conformant implementation to cross-check against.
 
 **Conclusion:** this is a driver-side fault in NVIDIA's
-`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It
-should be reported to NVIDIA. The `traceRayEXT` call is intentionally left
-in `raygen.glsl` so this stays a faithful one-file reproducer; the example
-will start rendering the triangle again once a fixed driver ships.
+`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug, and
+it should be reported to NVIDIA. Until a fixed driver ships, the SPIR-V
+rewrite above keeps the native RT path working; once it does, remove the
+workaround and the heap AS read becomes the direct path again.
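
The rewrite this README describes can be pictured at the level of a single instruction. What follows is a minimal sketch under stated assumptions, not the engine's code: `MakeInstr`, `RewriteIds` and `RewriteAsLoad` are hypothetical names, and the type, pointer and variable ids are assumed to have been found or allocated elsewhere. It shows how one `OpLoad` of the acceleration-structure type could expand into an access chain into the push-constant block, a 64-bit load of the address, and `OpConvertUToAccelerationStructureKHR` writing to the original result id.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

// One SPIR-V instruction as raw words (hypothetical helper type).
using Instr = std::vector<std::uint32_t>;

// Opcode numbers as listed in the patch's enum.
constexpr std::uint32_t kOpLoad = 61;
constexpr std::uint32_t kOpAccessChain = 65;
constexpr std::uint32_t kOpConvertUToAccelerationStructureKHR = 4447;

// First word of an instruction is (word count << 16) | opcode.
inline Instr MakeInstr(std::initializer_list<std::uint32_t> words) {
    Instr in(words);
    in[0] = (static_cast<std::uint32_t>(in.size()) << 16) | (in[0] & 0xFFFFu);
    return in;
}

// Ids the rewrite needs; assumed to exist already (illustrative only).
struct RewriteIds {
    std::uint32_t asType;        // OpTypeAccelerationStructureKHR result id
    std::uint32_t ulongType;     // 64-bit unsigned int type id
    std::uint32_t ptrPushUlong;  // pointer-to-ulong in PushConstant storage
    std::uint32_t pushVar;       // the push-constant block variable
    std::uint32_t uintZero;      // 32-bit constant 0 (member index)
};

// Returns the replacement for `in`: three instructions when `in` loads an
// acceleration structure, otherwise `in` unchanged. `bound` supplies fresh ids.
inline std::vector<Instr> RewriteAsLoad(const Instr& in, const RewriteIds& ids,
                                        std::uint32_t& bound) {
    const std::uint32_t op = in[0] & 0xFFFFu;
    if (op != kOpLoad || in.size() < 4 || in[1] != ids.asType) {
        std::vector<Instr> unchanged;
        unchanged.push_back(in);
        return unchanged;
    }
    const std::uint32_t resultId = in[2];  // keep the id later uses refer to
    const std::uint32_t chainId = bound++;
    const std::uint32_t addrId = bound++;
    return {
        MakeInstr({kOpAccessChain, ids.ptrPushUlong, chainId, ids.pushVar, ids.uintZero}),
        MakeInstr({kOpLoad, ids.ulongType, addrId, chainId}),
        MakeInstr({kOpConvertUToAccelerationStructureKHR, ids.asType, resultId, addrId}),
    };
}
```

Reusing the original result id is what lets every later use of the loaded acceleration structure, such as the `traceRayEXT` expansion, stay untouched.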

View file

@@ -201,12 +201,13 @@ int main() {
     RTPass rtPass(&pipeline);
     window.passes.push_back(&rtPass);
 
-    // NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the
-    // first time the raygen shader reads the acceleration structure out of
-    // the VK_EXT_descriptor_heap. The build, descriptors and SBT are all
-    // correct and validation-clean; it is a driver-side fault in the
-    // descriptor-heap acceleration-structure path. See README.md
-    // ("Native status — known driver fault") for the full investigation.
+    // NOTE: reading the acceleration structure through VK_EXT_descriptor_heap
+    // aborts with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 (a driver fault —
+    // see #7). The engine transparently works around it: on the NVIDIA driver
+    // VulkanShader rewrites the heap AS read into a TLAS-device-address +
+    // OpConvertUToAccelerationStructureKHR path and RTPass feeds the address in
+    // as push data. Nothing here (or in raygen.glsl) changes. See README.md
+    // ("Native status") and interfaces/Crafter.Graphics-ShaderVulkan.cppm.
     window.Render();
     window.StartSync();
 }

View file

@@ -566,12 +566,22 @@ void Device::Initialize() {
         memoryDecompressionProperties.pNext = const_cast<void*>(rayTracingProperties.pNext);
         rayTracingProperties.pNext = &memoryDecompressionProperties;
     }
+    // Chain driver properties onto the tail of the query so we can detect
+    // the NVIDIA proprietary driver for the descriptor-heap AS-read
+    // workaround (issue #15 / #7).
+    descriptorHeapProperties.pNext = &driverProperties;
     VkPhysicalDeviceProperties2 properties2 {
         .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
         .pNext = &rayTracingProperties
     };
     vkGetPhysicalDeviceProperties2(physDevice, &properties2);
 
+    // NVIDIA's brand-new VK_EXT_descriptor_heap acceleration-structure read
+    // path faults (see #7); enable the SPIR-V rewrite workaround there. Other
+    // drivers (and any future fixed NVIDIA driver, once this gate is removed)
+    // take the normal heap-bound AS path unchanged.
+    workaroundDescriptorHeapAS = (driverProperties.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY);
+
     // Sanity-gate: GDeflate 1.0 must actually be in the supported method set.
     if (memoryDecompressionSupported &&
         (memoryDecompressionProperties.decompressionMethods & VK_MEMORY_DECOMPRESSION_METHOD_GDEFLATE_1_0_BIT_EXT) == 0) {
@@ -699,6 +709,11 @@ void Device::Initialize() {
             .shaderSampledImageArrayDynamicIndexing = VK_TRUE,
             .shaderStorageBufferArrayDynamicIndexing = VK_TRUE,
             .shaderStorageImageArrayDynamicIndexing = VK_TRUE,
+            // shaderInt64: only needed for the NVIDIA descriptor-heap AS-read
+            // workaround (issue #15 / #7), which loads the TLAS device address
+            // as a 64-bit push constant. Gated so it isn't required on drivers
+            // that don't take the workaround path. Remove with the workaround.
+            .shaderInt64 = workaroundDescriptorHeapAS ? VK_TRUE : VK_FALSE,
             .shaderInt16 = VK_TRUE
         }
     };
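
The two hunks above tie one driver check to two decisions: whether to patch SPIR-V, and whether to request `shaderInt64`. A compact way to see that coupling is the sketch below. It is an illustration only: `DriverId`, `WorkaroundGate` and `DecideGate` are hypothetical stand-ins so the snippet compiles without the Vulkan headers, and the enumerator values are assumed to mirror `VkDriverId` (with `VK_DRIVER_ID_NVIDIA_PROPRIETARY` equal to 4).

```cpp
#include <cstdint>

// Stand-in for a few VkDriverId values (assumption: numbers mirror the
// Vulkan headers; this enum is NOT part of the engine or of Vulkan).
enum class DriverId : std::uint32_t {
    AmdProprietary = 1,
    AmdOpenSource = 2,
    MesaRadv = 3,
    NvidiaProprietary = 4,
};

// The two switches the patch derives from the driver check (hypothetical type).
struct WorkaroundGate {
    bool rewriteHeapAsReads;   // plays the role of workaroundDescriptorHeapAS
    bool requestShaderInt64;   // the 64-bit push constant needs Int64 support
};

// Both switches follow the same condition, so they cannot drift apart.
constexpr WorkaroundGate DecideGate(DriverId id) {
    const bool nvidia = (id == DriverId::NvidiaProprietary);
    return WorkaroundGate{nvidia, nvidia};
}
```

Deriving both flags from a single predicate is one way to keep the feature request and the rewrite in step when the workaround is eventually deleted.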

View file

@@ -165,6 +165,19 @@ export namespace Crafter {
         inline static VkPhysicalDeviceMemoryDecompressionPropertiesEXT memoryDecompressionProperties = {
             .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_DECOMPRESSION_PROPERTIES_EXT
         };
+        inline static VkPhysicalDeviceDriverProperties driverProperties = {
+            .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES
+        };
+
+        // ─── NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ──
+        // True only on the NVIDIA proprietary driver, where reading an
+        // acceleration structure through VK_EXT_descriptor_heap aborts with
+        // VK_ERROR_DEVICE_LOST (a brand-new-extension driver fault, verified
+        // engine-clean in #7). When set, VulkanShader rewrites heap AS reads
+        // into a TLAS-device-address + OpConvertUToAccelerationStructureKHR
+        // path and RTPass pushes the active TLAS address as push data. Delete
+        // this flag and everything keyed on it once a fixed driver ships.
+        inline static bool workaroundDescriptorHeapAS = false;
 
         static void CheckVkResult(VkResult result);
         static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties);

View file

@@ -27,6 +27,7 @@ import :RenderPass;
 import :Window;
 import :Device;
 import :PipelineRTVulkan;
+import :RenderingElement3D;
 
 export namespace Crafter {
     struct RTPass : RenderPass {
@@ -34,8 +35,22 @@ export namespace Crafter {
 
         RTPass(PipelineRTVulkan* p) : pipeline(p) {}
 
-        void Record(VkCommandBuffer cmd, std::uint32_t /*frameIdx*/, Window& window) override {
+        void Record(VkCommandBuffer cmd, std::uint32_t frameIdx, Window& window) override {
             vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline->pipeline);
+            // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7): feed
+            // the active frame's TLAS device address into the push-constant
+            // block that VulkanShader synthesizes, so the rewritten raygen can
+            // reach the acceleration structure by address instead of through
+            // the faulting heap descriptor. Inert on every other driver.
+            if (Device::workaroundDescriptorHeapAS) {
+                VkDeviceAddress tlasAddr = RenderingElement3D::tlases[frameIdx].address;
+                VkPushDataInfoEXT pushInfo {
+                    .sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT,
+                    .offset = 0,
+                    .data = { .address = &tlasAddr, .size = sizeof(tlasAddr) },
+                };
+                Device::vkCmdPushDataEXT(cmd, &pushInfo);
+            }
             Device::vkCmdTraceRaysKHR(cmd,
                 &pipeline->raygenRegion,
                 &pipeline->missRegion,

View file

@@ -27,6 +27,174 @@ import std;
 import :Device;
 import :Types;
 
+// ─── BEGIN NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ─────
+// Remove this whole block (and its call below, Device::workaroundDescriptorHeapAS,
+// and the RTPass push-data) once NVIDIA ships a driver that fixes the
+// VK_EXT_descriptor_heap acceleration-structure read fault.
+//
+// On the affected driver, reading an `accelerationStructureEXT` out of the
+// descriptor heap aborts the device. The build, the heap descriptor write and
+// everything else are correct (proven in #7); only the in-shader heap AS read
+// is broken — buffers/images through the same heap work. Acceleration
+// structures can equally be addressed by their device address, and
+// OpConvertUToAccelerationStructureKHR (which reads no descriptor) sidesteps
+// the faulting path entirely.
+//
+// glslang has no GLSL spelling for that conversion, so we rewrite the compiled
+// SPIR-V at module-load time: every `OpLoad %accelStruct <heap-ptr>` becomes a
+// load of the TLAS device address from a synthesized push-constant block
+// followed by OpConvertUToAccelerationStructureKHR. RTPass pushes the active
+// frame's TLAS address into that push constant. Shaders that never touch an
+// acceleration structure (no OpTypeAccelerationStructureKHR) are left untouched.
+namespace WorkaroundNvidiaAS {
+    // SPIR-V numeric opcodes / enums used below.
+    enum : std::uint32_t {
+        OpEntryPoint = 15, OpCapability = 17,
+        OpTypeInt = 21, OpTypeStruct = 30, OpTypePointer = 32,
+        OpConstant = 43, OpVariable = 59, OpLoad = 61, OpAccessChain = 65,
+        OpDecorate = 71, OpMemberDecorate = 72,
+        OpConvertUToAccelerationStructureKHR = 4447,
+        OpTypeAccelerationStructureKHR = 5341,
+        CapabilityInt64 = 11,
+        StorageClassPushConstant = 9,
+        DecorationBlock = 2, DecorationOffset = 35,
+    };
+
+    inline bool IsAnnotation(std::uint32_t op) {
+        // OpDecorate/OpMemberDecorate/OpDecorationGroup/OpGroupDecorate/
+        // OpGroupMemberDecorate/OpDecorateId/OpDecorate(Member)String.
+        return op == 71 || op == 72 || op == 73 || op == 74 || op == 75
+            || op == 332 || op == 5632 || op == 5633;
+    }
+
+    using Instr = std::vector<std::uint32_t>;
+
+    inline void Patch(std::vector<std::uint32_t>& words) {
+        if (words.size() < 5) return; // not a SPIR-V module we understand.
+
+        // Split header (5 words) from the instruction stream.
+        std::uint32_t bound = words[3];
+        std::vector<Instr> instrs;
+        for (std::size_t i = 5; i < words.size();) {
+            std::uint32_t len = words[i] >> 16;
+            if (len == 0 || i + len > words.size()) return; // malformed — bail.
+            instrs.emplace_back(words.begin() + i, words.begin() + i + len);
+            i += len;
+        }
+
+        // ── Scan for the AS type, reusable int/long types+constants, and the
+        //    section boundaries we need to insert into.
+        std::uint32_t asTypeId = 0, ulongTypeId = 0, uintTypeId = 0, uintZeroId = 0;
+        std::size_t lastCapIdx = 0, lastAnnotIdx = 0, firstFuncIdx = instrs.size();
+        std::size_t entryIdx = instrs.size();
+        for (std::size_t k = 0; k < instrs.size(); ++k) {
+            std::uint32_t op = instrs[k][0] & 0xFFFFu;
+            switch (op) {
+                case OpTypeAccelerationStructureKHR: asTypeId = instrs[k][1]; break;
+                case OpTypeInt:
+                    if (instrs[k][2] == 64 && instrs[k][3] == 0) ulongTypeId = instrs[k][1];
+                    else if (instrs[k][2] == 32 && instrs[k][3] == 0) uintTypeId = instrs[k][1];
+                    break;
+                case OpConstant:
+                    if (uintTypeId && instrs[k][1] == uintTypeId && instrs[k][3] == 0)
+                        uintZeroId = instrs[k][2];
+                    break;
+                case OpCapability: lastCapIdx = k; break;
+                case OpEntryPoint: if (entryIdx == instrs.size()) entryIdx = k; break;
+                default: break;
+            }
+            if (IsAnnotation(op)) lastAnnotIdx = k;
+            if (op == 54 /*OpFunction*/ && firstFuncIdx == instrs.size()) firstFuncIdx = k;
+        }
+
+        if (asTypeId == 0) return; // shader never reads an acceleration structure.
+
+        auto newId = [&] { return bound++; };
+        auto mk = [](std::initializer_list<std::uint32_t> ops) {
+            Instr in(ops);
+            in[0] = static_cast<std::uint32_t>(in.size() << 16) | (in[0] & 0xFFFFu);
+            return in;
+        };
+
+        // ── Synthesize the types/constants/push-constant we need, reusing any
+        //    the module already defines (SPIR-V forbids duplicate type defs).
+        std::vector<Instr> typeDefs;
+        if (uintTypeId == 0) {
+            uintTypeId = newId();
+            typeDefs.push_back(mk({OpTypeInt, uintTypeId, 32, 0}));
+        }
+        if (uintZeroId == 0) {
+            uintZeroId = newId();
+            typeDefs.push_back(mk({OpConstant, uintTypeId, uintZeroId, 0}));
+        }
+        if (ulongTypeId == 0) {
+            ulongTypeId = newId();
+            typeDefs.push_back(mk({OpTypeInt, ulongTypeId, 64, 0}));
+        }
+        std::uint32_t pcStructId = newId();
+        std::uint32_t ptrPushStructId = newId();
+        std::uint32_t ptrPushUlongId = newId();
+        std::uint32_t pcVarId = newId();
+        typeDefs.push_back(mk({OpTypeStruct, pcStructId, ulongTypeId}));
+        typeDefs.push_back(mk({OpTypePointer, ptrPushStructId, StorageClassPushConstant, pcStructId}));
+        typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId}));
+        typeDefs.push_back(mk({OpVariable, ptrPushStructId, pcVarId, StorageClassPushConstant}));
+
+        std::vector<Instr> decorations = {
+            mk({OpMemberDecorate, pcStructId, 0, DecorationOffset, 0}),
+            mk({OpDecorate, pcStructId, DecorationBlock}),
+        };
+
+        // ── Rewrite each `OpLoad %asType <ptr>` into address-load + convert.
+        std::vector<Instr> rebuilt;
+        rebuilt.reserve(instrs.size() + 8);
+        for (const Instr& in : instrs) {
+            std::uint32_t op = in[0] & 0xFFFFu;
+            if (op == OpLoad && in[1] == asTypeId) {
+                std::uint32_t resultId = in[2];
+                std::uint32_t chainId = newId();
+                std::uint32_t addrId = newId();
+                rebuilt.push_back(mk({OpAccessChain, ptrPushUlongId, chainId, pcVarId, uintZeroId}));
+                rebuilt.push_back(mk({OpLoad, ulongTypeId, addrId, chainId}));
+                rebuilt.push_back(mk({OpConvertUToAccelerationStructureKHR, asTypeId, resultId, addrId}));
+            } else {
+                rebuilt.push_back(in);
+            }
+        }
+        instrs.swap(rebuilt);
+
+        // Recompute structural anchors (the rewrite above shifted indices).
+        lastCapIdx = 0; lastAnnotIdx = 0; firstFuncIdx = instrs.size(); entryIdx = instrs.size();
+        for (std::size_t k = 0; k < instrs.size(); ++k) {
+            std::uint32_t op = instrs[k][0] & 0xFFFFu;
+            if (op == OpCapability) lastCapIdx = k;
+            if (op == OpEntryPoint && entryIdx == instrs.size()) entryIdx = k;
+            if (IsAnnotation(op)) lastAnnotIdx = k;
+            if (op == 54 && firstFuncIdx == instrs.size()) firstFuncIdx = k;
+        }
+
+        // Append the push-constant variable to the entry point's interface
+        // list (required for SPIR-V ≥ 1.4 — both raygen modules are 1.4).
+        if (entryIdx != instrs.size() && words[1] >= 0x00010400u) {
+            instrs[entryIdx].push_back(pcVarId);
+            instrs[entryIdx][0] = static_cast<std::uint32_t>(instrs[entryIdx].size() << 16)
+                                | OpEntryPoint;
+        }
+
+        // Insert highest-index-first so earlier anchors stay valid.
+        instrs.insert(instrs.begin() + firstFuncIdx, typeDefs.begin(), typeDefs.end());
+        instrs.insert(instrs.begin() + lastAnnotIdx + 1, decorations.begin(), decorations.end());
+        instrs.insert(instrs.begin() + lastCapIdx + 1, mk({OpCapability, CapabilityInt64}));
+
+        // ── Reassemble: header (with updated bound) + instruction stream.
+        std::vector<std::uint32_t> out(words.begin(), words.begin() + 5);
+        out[3] = bound;
+        for (const Instr& in : instrs) out.insert(out.end(), in.begin(), in.end());
+        words.swap(out);
+    }
+}
+// ─── END NVIDIA descriptor-heap AS-read workaround ────────────────────────
+
 export namespace Crafter {
     class VulkanShader {
     public:
@@ -54,7 +222,15 @@ export namespace Crafter {
             }
 
             file.close();
 
+            // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7).
+            // No-op on every other driver and on shaders that don't read an
+            // acceleration structure. Remove with the rest of the workaround
+            // once a fixed NVIDIA driver ships.
+            if (Device::workaroundDescriptorHeapAS) {
+                WorkaroundNvidiaAS::Patch(spirv);
+            }
+
             VkShaderModuleCreateInfo module_info{VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO};
             module_info.codeSize = spirv.size() * sizeof(uint32_t);
             module_info.pCode = spirv.data();
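
The patch relies on the SPIR-V binary layout: a five-word header, then instructions whose first word carries the word count in its high 16 bits and the opcode in its low 16 bits. The following is a small sketch of that splitting step in isolation, assuming the same layout. `SplitInstructions` is a hypothetical helper, not a function in the engine, and it reports malformed input through `std::optional` instead of returning early as `Patch` does.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One SPIR-V instruction as raw words (hypothetical helper type).
using Instr = std::vector<std::uint32_t>;

// Splits a SPIR-V word stream into instructions, skipping the 5-word header.
// Returns std::nullopt when the stream is too short or an instruction's
// word count is zero or runs past the end of the module.
inline std::optional<std::vector<Instr>> SplitInstructions(
        const std::vector<std::uint32_t>& words) {
    constexpr std::size_t kHeaderWords = 5;
    if (words.size() < kHeaderWords) return std::nullopt;

    std::vector<Instr> instrs;
    for (std::size_t i = kHeaderWords; i < words.size();) {
        const std::size_t len = words[i] >> 16;  // high half: word count
        if (len == 0 || len > words.size() - i) return std::nullopt;
        const auto first = words.begin() + static_cast<std::ptrdiff_t>(i);
        instrs.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
        i += len;
    }
    return instrs;
}
```

Checking `len > words.size() - i` rather than `i + len > words.size()` avoids any chance of the sum wrapping, which is a minor hardening choice and not something the patch itself needs for well-formed compiler output.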