diff --git a/README.md b/README.md index ac40143..e3dee94 100644 --- a/README.md +++ b/README.md @@ -29,11 +29,17 @@ geometry, closest-hit / miss / any-hit / intersection shaders — see shaded through an intersection shader with an any-hit cut-out. > **Native RT status:** reading an acceleration structure through -> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on -> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new -> descriptor-heap acceleration-structure path, not an engine bug. The -> engine setup (build, descriptors, SBT) is correct and validation-clean, -> and images/buffers through the same heap work. See +> `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA +> driver `610.43.02` — a driver-side fault in the brand-new descriptor-heap +> acceleration-structure path, not an engine bug (the setup is correct and +> validation-clean; images/buffers through the same heap work). The engine +> **works around it transparently** (issue #15): on the NVIDIA driver only, +> `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a +> TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which +> reads no descriptor), and `RTPass` supplies the address as push data. +> Shaders and example code are unchanged, and it's a single fenced block +> gated on `Device::workaroundDescriptorHeapAS`, removable once a fixed +> driver ships. See > [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md) > for the full investigation. WebGPU RT is unaffected. diff --git a/examples/VulkanTriangle/README.md b/examples/VulkanTriangle/README.md index 2c0ee64..5ff2ab5 100644 --- a/examples/VulkanTriangle/README.md +++ b/examples/VulkanTriangle/README.md @@ -28,22 +28,36 @@ cd examples/VulkanTriangle crafter-build -r ``` -On a working driver you should see a 1280×720 window with a triangle -filling roughly the centre. **On the current NVIDIA driver the native -build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs — -see below.** +You should see a 1280×720 window with an RGB-barycentric triangle filling +roughly the centre. On the NVIDIA driver this works through an engine-side +workaround for a driver fault — see below. -## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`) +## Native status — NVIDIA driver fault, worked around -On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with -`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the -acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read -(address `~0xffff…`) plus instruction-pointer faults inside the -ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash -disappear (the dispatch + `imageStore` path renders a solid colour fine). +On NVIDIA driver `610.43.02` (Vulkan 1.4) reading the acceleration +structure through `VK_EXT_descriptor_heap` aborts the device with +`VK_ERROR_DEVICE_LOST` on the first frame. This is a **driver-side fault** +in the brand-new descriptor-heap acceleration-structure path, not an engine +bug (full investigation in #7, summarised below). -This was investigated thoroughly and traced to the **acceleration-structure -read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup: +**The engine works around it transparently** (issue #15). On the NVIDIA +proprietary driver only, `VulkanShader` rewrites the compiled SPIR-V at +module-load time so that every `OpLoad` of an `accelerationStructureEXT` +out of the heap becomes a load of the TLAS *device address* (from a +synthesized push-constant block) followed by +`OpConvertUToAccelerationStructureKHR` — which reads no descriptor and so +never touches the faulting path. `RTPass` feeds the active frame's TLAS +address in as push data. `raygen.glsl` and the example code are unchanged; +acceleration structures still bind into the heap normally. On every other +driver the workaround is inert. It's gated on +`Device::workaroundDescriptorHeapAS` and confined to one fenced block in +`interfaces/Crafter.Graphics-ShaderVulkan.cppm` so it can be deleted wholesale +once a fixed NVIDIA driver ships. + +### The underlying fault (#7) + +The fault was traced to the **acceleration-structure read through +`VK_EXT_descriptor_heap`**, *not* to the engine's RT setup: - The BLAS/TLAS build is correct and finishes before rendering (`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance @@ -70,7 +84,7 @@ read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup: second conformant implementation to cross-check against. **Conclusion:** this is a driver-side fault in NVIDIA's -`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It -should be reported to NVIDIA. The `traceRayEXT` call is intentionally left -in `raygen.glsl` so this stays a faithful one-file reproducer; the example -will start rendering the triangle again once a fixed driver ships. +`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug, and +it should be reported to NVIDIA. Until a fixed driver ships, the SPIR-V +rewrite above keeps the native RT path working; once it does, remove the +workaround and the heap AS read becomes the direct path again. diff --git a/examples/VulkanTriangle/main.cpp b/examples/VulkanTriangle/main.cpp index 29c3e8f..deaa699 100644 --- a/examples/VulkanTriangle/main.cpp +++ b/examples/VulkanTriangle/main.cpp @@ -201,12 +201,13 @@ int main() { RTPass rtPass(&pipeline); window.passes.push_back(&rtPass); - // NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the - // first time the raygen shader reads the acceleration structure out of - // the VK_EXT_descriptor_heap. The build, descriptors and SBT are all - // correct and validation-clean; it is a driver-side fault in the - // descriptor-heap acceleration-structure path. See README.md - // ("Native status — known driver fault") for the full investigation. + // NOTE: reading the acceleration structure through VK_EXT_descriptor_heap + // aborts with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 (a driver fault — + // see #7). The engine transparently works around it: on the NVIDIA driver + // VulkanShader rewrites the heap AS read into a TLAS-device-address + + // OpConvertUToAccelerationStructureKHR path and RTPass feeds the address in + // as push data. Nothing here (or in raygen.glsl) changes. See README.md + // ("Native status") and interfaces/Crafter.Graphics-ShaderVulkan.cppm. window.Render(); window.StartSync(); } diff --git a/implementations/Crafter.Graphics-Device.cpp b/implementations/Crafter.Graphics-Device.cpp index 5437bf2..770d50c 100644 --- a/implementations/Crafter.Graphics-Device.cpp +++ b/implementations/Crafter.Graphics-Device.cpp @@ -566,12 +566,22 @@ void Device::Initialize() { memoryDecompressionProperties.pNext = const_cast(rayTracingProperties.pNext); rayTracingProperties.pNext = &memoryDecompressionProperties; } + // Chain driver properties onto the tail of the query so we can detect + // the NVIDIA proprietary driver for the descriptor-heap AS-read + // workaround (issue #15 / #7). + descriptorHeapProperties.pNext = &driverProperties; VkPhysicalDeviceProperties2 properties2 { .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2, .pNext = &rayTracingProperties }; vkGetPhysicalDeviceProperties2(physDevice, &properties2); + // NVIDIA's brand-new VK_EXT_descriptor_heap acceleration-structure read + // path faults (see #7); enable the SPIR-V rewrite workaround there. Other + // drivers (and any future fixed NVIDIA driver, once this gate is removed) + // take the normal heap-bound AS path unchanged. + workaroundDescriptorHeapAS = (driverProperties.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY); + // Sanity-gate: GDeflate 1.0 must actually be in the supported method set. if (memoryDecompressionSupported && (memoryDecompressionProperties.decompressionMethods & VK_MEMORY_DECOMPRESSION_METHOD_GDEFLATE_1_0_BIT_EXT) == 0) { @@ -699,6 +709,11 @@ void Device::Initialize() { .shaderSampledImageArrayDynamicIndexing = VK_TRUE, .shaderStorageBufferArrayDynamicIndexing = VK_TRUE, .shaderStorageImageArrayDynamicIndexing = VK_TRUE, + // shaderInt64: only needed for the NVIDIA descriptor-heap AS-read + // workaround (issue #15 / #7), which loads the TLAS device address + // as a 64-bit push constant. Gated so it isn't required on drivers + // that don't take the workaround path. Remove with the workaround. + .shaderInt64 = workaroundDescriptorHeapAS ? VK_TRUE : VK_FALSE, .shaderInt16 = VK_TRUE } }; diff --git a/interfaces/Crafter.Graphics-Device.cppm b/interfaces/Crafter.Graphics-Device.cppm index 7238ba8..679a40c 100644 --- a/interfaces/Crafter.Graphics-Device.cppm +++ b/interfaces/Crafter.Graphics-Device.cppm @@ -165,6 +165,19 @@ export namespace Crafter { inline static VkPhysicalDeviceMemoryDecompressionPropertiesEXT memoryDecompressionProperties = { .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_DECOMPRESSION_PROPERTIES_EXT }; + inline static VkPhysicalDeviceDriverProperties driverProperties = { + .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES + }; + + // ─── NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ── + // True only on the NVIDIA proprietary driver, where reading an + // acceleration structure through VK_EXT_descriptor_heap aborts with + // VK_ERROR_DEVICE_LOST (a brand-new-extension driver fault, verified + // engine-clean in #7). When set, VulkanShader rewrites heap AS reads + // into a TLAS-device-address + OpConvertUToAccelerationStructureKHR + // path and RTPass pushes the active TLAS address as push data. Delete + // this flag and everything keyed on it once a fixed driver ships. + inline static bool workaroundDescriptorHeapAS = false; static void CheckVkResult(VkResult result); static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties); diff --git a/interfaces/Crafter.Graphics-RTPass.cppm b/interfaces/Crafter.Graphics-RTPass.cppm index 0da7088..65907d6 100644 --- a/interfaces/Crafter.Graphics-RTPass.cppm +++ b/interfaces/Crafter.Graphics-RTPass.cppm @@ -27,6 +27,7 @@ import :RenderPass; import :Window; import :Device; import :PipelineRTVulkan; +import :RenderingElement3D; export namespace Crafter { struct RTPass : RenderPass { @@ -34,8 +35,22 @@ export namespace Crafter { RTPass(PipelineRTVulkan* p) : pipeline(p) {} - void Record(VkCommandBuffer cmd, std::uint32_t /*frameIdx*/, Window& window) override { + void Record(VkCommandBuffer cmd, std::uint32_t frameIdx, Window& window) override { vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline->pipeline); + // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7): feed + // the active frame's TLAS device address into the push-constant + // block that VulkanShader synthesizes, so the rewritten raygen can + // reach the acceleration structure by address instead of through + // the faulting heap descriptor. Inert on every other driver. + if (Device::workaroundDescriptorHeapAS) { + VkDeviceAddress tlasAddr = RenderingElement3D::tlases[frameIdx].address; + VkPushDataInfoEXT pushInfo { + .sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT, + .offset = 0, + .data = { .address = &tlasAddr, .size = sizeof(tlasAddr) }, + }; + Device::vkCmdPushDataEXT(cmd, &pushInfo); + } Device::vkCmdTraceRaysKHR(cmd, &pipeline->raygenRegion, &pipeline->missRegion, diff --git a/interfaces/Crafter.Graphics-ShaderVulkan.cppm b/interfaces/Crafter.Graphics-ShaderVulkan.cppm index 3edba08..7bff83f 100644 --- a/interfaces/Crafter.Graphics-ShaderVulkan.cppm +++ b/interfaces/Crafter.Graphics-ShaderVulkan.cppm @@ -27,6 +27,174 @@ import std; import :Device; import :Types; +// ─── BEGIN NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ───── +// Remove this whole block (and its call below, Device::workaroundDescriptorHeapAS, +// and the RTPass push-data) once NVIDIA ships a driver that fixes the +// VK_EXT_descriptor_heap acceleration-structure read fault. +// +// On the affected driver, reading an `accelerationStructureEXT` out of the +// descriptor heap aborts the device. The build, the heap descriptor write and +// everything else are correct (proven in #7); only the in-shader heap AS read +// is broken — buffers/images through the same heap work. Acceleration +// structures can equally be addressed by their device address, and +// OpConvertUToAccelerationStructureKHR (which reads no descriptor) sidesteps +// the faulting path entirely. +// +// glslang has no GLSL spelling for that conversion, so we rewrite the compiled +// SPIR-V at module-load time: every `OpLoad %accelStruct ` becomes a +// load of the TLAS device address from a synthesized push-constant block +// followed by OpConvertUToAccelerationStructureKHR. RTPass pushes the active +// frame's TLAS address into that push constant. Shaders that never touch an +// acceleration structure (no OpTypeAccelerationStructureKHR) are left untouched. +namespace WorkaroundNvidiaAS { + // SPIR-V numeric opcodes / enums used below. + enum : std::uint32_t { + OpEntryPoint = 15, OpCapability = 17, + OpTypeInt = 21, OpTypeStruct = 30, OpTypePointer = 32, + OpConstant = 43, OpVariable = 59, OpLoad = 61, OpAccessChain = 65, + OpDecorate = 71, OpMemberDecorate = 72, + OpConvertUToAccelerationStructureKHR = 4447, + OpTypeAccelerationStructureKHR = 5341, + CapabilityInt64 = 11, + StorageClassPushConstant = 9, + DecorationBlock = 2, DecorationOffset = 35, + }; + + inline bool IsAnnotation(std::uint32_t op) { + // OpDecorate/OpMemberDecorate/OpDecorationGroup/OpGroupDecorate/ + // OpGroupMemberDecorate/OpDecorateId/OpDecorate(Member)String. + return op == 71 || op == 72 || op == 73 || op == 74 || op == 75 + || op == 332 || op == 5632 || op == 5633; + } + + using Instr = std::vector; + + inline void Patch(std::vector& words) { + if (words.size() < 5) return; // not a SPIR-V module we understand. + + // Split header (5 words) from the instruction stream. + std::uint32_t bound = words[3]; + std::vector instrs; + for (std::size_t i = 5; i < words.size();) { + std::uint32_t len = words[i] >> 16; + if (len == 0 || i + len > words.size()) return; // malformed — bail. + instrs.emplace_back(words.begin() + i, words.begin() + i + len); + i += len; + } + + // ── Scan for the AS type, reusable int/long types+constants, and the + // section boundaries we need to insert into. + std::uint32_t asTypeId = 0, ulongTypeId = 0, uintTypeId = 0, uintZeroId = 0; + std::size_t lastCapIdx = 0, lastAnnotIdx = 0, firstFuncIdx = instrs.size(); + std::size_t entryIdx = instrs.size(); + for (std::size_t k = 0; k < instrs.size(); ++k) { + std::uint32_t op = instrs[k][0] & 0xFFFFu; + switch (op) { + case OpTypeAccelerationStructureKHR: asTypeId = instrs[k][1]; break; + case OpTypeInt: + if (instrs[k][2] == 64 && instrs[k][3] == 0) ulongTypeId = instrs[k][1]; + else if (instrs[k][2] == 32 && instrs[k][3] == 0) uintTypeId = instrs[k][1]; + break; + case OpConstant: + if (uintTypeId && instrs[k][1] == uintTypeId && instrs[k][3] == 0) + uintZeroId = instrs[k][2]; + break; + case OpCapability: lastCapIdx = k; break; + case OpEntryPoint: if (entryIdx == instrs.size()) entryIdx = k; break; + default: break; + } + if (IsAnnotation(op)) lastAnnotIdx = k; + if (op == 54 /*OpFunction*/ && firstFuncIdx == instrs.size()) firstFuncIdx = k; + } + + if (asTypeId == 0) return; // shader never reads an acceleration structure. + + auto newId = [&] { return bound++; }; + auto mk = [](std::initializer_list ops) { + Instr in(ops); + in[0] = static_cast(in.size() << 16) | (in[0] & 0xFFFFu); + return in; + }; + + // ── Synthesize the types/constants/push-constant we need, reusing any + // the module already defines (SPIR-V forbids duplicate type defs). + std::vector typeDefs; + if (uintTypeId == 0) { + uintTypeId = newId(); + typeDefs.push_back(mk({OpTypeInt, uintTypeId, 32, 0})); + } + if (uintZeroId == 0) { + uintZeroId = newId(); + typeDefs.push_back(mk({OpConstant, uintTypeId, uintZeroId, 0})); + } + if (ulongTypeId == 0) { + ulongTypeId = newId(); + typeDefs.push_back(mk({OpTypeInt, ulongTypeId, 64, 0})); + } + std::uint32_t pcStructId = newId(); + std::uint32_t ptrPushStructId = newId(); + std::uint32_t ptrPushUlongId = newId(); + std::uint32_t pcVarId = newId(); + typeDefs.push_back(mk({OpTypeStruct, pcStructId, ulongTypeId})); + typeDefs.push_back(mk({OpTypePointer, ptrPushStructId, StorageClassPushConstant, pcStructId})); + typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId})); + typeDefs.push_back(mk({OpVariable, ptrPushStructId, pcVarId, StorageClassPushConstant})); + + std::vector decorations = { + mk({OpMemberDecorate, pcStructId, 0, DecorationOffset, 0}), + mk({OpDecorate, pcStructId, DecorationBlock}), + }; + + // ── Rewrite each `OpLoad %asType ` into address-load + convert. + std::vector rebuilt; + rebuilt.reserve(instrs.size() + 8); + for (const Instr& in : instrs) { + std::uint32_t op = in[0] & 0xFFFFu; + if (op == OpLoad && in[1] == asTypeId) { + std::uint32_t resultId = in[2]; + std::uint32_t chainId = newId(); + std::uint32_t addrId = newId(); + rebuilt.push_back(mk({OpAccessChain, ptrPushUlongId, chainId, pcVarId, uintZeroId})); + rebuilt.push_back(mk({OpLoad, ulongTypeId, addrId, chainId})); + rebuilt.push_back(mk({OpConvertUToAccelerationStructureKHR, asTypeId, resultId, addrId})); + } else { + rebuilt.push_back(in); + } + } + instrs.swap(rebuilt); + + // Recompute structural anchors (the rewrite above shifted indices). + lastCapIdx = 0; lastAnnotIdx = 0; firstFuncIdx = instrs.size(); entryIdx = instrs.size(); + for (std::size_t k = 0; k < instrs.size(); ++k) { + std::uint32_t op = instrs[k][0] & 0xFFFFu; + if (op == OpCapability) lastCapIdx = k; + if (op == OpEntryPoint && entryIdx == instrs.size()) entryIdx = k; + if (IsAnnotation(op)) lastAnnotIdx = k; + if (op == 54 && firstFuncIdx == instrs.size()) firstFuncIdx = k; + } + + // Append the push-constant variable to the entry point's interface + // list (required for SPIR-V ≥ 1.4 — both raygen modules are 1.4). + if (entryIdx != instrs.size() && words[1] >= 0x00010400u) { + instrs[entryIdx].push_back(pcVarId); + instrs[entryIdx][0] = static_cast(instrs[entryIdx].size() << 16) + | OpEntryPoint; + } + + // Insert highest-index-first so earlier anchors stay valid. + instrs.insert(instrs.begin() + firstFuncIdx, typeDefs.begin(), typeDefs.end()); + instrs.insert(instrs.begin() + lastAnnotIdx + 1, decorations.begin(), decorations.end()); + instrs.insert(instrs.begin() + lastCapIdx + 1, mk({OpCapability, CapabilityInt64})); + + // ── Reassemble: header (with updated bound) + instruction stream. + std::vector out(words.begin(), words.begin() + 5); + out[3] = bound; + for (const Instr& in : instrs) out.insert(out.end(), in.begin(), in.end()); + words.swap(out); + } +} +// ─── END NVIDIA descriptor-heap AS-read workaround ──────────────────────── + export namespace Crafter { class VulkanShader { public: @@ -54,7 +222,15 @@ export namespace Crafter { } file.close(); - + + // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7). + // No-op on every other driver and on shaders that don't read an + // acceleration structure. Remove with the rest of the workaround + // once a fixed NVIDIA driver ships. + if (Device::workaroundDescriptorHeapAS) { + WorkaroundNvidiaAS::Patch(spirv); + } + VkShaderModuleCreateInfo module_info{VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO}; module_info.codeSize = spirv.size() * sizeof(uint32_t); module_info.pCode = spirv.data();