fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)
Reading an acceleration structure through VK_EXT_descriptor_heap aborts with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 — a brand-new-extension driver fault isolated in #7 (engine setup is correct and validation-clean; images/buffers through the same heap work, and both traceRayEXT and inline rayQuery fault identically on the AS read). An acceleration structure can equally be reached by its device address via OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never touches the faulting heap path. glslang has no GLSL spelling for that conversion, so VulkanShader rewrites the compiled SPIR-V at module-load time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS device address from a synthesized push-constant block followed by the convert. RTPass pushes the active frame's TLAS address into that push constant. User GLSL and example code are unchanged; acceleration structures still bind into the heap normally. The workaround is gated on Device::workaroundDescriptorHeapAS (true only on the NVIDIA proprietary driver) and confined to one fenced block in Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64 feature toggle — delete those once a fixed NVIDIA driver ships and the heap AS read becomes the direct path again. Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090), validation-layer-clean, no device loss. The SPIR-V rewrite was independently validated with spirv-val on both the VulkanTriangle and Sponza raygen modules. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
b9f65f5273
commit
950059c86e
7 changed files with 270 additions and 30 deletions
16
README.md
16
README.md
|
|
@ -29,11 +29,17 @@ geometry, closest-hit / miss / any-hit / intersection shaders — see
|
||||||
shaded through an intersection shader with an any-hit cut-out.
|
shaded through an intersection shader with an any-hit cut-out.
|
||||||
|
|
||||||
> **Native RT status:** reading an acceleration structure through
|
> **Native RT status:** reading an acceleration structure through
|
||||||
> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on
|
> `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA
|
||||||
> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new
|
> driver `610.43.02` — a driver-side fault in the brand-new descriptor-heap
|
||||||
> descriptor-heap acceleration-structure path, not an engine bug. The
|
> acceleration-structure path, not an engine bug (the setup is correct and
|
||||||
> engine setup (build, descriptors, SBT) is correct and validation-clean,
|
> validation-clean; images/buffers through the same heap work). The engine
|
||||||
> and images/buffers through the same heap work. See
|
> **works around it transparently** (issue #15): on the NVIDIA driver only,
|
||||||
|
> `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a
|
||||||
|
> TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which
|
||||||
|
> reads no descriptor), and `RTPass` supplies the address as push data.
|
||||||
|
> Shaders and example code are unchanged, and it's a single fenced block
|
||||||
|
> gated on `Device::workaroundDescriptorHeapAS`, removable once a fixed
|
||||||
|
> driver ships. See
|
||||||
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
|
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
|
||||||
> for the full investigation. WebGPU RT is unaffected.
|
> for the full investigation. WebGPU RT is unaffected.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -28,22 +28,36 @@ cd examples/VulkanTriangle
|
||||||
crafter-build -r
|
crafter-build -r
|
||||||
```
|
```
|
||||||
|
|
||||||
On a working driver you should see a 1280×720 window with a triangle
|
You should see a 1280×720 window with an RGB-barycentric triangle filling
|
||||||
filling roughly the centre. **On the current NVIDIA driver the native
|
roughly the centre. On the NVIDIA driver this works through an engine-side
|
||||||
build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs —
|
workaround for a driver fault — see below.
|
||||||
see below.**
|
|
||||||
|
|
||||||
## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`)
|
## Native status — NVIDIA driver fault, worked around
|
||||||
|
|
||||||
On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with
|
On NVIDIA driver `610.43.02` (Vulkan 1.4) reading the acceleration
|
||||||
`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the
|
structure through `VK_EXT_descriptor_heap` aborts the device with
|
||||||
acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read
|
`VK_ERROR_DEVICE_LOST` on the first frame. This is a **driver-side fault**
|
||||||
(address `~0xffff…`) plus instruction-pointer faults inside the
|
in the brand-new descriptor-heap acceleration-structure path, not an engine
|
||||||
ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash
|
bug (full investigation in #7, summarised below).
|
||||||
disappear (the dispatch + `imageStore` path renders a solid colour fine).
|
|
||||||
|
|
||||||
This was investigated thoroughly and traced to the **acceleration-structure
|
**The engine works around it transparently** (issue #15). On the NVIDIA
|
||||||
read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
|
proprietary driver only, `VulkanShader` rewrites the compiled SPIR-V at
|
||||||
|
module-load time so that every `OpLoad` of an `accelerationStructureEXT`
|
||||||
|
out of the heap becomes a load of the TLAS *device address* (from a
|
||||||
|
synthesized push-constant block) followed by
|
||||||
|
`OpConvertUToAccelerationStructureKHR` — which reads no descriptor and so
|
||||||
|
never touches the faulting path. `RTPass` feeds the active frame's TLAS
|
||||||
|
address in as push data. `raygen.glsl` and the example code are unchanged;
|
||||||
|
acceleration structures still bind into the heap normally. On every other
|
||||||
|
driver the workaround is inert. It's gated on
|
||||||
|
`Device::workaroundDescriptorHeapAS` and confined to one fenced block in
|
||||||
|
`interfaces/Crafter.Graphics-ShaderVulkan.cppm` so it can be deleted wholesale
|
||||||
|
once a fixed NVIDIA driver ships.
|
||||||
|
|
||||||
|
### The underlying fault (#7)
|
||||||
|
|
||||||
|
The fault was traced to the **acceleration-structure read through
|
||||||
|
`VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
|
||||||
|
|
||||||
- The BLAS/TLAS build is correct and finishes before rendering
|
- The BLAS/TLAS build is correct and finishes before rendering
|
||||||
(`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
|
(`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
|
||||||
|
|
@ -70,7 +84,7 @@ read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
|
||||||
second conformant implementation to cross-check against.
|
second conformant implementation to cross-check against.
|
||||||
|
|
||||||
**Conclusion:** this is a driver-side fault in NVIDIA's
|
**Conclusion:** this is a driver-side fault in NVIDIA's
|
||||||
`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It
|
`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug, and
|
||||||
should be reported to NVIDIA. The `traceRayEXT` call is intentionally left
|
it should be reported to NVIDIA. Until a fixed driver ships, the SPIR-V
|
||||||
in `raygen.glsl` so this stays a faithful one-file reproducer; the example
|
rewrite above keeps the native RT path working; once it does, remove the
|
||||||
will start rendering the triangle again once a fixed driver ships.
|
workaround and the heap AS read becomes the direct path again.
|
||||||
|
|
|
||||||
|
|
@ -201,12 +201,13 @@ int main() {
|
||||||
RTPass rtPass(&pipeline);
|
RTPass rtPass(&pipeline);
|
||||||
window.passes.push_back(&rtPass);
|
window.passes.push_back(&rtPass);
|
||||||
|
|
||||||
// NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the
|
// NOTE: reading the acceleration structure through VK_EXT_descriptor_heap
|
||||||
// first time the raygen shader reads the acceleration structure out of
|
// aborts with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 (a driver fault —
|
||||||
// the VK_EXT_descriptor_heap. The build, descriptors and SBT are all
|
// see #7). The engine transparently works around it: on the NVIDIA driver
|
||||||
// correct and validation-clean; it is a driver-side fault in the
|
// VulkanShader rewrites the heap AS read into a TLAS-device-address +
|
||||||
// descriptor-heap acceleration-structure path. See README.md
|
// OpConvertUToAccelerationStructureKHR path and RTPass feeds the address in
|
||||||
// ("Native status — known driver fault") for the full investigation.
|
// as push data. Nothing here (or in raygen.glsl) changes. See README.md
|
||||||
|
// ("Native status") and interfaces/Crafter.Graphics-ShaderVulkan.cppm.
|
||||||
window.Render();
|
window.Render();
|
||||||
window.StartSync();
|
window.StartSync();
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -566,12 +566,22 @@ void Device::Initialize() {
|
||||||
memoryDecompressionProperties.pNext = const_cast<void*>(rayTracingProperties.pNext);
|
memoryDecompressionProperties.pNext = const_cast<void*>(rayTracingProperties.pNext);
|
||||||
rayTracingProperties.pNext = &memoryDecompressionProperties;
|
rayTracingProperties.pNext = &memoryDecompressionProperties;
|
||||||
}
|
}
|
||||||
|
// Chain driver properties onto the tail of the query so we can detect
|
||||||
|
// the NVIDIA proprietary driver for the descriptor-heap AS-read
|
||||||
|
// workaround (issue #15 / #7).
|
||||||
|
descriptorHeapProperties.pNext = &driverProperties;
|
||||||
VkPhysicalDeviceProperties2 properties2 {
|
VkPhysicalDeviceProperties2 properties2 {
|
||||||
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
|
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
|
||||||
.pNext = &rayTracingProperties
|
.pNext = &rayTracingProperties
|
||||||
};
|
};
|
||||||
vkGetPhysicalDeviceProperties2(physDevice, &properties2);
|
vkGetPhysicalDeviceProperties2(physDevice, &properties2);
|
||||||
|
|
||||||
|
// NVIDIA's brand-new VK_EXT_descriptor_heap acceleration-structure read
|
||||||
|
// path faults (see #7); enable the SPIR-V rewrite workaround there. Other
|
||||||
|
// drivers (and any future fixed NVIDIA driver, once this gate is removed)
|
||||||
|
// take the normal heap-bound AS path unchanged.
|
||||||
|
workaroundDescriptorHeapAS = (driverProperties.driverID == VK_DRIVER_ID_NVIDIA_PROPRIETARY);
|
||||||
|
|
||||||
// Sanity-gate: GDeflate 1.0 must actually be in the supported method set.
|
// Sanity-gate: GDeflate 1.0 must actually be in the supported method set.
|
||||||
if (memoryDecompressionSupported &&
|
if (memoryDecompressionSupported &&
|
||||||
(memoryDecompressionProperties.decompressionMethods & VK_MEMORY_DECOMPRESSION_METHOD_GDEFLATE_1_0_BIT_EXT) == 0) {
|
(memoryDecompressionProperties.decompressionMethods & VK_MEMORY_DECOMPRESSION_METHOD_GDEFLATE_1_0_BIT_EXT) == 0) {
|
||||||
|
|
@ -699,6 +709,11 @@ void Device::Initialize() {
|
||||||
.shaderSampledImageArrayDynamicIndexing = VK_TRUE,
|
.shaderSampledImageArrayDynamicIndexing = VK_TRUE,
|
||||||
.shaderStorageBufferArrayDynamicIndexing = VK_TRUE,
|
.shaderStorageBufferArrayDynamicIndexing = VK_TRUE,
|
||||||
.shaderStorageImageArrayDynamicIndexing = VK_TRUE,
|
.shaderStorageImageArrayDynamicIndexing = VK_TRUE,
|
||||||
|
// shaderInt64: only needed for the NVIDIA descriptor-heap AS-read
|
||||||
|
// workaround (issue #15 / #7), which loads the TLAS device address
|
||||||
|
// as a 64-bit push constant. Gated so it isn't required on drivers
|
||||||
|
// that don't take the workaround path. Remove with the workaround.
|
||||||
|
.shaderInt64 = workaroundDescriptorHeapAS ? VK_TRUE : VK_FALSE,
|
||||||
.shaderInt16 = VK_TRUE
|
.shaderInt16 = VK_TRUE
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
|
||||||
|
|
@ -165,6 +165,19 @@ export namespace Crafter {
|
||||||
inline static VkPhysicalDeviceMemoryDecompressionPropertiesEXT memoryDecompressionProperties = {
|
inline static VkPhysicalDeviceMemoryDecompressionPropertiesEXT memoryDecompressionProperties = {
|
||||||
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_DECOMPRESSION_PROPERTIES_EXT
|
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_DECOMPRESSION_PROPERTIES_EXT
|
||||||
};
|
};
|
||||||
|
inline static VkPhysicalDeviceDriverProperties driverProperties = {
|
||||||
|
.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DRIVER_PROPERTIES
|
||||||
|
};
|
||||||
|
|
||||||
|
// ─── NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ──
|
||||||
|
// True only on the NVIDIA proprietary driver, where reading an
|
||||||
|
// acceleration structure through VK_EXT_descriptor_heap aborts with
|
||||||
|
// VK_ERROR_DEVICE_LOST (a brand-new-extension driver fault, verified
|
||||||
|
// engine-clean in #7). When set, VulkanShader rewrites heap AS reads
|
||||||
|
// into a TLAS-device-address + OpConvertUToAccelerationStructureKHR
|
||||||
|
// path and RTPass pushes the active TLAS address as push data. Delete
|
||||||
|
// this flag and everything keyed on it once a fixed driver ships.
|
||||||
|
inline static bool workaroundDescriptorHeapAS = false;
|
||||||
|
|
||||||
static void CheckVkResult(VkResult result);
|
static void CheckVkResult(VkResult result);
|
||||||
static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties);
|
static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties);
|
||||||
|
|
|
||||||
|
|
@ -27,6 +27,7 @@ import :RenderPass;
|
||||||
import :Window;
|
import :Window;
|
||||||
import :Device;
|
import :Device;
|
||||||
import :PipelineRTVulkan;
|
import :PipelineRTVulkan;
|
||||||
|
import :RenderingElement3D;
|
||||||
|
|
||||||
export namespace Crafter {
|
export namespace Crafter {
|
||||||
struct RTPass : RenderPass {
|
struct RTPass : RenderPass {
|
||||||
|
|
@ -34,8 +35,22 @@ export namespace Crafter {
|
||||||
|
|
||||||
RTPass(PipelineRTVulkan* p) : pipeline(p) {}
|
RTPass(PipelineRTVulkan* p) : pipeline(p) {}
|
||||||
|
|
||||||
void Record(VkCommandBuffer cmd, std::uint32_t /*frameIdx*/, Window& window) override {
|
void Record(VkCommandBuffer cmd, std::uint32_t frameIdx, Window& window) override {
|
||||||
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline->pipeline);
|
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline->pipeline);
|
||||||
|
// NVIDIA descriptor-heap AS-read workaround (issue #15 / #7): feed
|
||||||
|
// the active frame's TLAS device address into the push-constant
|
||||||
|
// block that VulkanShader synthesizes, so the rewritten raygen can
|
||||||
|
// reach the acceleration structure by address instead of through
|
||||||
|
// the faulting heap descriptor. Inert on every other driver.
|
||||||
|
if (Device::workaroundDescriptorHeapAS) {
|
||||||
|
VkDeviceAddress tlasAddr = RenderingElement3D::tlases[frameIdx].address;
|
||||||
|
VkPushDataInfoEXT pushInfo {
|
||||||
|
.sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT,
|
||||||
|
.offset = 0,
|
||||||
|
.data = { .address = &tlasAddr, .size = sizeof(tlasAddr) },
|
||||||
|
};
|
||||||
|
Device::vkCmdPushDataEXT(cmd, &pushInfo);
|
||||||
|
}
|
||||||
Device::vkCmdTraceRaysKHR(cmd,
|
Device::vkCmdTraceRaysKHR(cmd,
|
||||||
&pipeline->raygenRegion,
|
&pipeline->raygenRegion,
|
||||||
&pipeline->missRegion,
|
&pipeline->missRegion,
|
||||||
|
|
|
||||||
|
|
@ -27,6 +27,174 @@ import std;
|
||||||
import :Device;
|
import :Device;
|
||||||
import :Types;
|
import :Types;
|
||||||
|
|
||||||
|
// ─── BEGIN NVIDIA descriptor-heap AS-read workaround (issue #15 / #7) ─────
|
||||||
|
// Remove this whole block (and its call below, Device::workaroundDescriptorHeapAS,
|
||||||
|
// and the RTPass push-data) once NVIDIA ships a driver that fixes the
|
||||||
|
// VK_EXT_descriptor_heap acceleration-structure read fault.
|
||||||
|
//
|
||||||
|
// On the affected driver, reading an `accelerationStructureEXT` out of the
|
||||||
|
// descriptor heap aborts the device. The build, the heap descriptor write and
|
||||||
|
// everything else are correct (proven in #7); only the in-shader heap AS read
|
||||||
|
// is broken — buffers/images through the same heap work. Acceleration
|
||||||
|
// structures can equally be addressed by their device address, and
|
||||||
|
// OpConvertUToAccelerationStructureKHR (which reads no descriptor) sidesteps
|
||||||
|
// the faulting path entirely.
|
||||||
|
//
|
||||||
|
// glslang has no GLSL spelling for that conversion, so we rewrite the compiled
|
||||||
|
// SPIR-V at module-load time: every `OpLoad %accelStruct <heap-ptr>` becomes a
|
||||||
|
// load of the TLAS device address from a synthesized push-constant block
|
||||||
|
// followed by OpConvertUToAccelerationStructureKHR. RTPass pushes the active
|
||||||
|
// frame's TLAS address into that push constant. Shaders that never touch an
|
||||||
|
// acceleration structure (no OpTypeAccelerationStructureKHR) are left untouched.
|
||||||
|
namespace WorkaroundNvidiaAS {
|
||||||
|
// SPIR-V numeric opcodes / enums used below.
|
||||||
|
enum : std::uint32_t {
|
||||||
|
OpEntryPoint = 15, OpCapability = 17,
|
||||||
|
OpTypeInt = 21, OpTypeStruct = 30, OpTypePointer = 32,
|
||||||
|
OpConstant = 43, OpVariable = 59, OpLoad = 61, OpAccessChain = 65,
|
||||||
|
OpDecorate = 71, OpMemberDecorate = 72,
|
||||||
|
OpConvertUToAccelerationStructureKHR = 4447,
|
||||||
|
OpTypeAccelerationStructureKHR = 5341,
|
||||||
|
CapabilityInt64 = 11,
|
||||||
|
StorageClassPushConstant = 9,
|
||||||
|
DecorationBlock = 2, DecorationOffset = 35,
|
||||||
|
};
|
||||||
|
|
||||||
|
inline bool IsAnnotation(std::uint32_t op) {
|
||||||
|
// OpDecorate/OpMemberDecorate/OpDecorationGroup/OpGroupDecorate/
|
||||||
|
// OpGroupMemberDecorate/OpDecorateId/OpDecorate(Member)String.
|
||||||
|
return op == 71 || op == 72 || op == 73 || op == 74 || op == 75
|
||||||
|
|| op == 332 || op == 5632 || op == 5633;
|
||||||
|
}
|
||||||
|
|
||||||
|
using Instr = std::vector<std::uint32_t>;
|
||||||
|
|
||||||
|
inline void Patch(std::vector<std::uint32_t>& words) {
|
||||||
|
if (words.size() < 5) return; // not a SPIR-V module we understand.
|
||||||
|
|
||||||
|
// Split header (5 words) from the instruction stream.
|
||||||
|
std::uint32_t bound = words[3];
|
||||||
|
std::vector<Instr> instrs;
|
||||||
|
for (std::size_t i = 5; i < words.size();) {
|
||||||
|
std::uint32_t len = words[i] >> 16;
|
||||||
|
if (len == 0 || i + len > words.size()) return; // malformed — bail.
|
||||||
|
instrs.emplace_back(words.begin() + i, words.begin() + i + len);
|
||||||
|
i += len;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Scan for the AS type, reusable int/long types+constants, and the
|
||||||
|
// section boundaries we need to insert into.
|
||||||
|
std::uint32_t asTypeId = 0, ulongTypeId = 0, uintTypeId = 0, uintZeroId = 0;
|
||||||
|
std::size_t lastCapIdx = 0, lastAnnotIdx = 0, firstFuncIdx = instrs.size();
|
||||||
|
std::size_t entryIdx = instrs.size();
|
||||||
|
for (std::size_t k = 0; k < instrs.size(); ++k) {
|
||||||
|
std::uint32_t op = instrs[k][0] & 0xFFFFu;
|
||||||
|
switch (op) {
|
||||||
|
case OpTypeAccelerationStructureKHR: asTypeId = instrs[k][1]; break;
|
||||||
|
case OpTypeInt:
|
||||||
|
if (instrs[k][2] == 64 && instrs[k][3] == 0) ulongTypeId = instrs[k][1];
|
||||||
|
else if (instrs[k][2] == 32 && instrs[k][3] == 0) uintTypeId = instrs[k][1];
|
||||||
|
break;
|
||||||
|
case OpConstant:
|
||||||
|
if (uintTypeId && instrs[k][1] == uintTypeId && instrs[k][3] == 0)
|
||||||
|
uintZeroId = instrs[k][2];
|
||||||
|
break;
|
||||||
|
case OpCapability: lastCapIdx = k; break;
|
||||||
|
case OpEntryPoint: if (entryIdx == instrs.size()) entryIdx = k; break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
if (IsAnnotation(op)) lastAnnotIdx = k;
|
||||||
|
if (op == 54 /*OpFunction*/ && firstFuncIdx == instrs.size()) firstFuncIdx = k;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (asTypeId == 0) return; // shader never reads an acceleration structure.
|
||||||
|
|
||||||
|
auto newId = [&] { return bound++; };
|
||||||
|
auto mk = [](std::initializer_list<std::uint32_t> ops) {
|
||||||
|
Instr in(ops);
|
||||||
|
in[0] = static_cast<std::uint32_t>(in.size() << 16) | (in[0] & 0xFFFFu);
|
||||||
|
return in;
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── Synthesize the types/constants/push-constant we need, reusing any
|
||||||
|
// the module already defines (SPIR-V forbids duplicate type defs).
|
||||||
|
std::vector<Instr> typeDefs;
|
||||||
|
if (uintTypeId == 0) {
|
||||||
|
uintTypeId = newId();
|
||||||
|
typeDefs.push_back(mk({OpTypeInt, uintTypeId, 32, 0}));
|
||||||
|
}
|
||||||
|
if (uintZeroId == 0) {
|
||||||
|
uintZeroId = newId();
|
||||||
|
typeDefs.push_back(mk({OpConstant, uintTypeId, uintZeroId, 0}));
|
||||||
|
}
|
||||||
|
if (ulongTypeId == 0) {
|
||||||
|
ulongTypeId = newId();
|
||||||
|
typeDefs.push_back(mk({OpTypeInt, ulongTypeId, 64, 0}));
|
||||||
|
}
|
||||||
|
std::uint32_t pcStructId = newId();
|
||||||
|
std::uint32_t ptrPushStructId = newId();
|
||||||
|
std::uint32_t ptrPushUlongId = newId();
|
||||||
|
std::uint32_t pcVarId = newId();
|
||||||
|
typeDefs.push_back(mk({OpTypeStruct, pcStructId, ulongTypeId}));
|
||||||
|
typeDefs.push_back(mk({OpTypePointer, ptrPushStructId, StorageClassPushConstant, pcStructId}));
|
||||||
|
typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId}));
|
||||||
|
typeDefs.push_back(mk({OpVariable, ptrPushStructId, pcVarId, StorageClassPushConstant}));
|
||||||
|
|
||||||
|
std::vector<Instr> decorations = {
|
||||||
|
mk({OpMemberDecorate, pcStructId, 0, DecorationOffset, 0}),
|
||||||
|
mk({OpDecorate, pcStructId, DecorationBlock}),
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── Rewrite each `OpLoad %asType <ptr>` into address-load + convert.
|
||||||
|
std::vector<Instr> rebuilt;
|
||||||
|
rebuilt.reserve(instrs.size() + 8);
|
||||||
|
for (const Instr& in : instrs) {
|
||||||
|
std::uint32_t op = in[0] & 0xFFFFu;
|
||||||
|
if (op == OpLoad && in[1] == asTypeId) {
|
||||||
|
std::uint32_t resultId = in[2];
|
||||||
|
std::uint32_t chainId = newId();
|
||||||
|
std::uint32_t addrId = newId();
|
||||||
|
rebuilt.push_back(mk({OpAccessChain, ptrPushUlongId, chainId, pcVarId, uintZeroId}));
|
||||||
|
rebuilt.push_back(mk({OpLoad, ulongTypeId, addrId, chainId}));
|
||||||
|
rebuilt.push_back(mk({OpConvertUToAccelerationStructureKHR, asTypeId, resultId, addrId}));
|
||||||
|
} else {
|
||||||
|
rebuilt.push_back(in);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
instrs.swap(rebuilt);
|
||||||
|
|
||||||
|
// Recompute structural anchors (the rewrite above shifted indices).
|
||||||
|
lastCapIdx = 0; lastAnnotIdx = 0; firstFuncIdx = instrs.size(); entryIdx = instrs.size();
|
||||||
|
for (std::size_t k = 0; k < instrs.size(); ++k) {
|
||||||
|
std::uint32_t op = instrs[k][0] & 0xFFFFu;
|
||||||
|
if (op == OpCapability) lastCapIdx = k;
|
||||||
|
if (op == OpEntryPoint && entryIdx == instrs.size()) entryIdx = k;
|
||||||
|
if (IsAnnotation(op)) lastAnnotIdx = k;
|
||||||
|
if (op == 54 && firstFuncIdx == instrs.size()) firstFuncIdx = k;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Append the push-constant variable to the entry point's interface
|
||||||
|
// list (required for SPIR-V ≥ 1.4 — both raygen modules are 1.4).
|
||||||
|
if (entryIdx != instrs.size() && words[1] >= 0x00010400u) {
|
||||||
|
instrs[entryIdx].push_back(pcVarId);
|
||||||
|
instrs[entryIdx][0] = static_cast<std::uint32_t>(instrs[entryIdx].size() << 16)
|
||||||
|
| OpEntryPoint;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Insert highest-index-first so earlier anchors stay valid.
|
||||||
|
instrs.insert(instrs.begin() + firstFuncIdx, typeDefs.begin(), typeDefs.end());
|
||||||
|
instrs.insert(instrs.begin() + lastAnnotIdx + 1, decorations.begin(), decorations.end());
|
||||||
|
instrs.insert(instrs.begin() + lastCapIdx + 1, mk({OpCapability, CapabilityInt64}));
|
||||||
|
|
||||||
|
// ── Reassemble: header (with updated bound) + instruction stream.
|
||||||
|
std::vector<std::uint32_t> out(words.begin(), words.begin() + 5);
|
||||||
|
out[3] = bound;
|
||||||
|
for (const Instr& in : instrs) out.insert(out.end(), in.begin(), in.end());
|
||||||
|
words.swap(out);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// ─── END NVIDIA descriptor-heap AS-read workaround ────────────────────────
|
||||||
|
|
||||||
export namespace Crafter {
|
export namespace Crafter {
|
||||||
class VulkanShader {
|
class VulkanShader {
|
||||||
public:
|
public:
|
||||||
|
|
@ -54,7 +222,15 @@ export namespace Crafter {
|
||||||
}
|
}
|
||||||
|
|
||||||
file.close();
|
file.close();
|
||||||
|
|
||||||
|
// NVIDIA descriptor-heap AS-read workaround (issue #15 / #7).
|
||||||
|
// No-op on every other driver and on shaders that don't read an
|
||||||
|
// acceleration structure. Remove with the rest of the workaround
|
||||||
|
// once a fixed NVIDIA driver ships.
|
||||||
|
if (Device::workaroundDescriptorHeapAS) {
|
||||||
|
WorkaroundNvidiaAS::Patch(spirv);
|
||||||
|
}
|
||||||
|
|
||||||
VkShaderModuleCreateInfo module_info{VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO};
|
VkShaderModuleCreateInfo module_info{VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO};
|
||||||
module_info.codeSize = spirv.size() * sizeof(uint32_t);
|
module_info.codeSize = spirv.size() * sizeof(uint32_t);
|
||||||
module_info.pCode = spirv.data();
|
module_info.pCode = spirv.data();
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue