Vulkan RT: recursion depth hardcoded to 1, and the descriptor-heap AS-read workaround is never pushed for compute dispatches #21

Closed
opened 2026-06-03 17:22:30 +02:00 by catbot · 0 comments
Member

Found while debugging 3DForts issue #87 (native Game scene VK_ERROR_DEVICE_LOST). Two confirmed gaps in the Vulkan RT path block any consumer with a non-trivial RT pipeline + GPU compute ray-queries on the NVIDIA proprietary driver. The simple VulkanTriangle example works, so these only surface with a richer pipeline.

Environment: RTX 4090, NVIDIA proprietary driver 610.43.02, VK_EXT_descriptor_heap path.

1. maxPipelineRayRecursionDepth is hardcoded to 1

interfaces/Crafter.Graphics-PipelineRTVulkan.cppm (the vkCreateRayTracingPipelinesKHR create-info) sets .maxPipelineRayRecursionDepth = 1 with no way for a consumer to raise it. Any closest-hit shader that calls traceRayEXT (e.g. tracing a shadow ray from a hit — an extremely common pattern; 3DForts does exactly this in closesthit.glsl) recurses to depth 2 and exceeds the pipeline limit, which is undefined behaviour and faults the device. VulkanTriangle only traces from the raygen (depth 1), so it never hits this. Please make the recursion depth configurable (e.g. a field on the pipeline init struct), defaulting to 1.
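A minimal sketch of what the configurable depth could look like, not the actual Crafter.Graphics API. `PipelineRTInitSketch`, its `maxRecursionDepth` field, and `ResolveRecursionDepth` are hypothetical names. `deviceMaxDepth` stands in for the value the device reports in `VkPhysicalDeviceRayTracingPipelinePropertiesKHR::maxRayRecursionDepth`. Requests above that limit are rejected rather than silently clamped, since a clamped pipeline would still be exceeded by the shader.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical init struct; the field name is an assumption for illustration.
struct PipelineRTInitSketch {
    std::uint32_t maxRecursionDepth = 1;  // default preserves the current behaviour
};

// Returns the value to place in maxPipelineRayRecursionDepth, or nullopt when
// the request exceeds what the device supports, so the caller can fail loudly
// at pipeline creation instead of faulting the device at trace time.
inline std::optional<std::uint32_t> ResolveRecursionDepth(
    const PipelineRTInitSketch& init, std::uint32_t deviceMaxDepth) {
    // The Vulkan spec requires maxRayRecursionDepth to be at least 1.
    assert(deviceMaxDepth >= 1);
    if (init.maxRecursionDepth > deviceMaxDepth) {
        return std::nullopt;
    }
    return init.maxRecursionDepth;
}
```

A consumer like 3DForts, whose closest-hit shader traces a shadow ray, would set the field to 2 under this sketch. VulkanTriangle would leave the default.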

2. The NVIDIA descriptor-heap AS-read workaround is never applied to compute dispatches

The WorkaroundNvidiaAS SPIR-V rewrite (ShaderVulkan) rewrites every shader that loads an accelerationStructureEXT from the descriptor heap, compute shaders included, so that it instead reads the TLAS device address from a push constant (Device::workaroundTlasPushOffset) and converts it with OpConvertUToAccelerationStructureKHR. But the only code that actually pushes that address is RTPass::Record (for the RT pipeline). ComputeShader::Dispatch only pushes the caller's push-constant payload at offset 0 and never writes the TLAS address. So any compute shader that ray-queries the TLAS via the heap (rayQueryEXT) runs the rewritten code against an unwritten push-constant slot → garbage AS handle → VK_ERROR_DEVICE_LOST (READ_INVALID). 3DForts's physics-builder-pick (and physics-splash) compute shaders ray-query the TLAS and reliably crash because of this; with those dispatches removed the scene survives.

A fix would have ComputeShader track its own per-shader workaroundTlasPushOffset (the global is also clobbered by whichever shader was patched last, so it can't serve multiple shaders with different push layouts) and push the active TLAS address in Dispatch when the shader was rewritten — mirroring what RTPass::Record does — or expose an API so the caller can supply the AS for a dispatch.
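The first option could be sketched roughly as follows, under stated assumptions. `ComputeShaderSketch` and `BuildPushData` are hypothetical names, and the sketch only assembles the bytes that Dispatch would hand to its existing push-constant call; it does not touch Vulkan. It assumes the TLAS address is a 64-bit device address and that the rewrite records a byte offset per shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical per-shader state replacing the single global offset.
struct ComputeShaderSketch {
    // Set only when the SPIR-V rewrite patched this shader: byte offset of
    // the push-constant slot the rewritten code reads the TLAS address from.
    std::optional<std::uint32_t> workaroundTlasPushOffset;
};

// Builds the push-constant bytes for one dispatch: the caller's payload at
// offset 0, plus the TLAS device address at the shader's own offset when the
// shader was rewritten. Any gap between the two is zero-filled.
inline std::vector<std::byte> BuildPushData(const ComputeShaderSketch& shader,
                                            const void* payload,
                                            std::size_t payloadSize,
                                            std::uint64_t tlasAddress) {
    assert(payload != nullptr || payloadSize == 0);
    std::vector<std::byte> data(payloadSize);
    if (payloadSize != 0) {
        std::memcpy(data.data(), payload, payloadSize);
    }
    if (shader.workaroundTlasPushOffset) {
        const std::size_t offset = *shader.workaroundTlasPushOffset;
        // Grow the buffer if needed so the address slot fits; never shrink it.
        data.resize(std::max(data.size(), offset + sizeof(tlasAddress)));
        std::memcpy(data.data() + offset, &tlasAddress, sizeof(tlasAddress));
    }
    return data;
}
```

Keeping the offset on the shader object means two patched shaders with different push layouts each get the address written where their own rewritten code expects it.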

Possibly related (needs your eyes)

Even after working around gap 1 (by disabling the shadow recursion) and fixing a 3DForts-side bug, a single-bounce raygen trace over 3DForts's multi-instance TLAS still renders all-miss (black), and the process is killed deterministically a few thousand frames in. The kill is silent, with no validation error, so it looks like a GPU hang/TDR. The 3DForts-side bug was that raygen read the camera basis as vec3 composite loads from a per-frame-rewritten descriptor_heap SSBO, which also faults on this driver; it is now fixed in 3DForts. The VulkanTriangle RT path renders fine on the same machine, so something about the larger/per-frame-rebuilt TLAS or SBT differs. Happy to provide a minimal repro if useful.


Reference
Catcrafts/Crafter.Graphics#21