fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21) #22

Merged
catbot merged 1 commit from claude/issue-21 into master 2026-06-03 20:36:23 +02:00
Member

Fixes the two confirmed Vulkan RT gaps from #21 that fault the device on the NVIDIA proprietary driver once the pipeline gets non-trivial (the simple VulkanTriangle example never exercises either).

1. maxPipelineRayRecursionDepth is now configurable

PipelineRTVulkan::Init took a hardcoded .maxPipelineRayRecursionDepth = 1, so any closest-hit shader that traces a secondary ray (a shadow ray, etc.) recursed to depth 2, exceeded the pipeline limit (undefined behaviour), and faulted the device. Init now takes a trailing std::uint32_t maxRecursionDepth = 1 parameter, clamped to the device's maxRayRecursionDepth. Existing callers are unchanged (default 1).

2. The descriptor-heap AS-read workaround now reaches compute dispatches

WorkaroundNvidiaAS::Patch rewrites every shader that reads an accelerationStructureEXT from the descriptor heap — including compute shaders — to read the TLAS device address from a push constant. But only RTPass::Record ever pushed that address; ComputeShader::Dispatch did not. A compute shader that ray-queries the TLAS (rayQueryEXT) therefore read an unwritten push slot → garbage AS handle → VK_ERROR_DEVICE_LOST.

  • Patch now returns a per-shader PatchResult {patched, tlasPushOffset} instead of writing the global Device::workaroundTlasPushOffset (removed). The global was clobbered by whichever shader was patched last, so it could not serve several shaders with differing push layouts.
  • VulkanShader stores patchedAS / tlasPushOffset; ShaderBindingTableVulkan and PipelineRTVulkan carry them so RTPass reads the offset off the pipeline.
  • ComputeShader tracks its own offset and pushes the caller-supplied TLAS address in Dispatch (new defaulted tlasAddress parameter — pass RenderingElement3D::tlases[frameIdx].address), mirroring RTPass::Record.

Tests

crafter-build test is green. The PushConstantRewrite regression test now asserts Patch's returned patched/tlasPushOffset (previously read off the removed global) and adds two ray-querying compute cases (synthesize + merge), proving the rewrite is stage-agnostic and the per-shader offset is correct:

[no-push-constant] ok (tlas offset: 0)
[merge-mat4-vec3-uint] ok (tlas offset: 80)
[merge-uint] ok (tlas offset: 8)
[merge-array] ok (tlas offset: 40)
[push-constant-no-as] ok (tlas offset: 0)
[compute-no-push] ok (tlas offset: 0)
[compute-merge-uint] ok (tlas offset: 8)
all push-constant rewrite cases passed

The single-bounce multi-instance all-miss / silent GPU hang is left untouched — it's marked "needs your eyes" with an offered repro and isn't reproducible here. Worth noting for the full 3DForts path: an RT pipeline where multiple stages (raygen and closesthit) read the AS with different push-constant layouts still resolves to a single pipeline-wide push offset (the last AS-reading stage's). That's a pre-existing limitation of the workaround, not a regression, but it'll matter once recursion is re-enabled. A minimal repro would help.

Resolves #21

Fixes the two confirmed Vulkan RT gaps from #21 that fault the device on the NVIDIA proprietary driver once the pipeline gets non-trivial (the simple `VulkanTriangle` example never exercises either). ## 1. `maxPipelineRayRecursionDepth` is now configurable `PipelineRTVulkan::Init` took a hardcoded `.maxPipelineRayRecursionDepth = 1`, so any closest-hit shader that traces a secondary ray (a shadow ray, etc.) recursed to depth 2, exceeded the pipeline limit (undefined behaviour), and faulted the device. `Init` now takes a trailing `std::uint32_t maxRecursionDepth = 1` parameter, clamped to the device's `maxRayRecursionDepth`. Existing callers are unchanged (default 1). ## 2. The descriptor-heap AS-read workaround now reaches compute dispatches `WorkaroundNvidiaAS::Patch` rewrites **every** shader that reads an `accelerationStructureEXT` from the descriptor heap — including compute shaders — to read the TLAS device address from a push constant. But only `RTPass::Record` ever pushed that address; `ComputeShader::Dispatch` did not. A compute shader that ray-queries the TLAS (`rayQueryEXT`) therefore read an unwritten push slot → garbage AS handle → `VK_ERROR_DEVICE_LOST`. - `Patch` now returns a **per-shader** `PatchResult {patched, tlasPushOffset}` instead of writing the global `Device::workaroundTlasPushOffset` (removed). The global was clobbered by whichever shader was patched last, so it could not serve several shaders with differing push layouts. - `VulkanShader` stores `patchedAS` / `tlasPushOffset`; `ShaderBindingTableVulkan` and `PipelineRTVulkan` carry them so `RTPass` reads the offset off the pipeline. - `ComputeShader` tracks its own offset and pushes the caller-supplied TLAS address in `Dispatch` (new defaulted `tlasAddress` parameter — pass `RenderingElement3D::tlases[frameIdx].address`), mirroring `RTPass::Record`. ## Tests `crafter-build test` is green. The `PushConstantRewrite` regression test now asserts `Patch`'s returned `patched`/`tlasPushOffset` (previously read off the removed global) and adds two ray-querying **compute** cases (synthesize + merge), proving the rewrite is stage-agnostic and the per-shader offset is correct: ``` [no-push-constant] ok (tlas offset: 0) [merge-mat4-vec3-uint] ok (tlas offset: 80) [merge-uint] ok (tlas offset: 8) [merge-array] ok (tlas offset: 40) [push-constant-no-as] ok (tlas offset: 0) [compute-no-push] ok (tlas offset: 0) [compute-merge-uint] ok (tlas offset: 8) all push-constant rewrite cases passed ``` ## Not addressed (the "possibly related" item in #21) The single-bounce multi-instance all-miss / silent GPU hang is left untouched — it's marked "needs your eyes" with an offered repro and isn't reproducible here. Worth noting for the full 3DForts path: an RT pipeline where **multiple** stages (raygen *and* closesthit) read the AS with *different* push-constant layouts still resolves to a single pipeline-wide push offset (the last AS-reading stage's). That's a pre-existing limitation of the workaround, not a regression, but it'll matter once recursion is re-enabled. A minimal repro would help. Resolves #21
Two gaps in the Vulkan RT path that fault the device on the NVIDIA
proprietary driver with a non-trivial pipeline (simple VulkanTriangle
never hit them):

1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit
   shader that traces a secondary ray (shadow ray — a very common
   pattern) recursed past the pipeline limit (UB → device fault).
   PipelineRTVulkan::Init now takes a maxRecursionDepth parameter
   (default 1, clamped to the device's maxRayRecursionDepth).

2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader
   that reads an accelerationStructureEXT from the heap — including
   compute shaders — to read the TLAS device address from a push
   constant, but only RTPass pushed that address. A compute shader that
   ray-queries the TLAS (rayQueryEXT) therefore ran against an unwritten
   push slot → garbage AS handle → VK_ERROR_DEVICE_LOST.

   WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult
   {patched, tlasPushOffset} instead of writing the clobber-prone global
   Device::workaroundTlasPushOffset (removed). VulkanShader stores it;
   ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and
   ComputeShader tracks its own offset and pushes the caller-supplied
   TLAS address in Dispatch (new defaulted tlasAddress parameter),
   mirroring RTPass::Record.

The PushConstantRewrite regression test now asserts Patch's returned
patched/offset and adds two ray-querying compute-shader cases, proving
the rewrite is stage-agnostic and the per-shader offset is correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot merged commit 5358aee2f6 into master 2026-06-03 20:36:23 +02:00
catbot deleted branch claude/issue-21 2026-06-03 20:36:23 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
Catcrafts/Crafter.Graphics!22
No description provided.