fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21)

Two gaps in the Vulkan RT path fault the device on the NVIDIA
proprietary driver as soon as the pipeline is non-trivial (the simple
VulkanTriangle never hit them):

1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit
   shader that traces a secondary ray (shadow ray — a very common
   pattern) recursed past the pipeline limit (UB → device fault).
   PipelineRTVulkan::Init now takes a maxRecursionDepth parameter
   (default 1, clamped to the device's maxRayRecursionDepth).

2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader
   that reads an accelerationStructureEXT from the heap — including
   compute shaders — to read the TLAS device address from a push
   constant, but only RTPass pushed that address. A compute shader that
   ray-queries the TLAS (rayQueryEXT) therefore ran against an unwritten
   push slot → garbage AS handle → VK_ERROR_DEVICE_LOST.

   WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult
   {patched, tlasPushOffset} instead of writing the clobber-prone global
   Device::workaroundTlasPushOffset (removed). VulkanShader stores it;
   ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and
   ComputeShader tracks its own offset and pushes the caller-supplied
   TLAS address in Dispatch (new defaulted tlasAddress parameter),
   mirroring RTPass::Record.
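
The clamp from item 1 can be sketched as below. This is a minimal
illustration, not the actual PipelineRTVulkan::Init code:
ClampRecursionDepth is a hypothetical helper, and deviceMax is assumed
to come from
VkPhysicalDeviceRayTracingPipelinePropertiesKHR::maxRayRecursionDepth.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (name and signature are illustrative only).
// requested: depth the caller asks for; 1 means the raygen shader may
//            trace, but shaders invoked by that trace may not.
// deviceMax: the device's reported maxRayRecursionDepth.
inline std::uint32_t ClampRecursionDepth(std::uint32_t requested,
                                         std::uint32_t deviceMax) {
    // Assumption: a device exposing ray tracing pipelines reports >= 1.
    assert(deviceMax >= 1);
    // Never ask the pipeline for more than the device can provide.
    return std::min(requested, deviceMax);
}
```

The result is what would be written to maxPipelineRayRecursionDepth. A
closest-hit shader that fires one shadow ray needs a requested depth of
2: one level for the raygen trace, one for the trace inside the hit.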

The PushConstantRewrite regression test now asserts Patch's returned
patched/offset and adds two ray-querying compute-shader cases, proving
the rewrite is stage-agnostic and the per-shader offset is correct.
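
Why a per-shader offset is needed can be shown with a toy model. The
sketch below is not the real WorkaroundNvidiaAS::Patch: PatchModel and
WriteTlasAddress are hypothetical, the PatchResult fields follow the
description above, and the 8-byte alignment of the appended address is
an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the per-shader result described in this commit.
struct PatchResult {
    bool patched = false;
    std::uint32_t tlasPushOffset = 0;
};

// Toy patch step (hypothetical). A shader with no push-constant block
// gets a fresh block with the address at offset 0; otherwise the
// address is appended after the user's block.
inline PatchResult PatchModel(bool readsAccelerationStructure,
                              std::uint32_t userPushBlockSize) {
    if (!readsAccelerationStructure) return {};
    // Assumption: the 64-bit address is aligned up to 8 bytes.
    const std::uint32_t offset = (userPushBlockSize + 7u) & ~7u;
    return {true, offset};
}

// Writes the TLAS address at the offset recorded for this shader,
// growing the push data if the appended member lies past its end.
inline void WriteTlasAddress(std::vector<std::uint8_t>& pushData,
                             const PatchResult& r,
                             std::uint64_t tlasAddress) {
    if (!r.patched) return;
    assert(r.tlasPushOffset % 8 == 0);
    const std::size_t end = r.tlasPushOffset + sizeof(tlasAddress);
    if (pushData.size() < end) pushData.resize(end);
    std::memcpy(pushData.data() + r.tlasPushOffset, &tlasAddress,
                sizeof(tlasAddress));
}
```

In this model a raygen shader without user push constants and a compute
shader with a 20-byte block get different offsets, so a single shared
global could only be right for whichever shader was patched last.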

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot 2026-06-03 18:35:39 +00:00
commit 1c310762a7
8 changed files with 248 additions and 75 deletions


@@ -178,12 +178,12 @@ export namespace Crafter {
 // path and RTPass pushes the active TLAS address as push data. Delete
 // this flag and everything keyed on it once a fixed driver ships.
 inline static bool workaroundDescriptorHeapAS = false;
-// Byte offset of the TLAS-address member inside the patched raygen's
-// push-constant block — 0 for a freshly synthesized block, or the end
-// of the user's own block when the address is appended to it (the
-// shader can't have two push-constant blocks). VulkanShader sets this
-// at module load; RTPass feeds it to vkCmdPushDataEXT.
-inline static std::uint32_t workaroundTlasPushOffset = 0;
+// The byte offset of the TLAS-address member inside a patched shader's
+// push-constant block is tracked per-shader (VulkanShader::tlasPushOffset),
+// not here: a single global is clobbered by whichever shader was patched
+// last and so cannot serve several shaders with differing push layouts
+// (e.g. an RT raygen and a ray-querying compute shader). RTPass and
+// ComputeShader read the offset off the pipeline they record.
 static void CheckVkResult(VkResult result);
 static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties);