From d08c7cea117a1530188d6bc9b824c505b87192ee Mon Sep 17 00:00:00 2001
From: catbot
Date: Wed, 3 Jun 2026 20:05:12 +0000
Subject: [PATCH] docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Indexing a `layout(descriptor_heap)` array with a runtime (non-constant)
index inside a ray-tracing hit shader causes device loss
(VK_ERROR_DEVICE_LOST) on NVIDIA driver 610.43.02, for both SSBO and
sampled-image descriptors. A constant/spec-constant index is fine, and
the same dynamic pattern works in fragment shaders, so it's an
RT-stage-specific driver fault in the same family as #7/#15
(descriptor-heap AS reads) and #21/#22 (RT recursion + compute TLAS
push).

Unlike the AS-read fault, this cannot be worked around transparently: a
sampled image has no device-address escape hatch the way an acceleration
structure does (OpConvertUToAccelerationStructureKHR), and a buffer-only
buffer_reference rewrite would need a whole address-table architecture
while still leaving the texture half broken. So the resolution is the
documented-limitation path (the precedent set by #7).

Records the fault and its isolation in README's Native RT status and in
the Sponza example README (the textured-closest-hit example, which
already reads its albedo through a spec-constant slot for exactly this
reason).

Documents the recommended consumer pattern: bind one resource and index
*within* it dynamically (single geometry SSBO / buffer_reference at a
spec-constant slot; one texture2DArray indexed by layer) rather than
selecting a descriptor dynamically. This is what the WebGPU path already
does.

Co-Authored-By: Claude Opus 4.8
---
 README.md                 |  25 +++++++++
 examples/Sponza/README.md | 111 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)

diff --git a/README.md b/README.md
index e3dee94..333c5a9 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,31 @@ shaded through an intersection shader with an any-hit cut-out.
 > [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
 > for the full investigation. WebGPU RT is unaffected.
 
+> **Native RT limitation — dynamic `descriptor_heap` indexing in hit
+> shaders:** on the same NVIDIA driver, indexing a `descriptor_heap`
+> array with a **runtime (non-constant)** index inside a ray-tracing
+> **hit** shader also loses the device (`VK_ERROR_DEVICE_LOST`), for plain
+> SSBO **and** sampled-image descriptors. A **constant / spec-constant**
+> index is fine (that's why [Sponza](examples/Sponza/README.md)'s
+> closest-hit reads `albedo[albedoSlot]` through a spec constant), and
+> the identical dynamic pattern works in fragment shaders (the UI
+> renderer indexes `uiTextures[]` by per-item runtime slots), so this
+> is **RT-stage-specific**, not a general heap problem. Unlike the
+> AS-read fault above, this **cannot** be worked around transparently:
+> sampled images have no device-address escape hatch the way an
+> acceleration structure does (`OpConvertUToAccelerationStructureKHR`).
+> The recommended pattern for bindless per-mesh geometry/material is to
+> **bind one resource and index *within* it dynamically** rather than
+> selecting a descriptor dynamically: pack geometry into a single SSBO
+> (or reach it via `buffer_reference`) at a spec-constant slot and index
+> by element offset, and put materials in one `texture2DArray` indexed
+> by layer. Dynamic addressing *inside* a bound resource is ordinary
+> memory/layer addressing and is unaffected; only dynamic selection of a
+> *descriptor* faults. This is exactly what the WebGPU path already does
+> (bucketed texture arrays + a single buffer). Full investigation and
+> GLSL in [examples/Sponza/README.md](examples/Sponza/README.md) (issue
+> #23). WebGPU RT is unaffected.
+
 ## What's in here
 
 - **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas
diff --git a/examples/Sponza/README.md b/examples/Sponza/README.md
index 5545764..7ba3d7c 100644
--- a/examples/Sponza/README.md
+++ b/examples/Sponza/README.md
@@ -14,6 +14,117 @@ it via ray tracing on both Vulkan (native) and WebGPU (wasm). Same
 at the barycentric attribs as UVs — proof-of-binding, not visually accurate.
 Per-vertex UV interpolation is the next step.
 
+The closest-hit reads its texture through a **spec constant**
+(`albedo[albedoSlot]`), not a runtime index. That is deliberate; see
+below.
+
+## Native RT limitation: dynamic `descriptor_heap` indexing in hit shaders
+
+On NVIDIA driver `610.43.02` (Vulkan 1.4), indexing a
+`layout(descriptor_heap)` array with a **runtime (non-constant)** index
+inside a ray-tracing **hit** shader aborts the device with
+`VK_ERROR_DEVICE_LOST` (an instruction-pointer / `READ_INVALID`
+device fault) with validation off. GPU-Assisted Validation masks it:
+the scene runs fine under GPU-AV, which is why a validated run doesn't
+catch it. It is a **driver-side fault** in the same family as the
+descriptor-heap AS-read fault (#7 / #15) and the RT recursion / compute
+TLAS-push issues (#21 / #22), but here for plain **SSBO and
+sampled-image** descriptors read with a non-constant heap index
+(issue #23).
+
+### What was isolated (NVIDIA RTX 4090, driver `610.43.02`)
+
+Driving a native bindless RT scene headlessly and bisecting the
+closest-hit shader:
+
+- A closest-hit that reads only `lightHeap[lightSlot]` where `lightSlot`
+  is a **spec constant** survives indefinitely. ✅ (This example's
+  `albedo[albedoSlot]` is exactly this case.)
+- Reading `indexHeap[assetIndexStart + gl_InstanceCustomIndexEXT]` /
+  `vertexHeap[...]` (a heap index offset by a **runtime** value)
+  loses the device on the first geometry hit. ❌
+- Reading a **texture** dynamically,
+  `textureHeap[assetColorStart + gl_InstanceCustomIndexEXT]`, also
+  loses the device. ❌ So both SSBO *and* sampled-image descriptors fault.
+- `nonuniformEXT()` on the dynamic index does **not** help.
+- The identical dynamic-heap-index pattern works fine in **fragment**
+  shaders (the UI renderer indexes `uiTextures[]` / `ui*Heap[]` by
+  per-item runtime slots), so this is **RT-stage-specific**, not a
+  general `descriptor_heap` problem.
+- Reading a spec-constant-indexed SSBO in **raygen** works; only the
+  *dynamic* index in the hit stage faults.
+
+### Why there is no transparent engine workaround
+
+The AS-read fault (#15) is worked around transparently because an
+acceleration structure can be reached two ways: through a descriptor, or
+through its **device address** via
+`OpConvertUToAccelerationStructureKHR` (which reads no descriptor). There
+is exactly one TLAS, so the engine rewrites the heap AS read into an
+address load and feeds the address in as push data.
+
+Neither half of that applies here:
+
+- **Sampled images have no device-address path.** A texture *must* be
+  reached through a descriptor; there is no `OpConvertUToImage`. A
+  dynamic heap texture index cannot be rewritten into anything that
+  avoids dynamic descriptor selection.
+- **There are many buffers, dynamically selected.** SSBOs *can* be
+  reached by address (`buffer_reference` / `OpConvertUToPtr`), but a
+  per-mesh array selected by `gl_InstanceCustomIndexEXT` would need the
+  engine to maintain and bind an address-table buffer and to perform a
+  SPIR-V rewrite far larger than the single-TLAS AS case, and it would
+  still leave the texture half broken.
+
+So the engine cannot paper over this the way it does the AS read. The
+fix is on the **consumer** side: avoid dynamically selecting a
+*descriptor* in a hit shader.
+
+### Recommended pattern
+
+The fault is dynamic selection of a **descriptor**. Indexing *within* a
+single bound resource (an element offset into one SSBO, a layer into
+one array texture) is ordinary memory / layer addressing and is
+**unaffected**. So bind one resource and index inside it, rather than
+dynamically indexing the heap:
+
+- **Geometry**: pack all meshes' vertices/indices into a single SSBO
+  bound at a **spec-constant** slot and index it by a runtime element
+  offset, or reach each mesh's buffer via `buffer_reference` (a device
+  address loaded from one bound table). Either way the *descriptor* is
+  constant; only the offset/address is dynamic.
+
+  ```glsl
+  // ❌ faults in a hit shader on NVIDIA: dynamic descriptor selection
+  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
+  Vertex vtx = vertexHeap[assetVertexStart + gl_InstanceCustomIndexEXT].v[i];
+
+  // ✅ one descriptor (spec-constant slot), dynamic element offset
+  layout(constant_id = 0) const uint16_t vertexSlot = 0us;
+  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
+  uint base = meshVertexOffset[gl_InstanceCustomIndexEXT]; // per-mesh element offset, from a bound SSBO
+  Vertex vtx = vertexHeap[vertexSlot].v[base + i];
+  ```
+
+- **Materials / textures**: put them in one `texture2DArray` (or a small
+  number of arrays bucketed by format/size) bound at a spec-constant
+  slot and index by **layer**:
+
+  ```glsl
+  // ✅ one array texture (spec-constant slot), dynamic layer index
+  layout(constant_id = 1) const uint16_t materialArraySlot = 0us;
+  layout(descriptor_heap) uniform sampler2DArray materials[];
+  uint layer = materialLayer[gl_InstanceCustomIndexEXT]; // from a bound SSBO
+  vec3 albedo = texture(materials[materialArraySlot], vec3(uv, layer)).rgb;
+  ```
+
+This is precisely what the WebGPU path already does (bucketed texture
+arrays plus a single geometry buffer), so it is a proven, cross-backend
+pattern, and it sidesteps the NVIDIA RT fault on the native path.
+
+Remove this section once a fixed NVIDIA driver ships and dynamic
+`descriptor_heap` indexing in hit shaders stops faulting.
+
 ## Asset fetch
 
 `project.cpp` calls `Crafter::GitFetch(...)` on