docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23) #24

Merged
catbot merged 1 commit from claude/issue-23 into master 2026-06-03 22:05:45 +02:00
2 changed files with 136 additions and 0 deletions


@ -43,6 +43,31 @@ shaded through an intersection shader with an any-hit cut-out.
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
> for the full investigation. WebGPU RT is unaffected.
> **Native RT limitation — dynamic `descriptor_heap` indexing in hit
> shaders:** on the same NVIDIA driver, indexing a `descriptor_heap`
> array with a **runtime (non-constant)** index inside a ray-tracing
> **hit** shader also causes a device loss (`VK_ERROR_DEVICE_LOST`), for plain
> SSBO **and** sampled-image descriptors. A **constant / spec-constant**
> index is fine (that's why [Sponza](examples/Sponza/README.md)'s
> closest-hit reads `albedo[albedoSlot]` through a spec constant), and
> the identical dynamic pattern works in fragment shaders (the UI
> renderer indexes `uiTextures[]` by per-item runtime slots) — so this
> is **RT-stage-specific**, not a general heap problem. Unlike the
> AS-read fault above, this **cannot** be worked around transparently:
> sampled images have no device-address escape hatch the way an
> acceleration structure does (`OpConvertUToAccelerationStructureKHR`).
> The recommended pattern for bindless per-mesh geometry/material is to
> **bind one resource and index *within* it dynamically** rather than
> selecting a descriptor dynamically: pack geometry into a single SSBO
> (or reach it via `buffer_reference`) at a spec-constant slot and index
> by element offset, and put materials in one `texture2DArray` indexed
> by layer. Dynamic addressing *inside* a bound resource is ordinary
> memory/layer addressing and is unaffected; only dynamic selection of a
> *descriptor* faults. This is exactly what the WebGPU path already does
> (bucketed texture arrays + a single buffer). The full investigation and
> GLSL are in [examples/Sponza/README.md](examples/Sponza/README.md)
> (issue #23). WebGPU RT is unaffected.
## What's in here
- **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas


@ -14,6 +14,117 @@ it via ray tracing on both Vulkan (native) and WebGPU (wasm). Same
at the barycentric attribs as UVs — proof-of-binding, not visually
accurate. Per-vertex UV interpolation is the next step.

The closest-hit reads its texture through a **spec constant**
(`albedo[albedoSlot]`), not a runtime index. That is deliberate — see
below.
## Native RT limitation: dynamic `descriptor_heap` indexing in hit shaders
On NVIDIA driver `610.43.02` (Vulkan 1.4), indexing a
`layout(descriptor_heap)` array with a **runtime (non-constant)** index
inside a ray-tracing **hit** shader fails with
`VK_ERROR_DEVICE_LOST` (an instruction-pointer / `READ_INVALID`
device fault) when validation is off. GPU-Assisted Validation (GPU-AV)
masks it: the scene runs fine under GPU-AV, which is why a validated run
doesn't catch it. It is a **driver-side fault**, in the same family as the
descriptor-heap AS-read fault (#7 / #15) and the RT recursion / compute
TLAS-push issues (#21 / #22), but here for plain **SSBO and
sampled-image** descriptors read with a non-constant heap index
(issue #23).
### What was isolated (NVIDIA RTX 4090, driver `610.43.02`)
Driving a native bindless RT scene headlessly and bisecting the
closest-hit:
- A closest-hit that reads only `lightHeap[lightSlot]` where `lightSlot`
is a **spec constant** survives indefinitely. ✅ (This example's
`albedo[albedoSlot]` is exactly this case.)
- Reading `indexHeap[assetIndexStart + gl_InstanceCustomIndexEXT]` /
`vertexHeap[...]` — a heap index offset by a **runtime** value —
causes a device loss on the first geometry hit. ❌
- Reading a **texture** dynamically,
`textureHeap[assetColorStart + gl_InstanceCustomIndexEXT]`, also
causes a device loss. ❌ So both SSBO *and* sampled-image descriptors are affected.
- `nonuniformEXT()` on the dynamic index does **not** help.
- The identical dynamic-heap-index pattern works fine in **fragment**
shaders (the UI renderer indexes `uiTextures[]` / `ui*Heap[]` by
per-item runtime slots), so this is **RT-stage-specific**, not a
general `descriptor_heap` problem.
- Reading a spec-constant-indexed SSBO in **raygen** works; only the
*dynamic* index in the hit stage faults.
### Why there is no transparent engine workaround
The AS-read fault (#15) is worked around transparently because an
acceleration structure can be reached two ways: through a descriptor, or
through its **device address** via
`OpConvertUToAccelerationStructureKHR` (which reads no descriptor). There
is exactly one TLAS, so the engine rewrites the heap AS read into an
address load and feeds the address in as push data.

Neither half of that applies here:
- **Sampled images have no device-address path.** A texture *must* be
reached through a descriptor; there is no `OpConvertUToImage`. A
dynamic heap texture index cannot be rewritten into anything that
avoids dynamic descriptor selection.
- **There are many buffers, dynamically selected.** SSBOs *can* be
reached by address (`buffer_reference` / `OpConvertUToPtr`), but a
per-mesh array selected by `gl_InstanceCustomIndexEXT` would require the
engine to maintain and bind an address-table buffer, plus a SPIR-V
rewrite far larger than the single-TLAS AS case, and it would still
leave the texture half broken.

So the engine cannot paper over this the way it does the AS read. The
fix is on the **consumer** side: avoid dynamically selecting a
*descriptor* in a hit shader.
### Recommended pattern
The fault is dynamic selection of a **descriptor**. Indexing *within* a
single bound resource — an element offset into one SSBO, a layer into
one array texture — is ordinary memory / layer addressing and is
**unaffected**. So bind one resource and index inside it, rather than
indexing the heap:
- **Geometry** — pack all meshes' vertices/indices into a single SSBO
bound at a **spec-constant** slot and index it by a runtime element
offset, or reach each mesh's buffer via `buffer_reference` (a device
address loaded from one bound table). Either way the *descriptor* is
constant; only the offset/address is dynamic.
```glsl
// ❌ faults in a hit shader on NVIDIA: dynamic descriptor selection
layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
Vertex vtx = vertexHeap[assetVertexStart + gl_InstanceCustomIndexEXT].v[i];
// ✅ one descriptor (spec-constant slot), dynamic element offset
layout(constant_id = 0) const uint16_t vertexSlot = 0us;
layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
uint base = assetVertexStart[gl_InstanceCustomIndexEXT]; // from a bound SSBO
Vertex vtx = vertexHeap[vertexSlot].v[base + i];
```
- **Materials / textures** — put them in one `texture2DArray` (or a small
number of arrays bucketed by format/size) bound at a spec-constant
slot and index by **layer**:
```glsl
// ✅ one array texture (spec-constant slot), dynamic layer index
layout(constant_id = 1) const uint16_t materialArraySlot = 0us;
layout(descriptor_heap) uniform sampler2DArray materials[];
uint layer = materialLayer[gl_InstanceCustomIndexEXT]; // from a bound SSBO
vec3 albedo = texture(materials[materialArraySlot], vec3(uv, layer)).rgb;
```
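Bucketing textures into a few array textures is host-side bookkeeping. The C++ below is a minimal sketch of one way to assign each texture a `(bucket, layer)` pair keyed by format and size. `TextureInfo`, `ArraySlot`, `BucketedTextures`, and `BucketTextures` are hypothetical names for illustration, not the engine's API, and `format` is a plain integer standing in for whatever format enum is really used. The resulting layers are the kind of values a table like `materialLayer[]` would hold. How a hit shader chooses between several buckets is not covered by this sketch.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical description of one source texture.
struct TextureInfo {
    std::uint32_t format;  // stand-in for the real format enum
    std::uint32_t width;
    std::uint32_t height;
};

// Where a texture ends up: which array texture, and which layer inside it.
struct ArraySlot {
    std::uint32_t bucket;
    std::uint32_t layer;
};

struct BucketedTextures {
    std::vector<ArraySlot> slots;                // one entry per input texture
    std::vector<std::uint32_t> layersPerBucket;  // layer count of each array texture
};

// Textures sharing format and size land in the same bucket, in input order.
BucketedTextures BucketTextures(const std::vector<TextureInfo>& textures) {
    typedef std::tuple<std::uint32_t, std::uint32_t, std::uint32_t> Key;
    BucketedTextures out;
    std::map<Key, std::uint32_t> bucketOf;
    for (const TextureInfo& t : textures) {
        const Key key(t.format, t.width, t.height);
        std::map<Key, std::uint32_t>::iterator it = bucketOf.find(key);
        if (it == bucketOf.end()) {
            // First texture with this format/size: open a new, empty bucket.
            const std::uint32_t next =
                static_cast<std::uint32_t>(out.layersPerBucket.size());
            it = bucketOf.emplace(key, next).first;
            out.layersPerBucket.push_back(0);
        }
        const std::uint32_t bucket = it->second;
        const std::uint32_t layer = out.layersPerBucket[bucket]++;
        out.slots.push_back({bucket, layer});
    }
    return out;
}
```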
This is precisely what the WebGPU path already does — bucketed texture
arrays plus a single geometry buffer — so it is a proven, cross-backend
pattern, and it sidesteps the NVIDIA RT fault on the native path.
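The single geometry buffer also needs a host-side packing step that produces the per-mesh base offsets the hit shader reads (the `assetVertexStart[...]` table in the GLSL above). The following C++ is a rough sketch of that step under stated assumptions: `Vertex`, `Mesh`, `PackedGeometry`, and `PackMeshes` are hypothetical names, the vertex layout is invented for illustration, and indices are assumed to stay mesh-local so the shader adds the base offset itself.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the engine's real layout is not shown here.
struct Vertex {
    float position[3];
    float uv[2];
};

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // mesh-local, starting at 0
};

struct PackedGeometry {
    std::vector<Vertex> vertices;            // contents of the single vertex SSBO
    std::vector<std::uint32_t> indices;      // contents of the single index SSBO
    std::vector<std::uint32_t> vertexStart;  // per-mesh base into `vertices`
    std::vector<std::uint32_t> indexStart;   // per-mesh base into `indices`
};

// Concatenate all meshes and record where each one begins.
PackedGeometry PackMeshes(const std::vector<Mesh>& meshes) {
    PackedGeometry out;
    out.vertexStart.reserve(meshes.size());
    out.indexStart.reserve(meshes.size());
    for (const Mesh& mesh : meshes) {
        out.vertexStart.push_back(static_cast<std::uint32_t>(out.vertices.size()));
        out.indexStart.push_back(static_cast<std::uint32_t>(out.indices.size()));
        out.vertices.insert(out.vertices.end(),
                            mesh.vertices.begin(), mesh.vertices.end());
        // Indices are copied unchanged; the shader adds vertexStart at read time.
        out.indices.insert(out.indices.end(),
                           mesh.indices.begin(), mesh.indices.end());
    }
    return out;
}
```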
Remove this section once a fixed NVIDIA driver ships and dynamic
`descriptor_heap` indexing in hit shaders stops faulting.
## Asset fetch
`project.cpp` calls `Crafter::GitFetch(...)` on