docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23)

Indexing a `layout(descriptor_heap)` array with a runtime (non-constant) index inside a ray-tracing hit shader device-losts on NVIDIA 610.43.02, for both SSBO and sampled-image descriptors. A constant/spec-constant index is fine, and the same dynamic pattern works in fragment shaders, so it's an RT-stage-specific driver fault, the same family as #7/#15 (descriptor-heap AS reads) and #21/#22 (RT recursion + compute TLAS push).

Unlike the AS-read fault, this cannot be worked around transparently: a sampled image has no device-address escape hatch the way an acceleration structure does (OpConvertUToAccelerationStructureKHR), and a buffer-only buffer_reference rewrite would need a whole address-table architecture while still leaving the texture half broken. So the resolution is the documented-limitation path (the precedent set by #7).

Records the fault and its isolation in README's Native RT status and in the Sponza example README (the textured-closest-hit example, which already reads its albedo through a spec-constant slot for exactly this reason). Documents the recommended consumer pattern: bind one resource and index *within* it dynamically (single geometry SSBO / buffer_reference at a spec-constant slot; one texture2DArray indexed by layer) rather than selecting a descriptor dynamically, which is what the WebGPU path already does.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent
5358aee2f6
commit
d08c7cea11
2 changed files with 136 additions and 0 deletions
25
README.md

@@ -43,6 +43,31 @@ shaded through an intersection shader with an any-hit cut-out.
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
> for the full investigation. WebGPU RT is unaffected.

> **Native RT limitation (dynamic `descriptor_heap` indexing in hit
> shaders):** on the same NVIDIA driver, indexing a `descriptor_heap`
> array with a **runtime (non-constant)** index inside a ray-tracing
> **hit** shader also causes a device loss (`VK_ERROR_DEVICE_LOST`), for
> plain SSBO **and** sampled-image descriptors. A **constant / spec-constant**
> index is fine (that's why [Sponza](examples/Sponza/README.md)'s
> closest-hit reads `albedo[albedoSlot]` through a spec constant), and
> the identical dynamic pattern works in fragment shaders (the UI
> renderer indexes `uiTextures[]` by per-item runtime slots), so this
> is **RT-stage-specific**, not a general heap problem. Unlike the
> AS-read fault above, this **cannot** be worked around transparently:
> sampled images have no device-address escape hatch the way an
> acceleration structure does (`OpConvertUToAccelerationStructureKHR`).
> The recommended pattern for bindless per-mesh geometry/material is to
> **bind one resource and index *within* it dynamically** rather than
> selecting a descriptor dynamically: pack geometry into a single SSBO
> (or reach it via `buffer_reference`) at a spec-constant slot and index
> by element offset, and put materials in one `texture2DArray` indexed
> by layer. Dynamic addressing *inside* a bound resource is ordinary
> memory/layer addressing and is unaffected; only dynamic selection of a
> *descriptor* faults. This is exactly what the WebGPU path already does
> (bucketed texture arrays + a single buffer). Full investigation and
> GLSL in [examples/Sponza/README.md](examples/Sponza/README.md) (issue
> #23). WebGPU RT is unaffected.

## What's in here

- **Window**: Wayland, Win32, and DOM backends, swapchain ring / canvas

@@ -14,6 +14,117 @@ it via ray tracing on both Vulkan (native) and WebGPU (wasm). Same
at the barycentric attribs as UVs: proof-of-binding, not visually
accurate. Per-vertex UV interpolation is the next step.

The closest-hit reads its texture through a **spec constant**
(`albedo[albedoSlot]`), not a runtime index. That is deliberate; see
below.

## Native RT limitation: dynamic `descriptor_heap` indexing in hit shaders

On NVIDIA driver `610.43.02` (Vulkan 1.4), indexing a
`layout(descriptor_heap)` array with a **runtime (non-constant)** index
inside a ray-tracing **hit** shader aborts the device with
`VK_ERROR_DEVICE_LOST` (an instruction-pointer / `READ_INVALID`
device fault) when validation is off. GPU-Assisted Validation (GPU-AV)
masks it: the scene runs fine under GPU-AV, which is why a validated run
doesn't catch it. It is a **driver-side fault**, the same family as the
descriptor-heap AS-read fault (#7 / #15) and the RT recursion / compute
TLAS-push issues (#21 / #22), but here for plain **SSBO and
sampled-image** descriptors read with a non-constant heap index
(issue #23).

### What was isolated (NVIDIA RTX 4090, driver `610.43.02`)

Driving a native bindless RT scene headlessly and bisecting the
closest-hit:

- A closest-hit that reads only `lightHeap[lightSlot]` where `lightSlot`
  is a **spec constant** survives indefinitely. ✅ (This example's
  `albedo[albedoSlot]` is exactly this case.)
- Reading `indexHeap[assetIndexStart + gl_InstanceCustomIndexEXT]` /
  `vertexHeap[...]` (a heap index offset by a **runtime** value)
  causes a device loss on the first geometry hit. ❌
- Reading a **texture** dynamically,
  `textureHeap[assetColorStart + gl_InstanceCustomIndexEXT]`, also
  causes a device loss. ❌ So both SSBO *and* sampled-image descriptors
  are affected.
- `nonuniformEXT()` on the dynamic index does **not** help.
- The identical dynamic-heap-index pattern works fine in **fragment**
  shaders (the UI renderer indexes `uiTextures[]` / `ui*Heap[]` by
  per-item runtime slots), so this is **RT-stage-specific**, not a
  general `descriptor_heap` problem.
- Reading a spec-constant-indexed SSBO in **raygen** works; only the
  *dynamic* index in the hit stage faults.

### Why there is no transparent engine workaround

The AS-read fault (#15) is worked around transparently because an
acceleration structure can be reached two ways: through a descriptor, or
through its **device address** via
`OpConvertUToAccelerationStructureKHR` (which reads no descriptor). There
is exactly one TLAS, so the engine rewrites the heap AS read into an
address load and feeds the address in as push data.

Neither half of that applies here:

- **Sampled images have no device-address path.** A texture *must* be
  reached through a descriptor; there is no `OpConvertUToImage`. A
  dynamic heap texture index cannot be rewritten into anything that
  avoids dynamic descriptor selection.
- **There are many buffers, dynamically selected.** SSBOs *can* be
  reached by address (`buffer_reference` / `OpConvertUToPtr`), but a
  per-mesh array selected by `gl_InstanceCustomIndexEXT` would need the
  engine to maintain and bind an address-table buffer and a SPIR-V
  rewrite far larger than the single-TLAS AS case, and it would still
  leave the texture half broken.

So the engine cannot paper over this the way it does the AS read. The
fix is on the **consumer** side: avoid dynamically selecting a
*descriptor* in a hit shader.

### Recommended pattern

The fault is dynamic selection of a **descriptor**. Indexing *within* a
single bound resource (an element offset into one SSBO, a layer into
one array texture) is ordinary memory / layer addressing and is
**unaffected**. So bind one resource and index inside it, rather than
indexing the heap:

- **Geometry**: pack all meshes' vertices/indices into a single SSBO
  bound at a **spec-constant** slot and index it by a runtime element
  offset, or reach each mesh's buffer via `buffer_reference` (a device
  address loaded from one bound table). Either way the *descriptor* is
  constant; only the offset/address is dynamic.

```glsl
// ❌ faults in a hit shader on NVIDIA: dynamic descriptor selection
layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
Vertex vtx = vertexHeap[assetVertexStart + gl_InstanceCustomIndexEXT].v[i];

// ✅ one descriptor (spec-constant slot), dynamic element offset
layout(constant_id = 0) const uint16_t vertexSlot = 0us;
layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
uint base = assetVertexStart[gl_InstanceCustomIndexEXT]; // from a bound SSBO
Vertex vtx = vertexHeap[vertexSlot].v[base + i];
```

- **Materials / textures**: put them in one `texture2DArray` (or a small
  number of arrays bucketed by format/size) bound at a spec-constant
  slot and index by **layer**:

```glsl
// ✅ one array texture (spec-constant slot), dynamic layer index
layout(constant_id = 1) const uint16_t materialArraySlot = 0us;
layout(descriptor_heap) uniform sampler2DArray materials[];
uint layer = materialLayer[gl_InstanceCustomIndexEXT]; // from a bound SSBO
vec3 albedo = texture(materials[materialArraySlot], vec3(uv, layer)).rgb;
```
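
On the host side, the geometry half of this pattern amounts to concatenating every mesh into one vertex array and recording where each mesh starts. The following is a minimal C++ sketch under stated assumptions, not the engine's actual code: `Vertex`, `Mesh`, `PackedGeometry`, and `packMeshes` are hypothetical names introduced for illustration, and the real vertex layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical vertex layout (assumption); the real engine's Vertex may differ.
struct Vertex {
    float position[3];
    float uv[2];
};

// Hypothetical per-mesh input: one vertex list per mesh.
using Mesh = std::vector<Vertex>;

// One contiguous vertex array (the single SSBO's contents) plus a base
// offset per mesh (the table a hit shader would index by instance).
struct PackedGeometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> vertexStart;
};

inline PackedGeometry packMeshes(const std::vector<Mesh>& meshes) {
    PackedGeometry packed;
    packed.vertexStart.reserve(meshes.size());
    for (const Mesh& mesh : meshes) {
        // Each mesh begins where the previous one ended; the offset must
        // fit in the 32-bit uint the shader reads.
        assert(packed.vertices.size() <= std::numeric_limits<std::uint32_t>::max());
        packed.vertexStart.push_back(
            static_cast<std::uint32_t>(packed.vertices.size()));
        packed.vertices.insert(packed.vertices.end(), mesh.begin(), mesh.end());
    }
    return packed;
}
```

In this sketch, `vertices` would be uploaded as the one SSBO bound at the spec-constant slot, and `vertexStart` as the table read as `assetVertexStart[gl_InstanceCustomIndexEXT]` in the GLSL above. Index data could be packed the same way.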

This is precisely what the WebGPU path already does (bucketed texture
arrays plus a single geometry buffer), so it is a proven, cross-backend
pattern, and it sidesteps the NVIDIA RT fault on the native path.
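
The "bucketed texture arrays" idea can be sketched on the host side as assigning every texture a bucket (one array texture per distinct size/format) and a layer within that bucket. This is an illustrative C++ sketch only: `TextureInfo`, `TextureSlot`, and `assignLayers` are hypothetical names, `format` is treated as an opaque id, and the actual WebGPU path may bucket differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical texture description (assumption); format is an opaque id.
struct TextureInfo {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t format;
};

// Where a texture ends up: which array texture (bucket) and which layer.
struct TextureSlot {
    std::uint32_t bucket;
    std::uint32_t layer;
};

inline std::vector<TextureSlot> assignLayers(const std::vector<TextureInfo>& textures) {
    typedef std::tuple<std::uint32_t, std::uint32_t, std::uint32_t> Key;
    std::map<Key, std::uint32_t> bucketOfKey;  // (width, height, format) -> bucket id
    std::vector<std::uint32_t> nextLayer;      // per bucket: next free layer
    std::vector<TextureSlot> slots;
    slots.reserve(textures.size());
    for (const TextureInfo& t : textures) {
        Key key = std::make_tuple(t.width, t.height, t.format);
        // A new key opens a new bucket; an existing key reuses its bucket.
        auto result = bucketOfKey.emplace(
            key, static_cast<std::uint32_t>(nextLayer.size()));
        if (result.second) {
            nextLayer.push_back(0);
        }
        const std::uint32_t bucket = result.first->second;
        TextureSlot slot = { bucket, nextLayer[bucket]++ };
        slots.push_back(slot);
    }
    assert(slots.size() == textures.size());
    return slots;
}
```

The `layer` values correspond to the `materialLayer[...]` table in the GLSL above. With a single bucket, one spec-constant slot is enough; how a hit shader should choose between several buckets without selecting a descriptor dynamically is not covered here and would need its own testing.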

Remove this section once a fixed NVIDIA driver ships and dynamic
`descriptor_heap` indexing in hit shaders stops faulting.

## Asset fetch

`project.cpp` calls `Crafter::GitFetch(...)` on