docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23) #24
2 changed files with 136 additions and 0 deletions
25
README.md
@@ -43,6 +43,31 @@ shaded through an intersection shader with an any-hit cut-out.
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
> for the full investigation. WebGPU RT is unaffected.

> **Native RT limitation (dynamic `descriptor_heap` indexing in hit
> shaders):** on the same NVIDIA driver, indexing a `descriptor_heap`
> array with a **runtime (non-constant)** index inside a ray-tracing
> **hit** shader also loses the device (`VK_ERROR_DEVICE_LOST`), for
> plain SSBO **and** sampled-image descriptors. A **constant /
> spec-constant** index is fine (that's why
> [Sponza](examples/Sponza/README.md)'s closest-hit reads
> `albedo[albedoSlot]` through a spec constant), and the identical
> dynamic pattern works in fragment shaders (the UI renderer indexes
> `uiTextures[]` by per-item runtime slots), so this is
> **RT-stage-specific**, not a general heap problem. Unlike the AS-read
> fault above, this **cannot** be worked around transparently: sampled
> images have no device-address escape hatch the way an acceleration
> structure does (`OpConvertUToAccelerationStructureKHR`). The
> recommended pattern for bindless per-mesh geometry/material is to
> **bind one resource and index *within* it dynamically** rather than
> selecting a descriptor dynamically: pack geometry into a single SSBO
> (or reach it via `buffer_reference`) at a spec-constant slot and index
> by element offset, and put materials in one `texture2DArray` indexed
> by layer. Dynamic addressing *inside* a bound resource is ordinary
> memory/layer addressing and is unaffected; only dynamic selection of a
> *descriptor* faults. This is exactly what the WebGPU path already does
> (bucketed texture arrays + a single buffer). The full investigation
> and GLSL are in [examples/Sponza/README.md](examples/Sponza/README.md)
> (issue #23). WebGPU RT is unaffected.

## What's in here

- **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas
@@ -14,6 +14,117 @@ it via ray tracing on both Vulkan (native) and WebGPU (wasm). Same
at the barycentric attribs as UVs — proof-of-binding, not visually
accurate. Per-vertex UV interpolation is the next step.

The closest-hit reads its texture through a **spec constant**
(`albedo[albedoSlot]`), not a runtime index. That is deliberate; see
below.

## Native RT limitation: dynamic `descriptor_heap` indexing in hit shaders

On NVIDIA driver `610.43.02` (Vulkan 1.4), indexing a
`layout(descriptor_heap)` array with a **runtime (non-constant)** index
inside a ray-tracing **hit** shader aborts the device with
`VK_ERROR_DEVICE_LOST` (an instruction-pointer / `READ_INVALID`
device fault) when validation is off. GPU-Assisted Validation (GPU-AV)
masks it: the scene runs fine under GPU-AV, which is why a validated run
doesn't catch it. It is a **driver-side fault** in the same family as
the descriptor-heap AS-read fault (#7 / #15) and the RT recursion /
compute TLAS-push issues (#21 / #22), but here it affects plain **SSBO
and sampled-image** descriptors read with a non-constant heap index
(issue #23).

### What was isolated (NVIDIA RTX 4090, driver `610.43.02`)

Driving a native bindless RT scene headlessly and bisecting the
closest-hit shader gave the following results:

- A closest-hit that reads only `lightHeap[lightSlot]`, where
  `lightSlot` is a **spec constant**, survives indefinitely. ✅ (This
  example's `albedo[albedoSlot]` is exactly this case.)
- Reading `indexHeap[assetIndexStart + gl_InstanceCustomIndexEXT]` /
  `vertexHeap[...]` (a heap index offset by a **runtime** value)
  loses the device on the first geometry hit. ❌
- Reading a **texture** dynamically,
  `textureHeap[assetColorStart + gl_InstanceCustomIndexEXT]`, also
  loses the device. ❌ So both SSBO *and* sampled-image descriptors are
  affected.
- `nonuniformEXT()` on the dynamic index does **not** help.
- The identical dynamic-heap-index pattern works fine in **fragment**
  shaders (the UI renderer indexes `uiTextures[]` / `ui*Heap[]` by
  per-item runtime slots), so this is **RT-stage-specific**, not a
  general `descriptor_heap` problem.
- Reading a spec-constant-indexed SSBO in **raygen** works; only the
  *dynamic* index in the hit stage faults.

### Why there is no transparent engine workaround

The AS-read fault (#15) is worked around transparently because an
acceleration structure can be reached two ways: through a descriptor, or
through its **device address** via
`OpConvertUToAccelerationStructureKHR` (which reads no descriptor). There
is exactly one TLAS, so the engine rewrites the heap AS read into an
address load and feeds the address in as push data.

Neither half of that applies here:

- **Sampled images have no device-address path.** A texture *must* be
  reached through a descriptor; there is no `OpConvertUToImage`. A
  dynamic heap texture index cannot be rewritten into anything that
  avoids dynamic descriptor selection.
- **There are many buffers, dynamically selected.** SSBOs *can* be
  reached by address (`buffer_reference` / `OpConvertUToPtr`), but a
  per-mesh array selected by `gl_InstanceCustomIndexEXT` would require
  the engine to maintain and bind an address-table buffer and to perform
  a SPIR-V rewrite far larger than the single-TLAS AS case, and it would
  still leave the texture half broken.

So the engine cannot paper over this the way it does the AS read. The
fix is on the **consumer** side: avoid dynamically selecting a
*descriptor* in a hit shader.

### Recommended pattern

The fault is dynamic selection of a **descriptor**. Indexing *within* a
single bound resource (an element offset into one SSBO, a layer into
one array texture) is ordinary memory / layer addressing and is
**unaffected**. So bind one resource and index inside it, rather than
indexing the heap:

- **Geometry**: pack all meshes' vertices/indices into a single SSBO
  bound at a **spec-constant** slot and index it by a runtime element
  offset, or reach each mesh's buffer via `buffer_reference` (a device
  address loaded from one bound table). Either way the *descriptor* is
  constant; only the offset/address is dynamic.

  ```glsl
  // ❌ faults in a hit shader on NVIDIA: dynamic descriptor selection
  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
  Vertex vtx = vertexHeap[assetVertexStart + gl_InstanceCustomIndexEXT].v[i];

  // ✅ one descriptor (spec-constant slot), dynamic element offset
  layout(constant_id = 0) const uint16_t vertexSlot = 0us;
  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
  uint base = assetVertexStart[gl_InstanceCustomIndexEXT]; // from a bound SSBO
  Vertex vtx = vertexHeap[vertexSlot].v[base + i];
  ```

- **Materials / textures**: put them in one `texture2DArray` (or a small
  number of arrays bucketed by format/size) bound at a spec-constant
  slot and index by **layer**:

  ```glsl
  // ✅ one array texture (spec-constant slot), dynamic layer index
  layout(constant_id = 1) const uint16_t materialArraySlot = 0us;
  layout(descriptor_heap) uniform sampler2DArray materials[];
  uint layer = materialLayer[gl_InstanceCustomIndexEXT]; // from a bound SSBO
  vec3 albedo = texture(materials[materialArraySlot], vec3(uv, layer)).rgb;
```
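
On the host side, the geometry half of the pattern above comes down to concatenating every mesh into one array and recording where each mesh starts. The following is a minimal C++ sketch of that packing step, not the engine's actual code: the `Vertex` layout, `PackedGeometry`, and `PackMeshes` are hypothetical names introduced here for illustration. The `baseOffset` table is what would be uploaded as the `assetVertexStart` SSBO that the shader indexes by `gl_InstanceCustomIndexEXT`, under the assumption that the custom index equals the mesh's position in the input list.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical vertex layout; the engine's real Vertex type is not shown in the source.
struct Vertex {
    float position[3];
    float uv[2];
};

// One contiguous vertex array plus a per-mesh table of starting element indices.
struct PackedGeometry {
    std::vector<Vertex> vertices;           // upload as the single vertex SSBO
    std::vector<std::uint32_t> baseOffset;  // upload as the assetVertexStart table
};

// Concatenate all meshes; baseOffset[m] is the element index of mesh m's first vertex.
PackedGeometry PackMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    PackedGeometry packed;
    packed.baseOffset.reserve(meshes.size());
    for (const std::vector<Vertex>& mesh : meshes) {
        // Offsets are stored as 32-bit values to match the shader's uint.
        assert(packed.vertices.size() <= std::numeric_limits<std::uint32_t>::max());
        packed.baseOffset.push_back(static_cast<std::uint32_t>(packed.vertices.size()));
        packed.vertices.insert(packed.vertices.end(), mesh.begin(), mesh.end());
    }
    return packed;
}
```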

This is precisely what the WebGPU path already does (bucketed texture
arrays plus a single geometry buffer), so it is a proven, cross-backend
pattern, and it sidesteps the NVIDIA RT fault on the native path.
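
The "bucketed texture arrays" idea can be sketched on the host side as grouping textures by format and size, then handing out layers within each group. This C++ sketch is an assumption about how such bucketing might look, not the WebGPU path's actual implementation: `TextureDesc`, `ArraySlot`, `BucketedTextures`, and `BucketTextures` are hypothetical names, and `format` is treated as an opaque number. The `layer` of each slot is the kind of value that would go into the `materialLayer` table; how a shader chooses among several buckets without dynamic descriptor selection is not covered by the source and is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical texture description; format is an opaque engine-defined value.
struct TextureDesc {
    std::uint32_t format;
    std::uint32_t width;
    std::uint32_t height;
};

// Placement of one texture: which array texture (bucket) and which layer inside it.
struct ArraySlot {
    std::uint32_t bucket;
    std::uint32_t layer;
};

struct BucketedTextures {
    std::vector<TextureDesc> buckets;       // one array texture per (format, width, height)
    std::vector<std::uint32_t> layerCount;  // number of layers in each bucket
    std::vector<ArraySlot> slots;           // per-texture placement, in input order
};

// Group textures with identical format and size into the same array texture.
BucketedTextures BucketTextures(const std::vector<TextureDesc>& textures) {
    BucketedTextures out;
    std::map<std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>, std::uint32_t> bucketOf;
    for (const TextureDesc& t : textures) {
        assert(t.width > 0 && t.height > 0);
        const auto key = std::make_tuple(t.format, t.width, t.height);
        const std::uint32_t next = static_cast<std::uint32_t>(out.buckets.size());
        const auto result = bucketOf.insert(std::make_pair(key, next));
        if (result.second) {  // first texture with this format/size: open a new bucket
            out.buckets.push_back(t);
            out.layerCount.push_back(0);
        }
        const std::uint32_t bucket = result.first->second;
        const std::uint32_t layer = out.layerCount[bucket];
        out.layerCount[bucket] = layer + 1;
        out.slots.push_back(ArraySlot{bucket, layer});
    }
    return out;
}
```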

Remove this section once a fixed NVIDIA driver ships and dynamic
`descriptor_heap` indexing in hit shaders stops faulting.

## Asset fetch
`project.cpp` calls `Crafter::GitFetch(...)` on