
# Sponza example

Loads the Sponza atrium as a `.cmesh` plus one albedo `.ctex` and
renders it via ray tracing on both Vulkan (native) and WebGPU (wasm).
Both backends share the same `main.cpp`; `#ifdef CRAFTER_GRAPHICS_WINDOW_DOM`
selects between them.
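
The split is resolved at compile time. Below is a minimal C++ sketch of the pattern. It assumes the macro is defined only in the wasm/WebGPU build. `BackendInfo` and `ActiveBackend()` are hypothetical names, not Crafter.Graphics API. The strings only label which path each build takes, including the per-backend decompression route listed below.

```cpp
#include <string_view>

// Hypothetical: labels for the path a given build compiles in.
struct BackendInfo {
    std::string_view api;
    std::string_view decompression;
};

// Only one branch survives preprocessing, so a single main.cpp can
// serve both targets. CRAFTER_GRAPHICS_WINDOW_DOM is the macro named in
// this README; everything else here is illustrative.
constexpr BackendInfo ActiveBackend() {
#ifdef CRAFTER_GRAPHICS_WINDOW_DOM
    return {"WebGPU (wasm)", "CPU: Compression::DecompressCPU"};
#else
    return {"Vulkan (native)", "GPU: VK_EXT_memory_decompression"};
#endif
}
```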

## What this example proves

- `.cmesh` and `.ctex` decompression round-trip on both backends
  (GPU via `VK_EXT_memory_decompression` on Vulkan, CPU via
  `Compression::DecompressCPU` on WebGPU).
- A single texture binding flowing from `Image2D<RGBA8>` through the
  RT pipeline's closest-hit shader on both backends. The closest-hit
  uses the hit's barycentric attributes directly as UVs. This proves the
  binding works, but the result is not visually accurate. Per-vertex UV
  interpolation is the next step.
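
For triangle hits, the two hit attributes (`hitAttributeEXT vec2` in GLSL) are the barycentric weights of vertices 1 and 2. Vertex 0 receives `1 - b1 - b2`. The next step therefore blends the three vertices' UVs with those weights instead of using the attributes as UVs directly. The C++ reference below shows that arithmetic; the GLSL version is the same. `Vec2` and `InterpolateUV` are hypothetical names, not engine types.

```cpp
#include <array>

// Hypothetical 2-component vector, standing in for GLSL's vec2.
struct Vec2 {
    float x;
    float y;
};

// Blends per-vertex UVs at a triangle hit.
// b1, b2: the two barycentric hit attributes (weights of vertices 1 and 2).
// Vertex 0 gets the remaining weight, b0 = 1 - b1 - b2.
constexpr Vec2 InterpolateUV(const std::array<Vec2, 3>& uv, float b1, float b2) {
    const float b0 = 1.0f - b1 - b2;
    return Vec2{b0 * uv[0].x + b1 * uv[1].x + b2 * uv[2].x,
                b0 * uv[0].y + b1 * uv[1].y + b2 * uv[2].y};
}
```

At `(b1, b2) = (0, 0)` the result is vertex 0's UV, at `(1, 0)` it is vertex 1's, and interior hits blend all three.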

The closest-hit reads its texture through a **spec constant**
(`albedo[albedoSlot]`), not a runtime index. That is deliberate — see
below.

## Native RT limitation: dynamic `descriptor_heap` indexing in hit shaders

On NVIDIA driver `610.43.02` (Vulkan 1.4) with validation off, indexing a
`layout(descriptor_heap)` array with a **runtime (non-constant)** index
inside a ray-tracing **hit** shader loses the device. The failure surfaces
as `VK_ERROR_DEVICE_LOST` and is reported as an instruction-pointer /
`READ_INVALID` device fault. GPU-Assisted Validation masks it (the scene
runs fine under GPU-AV), which is why a validated run doesn't
catch it. It is a **driver-side fault**, the same family as the
descriptor-heap AS-read fault (#7 / #15) and the RT recursion / compute
TLAS-push issues (#21 / #22), but here for plain **SSBO and
sampled-image** descriptors read with a non-constant heap index
(issue #23).

### What was isolated (NVIDIA RTX 4090, driver `610.43.02`)

Driving a native bindless RT scene headlessly and bisecting the
closest-hit:

- A closest-hit that reads only `lightHeap[lightSlot]` where `lightSlot`
  is a **spec constant** survives indefinitely. ✅ (This example's
  `albedo[albedoSlot]` is exactly this case.)
- Reading `indexHeap[assetIndexStart + gl_InstanceCustomIndexEXT]` /
  `vertexHeap[...]`, a heap index offset by a **runtime** value,
  loses the device on the first geometry hit. ❌
- Reading a **texture** dynamically,
  `textureHeap[assetColorStart + gl_InstanceCustomIndexEXT]`, also
  loses the device. ❌ So the fault covers both SSBO *and* sampled-image
  descriptors.
- `nonuniformEXT()` on the dynamic index does **not** help.
- The identical dynamic-heap-index pattern works fine in **fragment**
  shaders (the UI renderer indexes `uiTextures[]` / `ui*Heap[]` by
  per-item runtime slots), so this is **RT-stage-specific**, not a
  general `descriptor_heap` problem.
- Reading a spec-constant-indexed SSBO in **raygen** works; only the
  *dynamic* index in the hit stage faults.

### Why there is no transparent engine workaround

The AS-read fault (#15) is worked around transparently because an
acceleration structure can be reached two ways: through a descriptor, or
through its **device address** via
`OpConvertUToAccelerationStructureKHR` (which reads no descriptor). There
is exactly one TLAS, so the engine rewrites the heap AS read into an
address load and feeds the address in as push data.

Neither half of that applies here:

- **Sampled images have no device-address path.** A texture *must* be
  reached through a descriptor; there is no `OpConvertUToImage`. A
  dynamic heap texture index cannot be rewritten into anything that
  avoids dynamic descriptor selection.
- **There are many buffers, dynamically selected.** SSBOs *can* be
  reached by address (`buffer_reference` / `OpConvertUToPtr`), but a
  per-mesh array selected by `gl_InstanceCustomIndexEXT` would need the
  engine to maintain and bind an address-table buffer and a SPIR-V
  rewrite far larger than the single-TLAS AS case — and it would still
  leave the texture half broken.

So the engine cannot paper over this the way it does the AS read. The
fix is on the **consumer** side: avoid dynamically selecting a
*descriptor* in a hit shader.

### Recommended pattern

The fault is dynamic selection of a **descriptor**. Indexing *within* a
single bound resource — an element offset into one SSBO, a layer into
one array texture — is ordinary memory / layer addressing and is
**unaffected**. So bind one resource and index inside it, rather than
indexing the heap:

- **Geometry** — pack all meshes' vertices/indices into a single SSBO
  bound at a **spec-constant** slot and index it by a runtime element
  offset, or reach each mesh's buffer via `buffer_reference` (a device
  address loaded from one bound table). Either way the *descriptor* is
  constant; only the offset/address is dynamic.

  ```glsl
  // ❌ faults in a hit shader on NVIDIA: dynamic descriptor selection
  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
  Vertex vtx = vertexHeap[assetVertexStart + gl_InstanceCustomIndexEXT].v[i];

  // ✅ alternative: one descriptor (spec-constant slot), dynamic element offset
  layout(constant_id = 0) const uint16_t vertexSlot = 0us;
  layout(descriptor_heap) buffer Verts { Vertex v[]; } vertexHeap[];
  uint base = meshVertexBase[gl_InstanceCustomIndexEXT]; // per-mesh element offset, from a bound SSBO
  Vertex vtx = vertexHeap[vertexSlot].v[base + i];
  ```

- **Materials / textures** — put them in one array texture
  (`sampler2DArray`), or a small number of arrays bucketed by
  format/size, bound at a spec-constant slot, and index by **layer**:

  ```glsl
  // ✅ one array texture (spec-constant slot), dynamic layer index
  layout(constant_id = 1) const uint16_t materialArraySlot = 0us;
  layout(descriptor_heap) uniform sampler2DArray materials[];
  uint layer = materialLayer[gl_InstanceCustomIndexEXT]; // from a bound SSBO
  vec3 albedo = texture(materials[materialArraySlot], vec3(uv, layer)).rgb;
  ```
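
On the host, the geometry bullet amounts to concatenating every mesh into one vertex array and recording where each mesh starts. The minimal C++ sketch below makes several assumptions:

- The `Vertex` layout is a simple hypothetical one.
- The per-mesh offset table is indexed by the custom index each TLAS instance stores, which is the value `gl_InstanceCustomIndexEXT` returns.

`PackMeshes` and `FetchVertex` are illustrative names, not engine API. Index data could be packed the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the engine's real Vertex may differ.
struct Vertex {
    float position[3];
    float uv[2];
};

// One array to upload into a single SSBO (bound at one constant slot),
// plus each mesh's starting element offset into it.
struct PackedGeometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> base;  // indexed by instance custom index
};

PackedGeometry PackMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    PackedGeometry out;
    for (const auto& mesh : meshes) {
        // The mesh starts where the previous meshes end.
        out.base.push_back(static_cast<std::uint32_t>(out.vertices.size()));
        out.vertices.insert(out.vertices.end(), mesh.begin(), mesh.end());
    }
    return out;
}

// CPU mirror of the shader-side read: the buffer is fixed, and only the
// element index (mesh base + local vertex index) varies per hit.
const Vertex& FetchVertex(const PackedGeometry& g, std::uint32_t instanceCustomIndex,
                          std::uint32_t localIndex) {
    assert(instanceCustomIndex < g.base.size());
    const std::uint32_t element = g.base[instanceCustomIndex] + localIndex;
    assert(element < g.vertices.size());
    return g.vertices[element];
}
```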

This is precisely what the WebGPU path already does — bucketed texture
arrays plus a single geometry buffer — so it is a proven, cross-backend
pattern, and it sidesteps the NVIDIA RT fault on the native path.
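
Bucketing can be sketched as a grouping step. Textures that share a format and size go into the same array texture, and each one is assigned the next free layer in it. The C++ sketch below illustrates that assumption only; it is not the WebGPU path's actual code. `TextureDesc`, `LayerSlot`, and `AssignLayers` are hypothetical. The resulting per-texture layer is the value the shader would read from a bound SSBO, as in the `materialLayer[...]` lookup above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical description of a source texture.
struct TextureDesc {
    std::uint32_t format;  // e.g. an enum value for RGBA8
    std::uint32_t width;
    std::uint32_t height;
};

// Where a texture ends up: which array texture (bucket) and which layer.
struct LayerSlot {
    std::uint32_t bucket;
    std::uint32_t layer;
};

// Groups same-format, same-size textures into one bucket. Buckets are
// numbered by first appearance, and layers are assigned in input order.
std::vector<LayerSlot> AssignLayers(const std::vector<TextureDesc>& textures,
                                    std::vector<std::uint32_t>& layersPerBucket) {
    using Key = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;
    std::map<Key, std::uint32_t> bucketOf;
    std::vector<LayerSlot> slots;
    layersPerBucket.clear();
    for (const TextureDesc& t : textures) {
        assert(t.width > 0 && t.height > 0);
        const Key key{t.format, t.width, t.height};
        auto [it, inserted] =
            bucketOf.try_emplace(key, static_cast<std::uint32_t>(layersPerBucket.size()));
        if (inserted) layersPerBucket.push_back(0);  // new, empty bucket
        const std::uint32_t bucket = it->second;
        slots.push_back({bucket, layersPerBucket[bucket]++});
    }
    return slots;
}
```

Each bucket then becomes one array texture with `layersPerBucket[bucket]` layers, bound at its own constant slot.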

Remove this section once a fixed NVIDIA driver ships and dynamic
`descriptor_heap` indexing in hit shaders stops faulting.

## Asset fetch

`project.cpp` calls `Crafter::GitFetch(...)` on
[https://github.com/jimmiebergmann/Sponza](https://github.com/jimmiebergmann/Sponza)
(pinned to commit `222338979d32f4f4818466291bdbc29f192b86ba`). The
clone lands in the per-user crafter-build cache. The first build pulls
~280 MB once, and subsequent builds reuse it.

`cfg.assets` then picks two files out of that clone:

| Source                                  | Compressed output       |
|-----------------------------------------|-------------------------|
| `sponza.obj`                            | `sponza.cmesh`          |
| `textures/sponza_arch_diff.tga`         | `sponza_arch_diff.ctex` |

Both land flat in the example's bin directory.
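
The naming rule in the table can be stated as code. Meshes become `.cmesh` and textures become `.ctex`, and the source's directory is dropped because outputs land flat. The sketch below is a hypothetical helper written against `std::filesystem`. It is not the crafter-build API, and it assumes only the two source extensions shown above.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical: maps a source asset path to its flat compressed output name.
std::filesystem::path CompressedOutputName(const std::filesystem::path& source) {
    assert(source.has_filename());
    const std::string ext = source.extension().string();
    std::filesystem::path out = source.filename();  // drop directories: output is flat
    if (ext == ".obj") {
        out.replace_extension(".cmesh");
    } else if (ext == ".tga") {
        out.replace_extension(".ctex");
    } else {
        throw std::invalid_argument("no compressed format for " + source.string());
    }
    return out;
}
```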

## Building

```sh
crafter build                          # native Vulkan
crafter build --target=wasm32-wasip1   # WebGPU / wasm
```

## License & attribution

Sponza geometry, materials, and textures are licensed under
[CC BY 3.0](https://creativecommons.org/licenses/by/3.0/).

- **Original model:** Frank Meinl, Crytek (2010).
- **OBJ packaging / cleanup:** Morgan McGuire, McGuire Computer
  Graphics Archive — https://casual-effects.com/data.
- **GitHub mirror used here:** Jimmie Bergmann's roof-material fixup —
  https://github.com/jimmiebergmann/Sponza.

When redistributing builds of this example that bundle the compressed
Sponza outputs (`*.cmesh`, `*.ctex`), the CC BY 3.0 attribution
requirement applies. Quoting the original credit somewhere visible to
end users (about-screen, credits page, etc.) is enough.

The Crafter.Graphics library code itself is LGPL-3.0; the two
licenses are compatible for data + code distribution.