Merge pull request 'docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7)' (#10) from claude/issue-7 into master
commit afb9e320e1
3 changed files with 59 additions and 6 deletions
@@ -24,6 +24,15 @@ Vulkan ray tracing is hardware (`VK_KHR_ray_tracing_pipeline`); WebGPU
 ray tracing is a library-built software path (BVH + traceRay in a
 compute pipeline composed from user-supplied WGSL stages).
 
+> **Native RT status:** reading an acceleration structure through
+> `VK_EXT_descriptor_heap` currently aborts with `VK_ERROR_DEVICE_LOST` on
+> NVIDIA driver `610.43.02` — a driver-side fault in the brand-new
+> descriptor-heap acceleration-structure path, not an engine bug. The
+> engine setup (build, descriptors, SBT) is correct and validation-clean,
+> and images/buffers through the same heap work. See
+> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
+> for the full investigation. WebGPU RT is unaffected.
+
 ## What's in here
 
 - **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas

@@ -28,11 +28,49 @@ cd examples/VulkanTriangle
 crafter-build -r
 ```
 
-You should see a 1280×720 window with a triangle filling roughly the
-centre.
+On a working driver you should see a 1280×720 window with a triangle
+filling roughly the centre. **On the current NVIDIA driver the native
+build aborts with `VK_ERROR_DEVICE_LOST` the moment `traceRayEXT` runs —
+see below.**
 
-## Notes
+## Native status — known driver fault (`VK_ERROR_DEVICE_LOST`)
 
-`raygen.glsl`'s `traceRayEXT` call is currently commented out — the
-example exercises the dispatch and `imageStore` paths only. Uncomment
-it to actually trace into the BLAS.
+On NVIDIA driver `610.43.02` (Vulkan 1.4) the native build aborts with
+`VK_ERROR_DEVICE_LOST` on the first frame as soon as the shader reads the
+acceleration structure. `VK_EXT_device_fault` reports an invalid GPU read
+(address `~0xffff…`) plus instruction-pointer faults inside the
+ray-tracing shader. Commenting out the `traceRayEXT` call makes the crash
+disappear (the dispatch + `imageStore` path renders a solid colour fine).
+
+This was investigated thoroughly and traced to the **acceleration-structure
+read through `VK_EXT_descriptor_heap`**, *not* to the engine's RT setup:
+
+- The BLAS/TLAS build is correct and finishes before rendering
+  (`Window::FinishInit` does `vkQueueWaitIdle`). The built TLAS instance
+  has an identity transform, `mask = 0xFF`, and the correct BLAS device
+  address.
+- The AS descriptor is written correctly — `vkWriteResourceDescriptorsEXT`
+  stores the TLAS device address at the expected heap byte offset (verified
+  by dumping the raw heap bytes after the write).
+- The Khronos validation layers (1.4.350, current) report **zero** errors
+  for the whole frame, including the SBT regions handed to
+  `vkCmdTraceRaysKHR`.
+- Storage images and buffers bound through the **same** descriptor heap
+  work — with `traceRayEXT` removed, the raygen shader's `imageStore`
+  renders correctly, so the heap binding / image path is sound.
+- Both the ray-tracing pipeline (`traceRayEXT`) **and** inline ray query
+  (`rayQueryEXT`, which uses no shader binding table) fault identically
+  when they read the acceleration structure from the heap. That isolates
+  the fault to the AS-via-heap read, not the SBT or the RT pipeline.
+- The fault reproduces even with the AS descriptor written at heap byte 0
+  and read at shader index 0 (no descriptor offset/stride ambiguity), and
+  is unaffected by the `pAddressRange` size.
+- `VK_EXT_descriptor_heap` is brand new; on this machine NVIDIA is the only
+  implementation that advertises it (llvmpipe does not), so there is no
+  second conformant implementation to cross-check against.
+
+**Conclusion:** this is a driver-side fault in NVIDIA's
+`VK_EXT_descriptor_heap` acceleration-structure path, not an engine bug. It
+should be reported to NVIDIA. The `traceRayEXT` call is intentionally left
+in `raygen.glsl` so this stays a faithful one-file reproducer; the example
+will start rendering the triangle again once a fixed driver ships.

@@ -201,6 +201,12 @@ int main() {
     RTPass rtPass(&pipeline);
     window.passes.push_back(&rtPass);
 
+    // NOTE: on NVIDIA 610.43.02 this aborts with VK_ERROR_DEVICE_LOST the
+    // first time the raygen shader reads the acceleration structure out of
+    // the VK_EXT_descriptor_heap. The build, descriptors and SBT are all
+    // correct and validation-clean; it is a driver-side fault in the
+    // descriptor-heap acceleration-structure path. See README.md
+    // ("Native status — known driver fault") for the full investigation.
     window.Render();
     window.StartSync();
 }
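
The README hunk above says the AS descriptor write was "verified by dumping the raw heap bytes after the write". A minimal sketch of that kind of check follows. It assumes the descriptor heap has already been copied into a CPU-side byte vector and that the driver stores the 64-bit TLAS device address at the start of the descriptor's slot. `ReadU64At` and `HeapHoldsAddress` are hypothetical helper names for illustration only; they are not part of the engine or of Vulkan.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper: read a 64-bit value (host byte order) from a CPU-side
// copy of the descriptor heap. Returns std::nullopt if the 8-byte read would
// run past the end of the buffer.
std::optional<std::uint64_t> ReadU64At(const std::vector<std::uint8_t>& heap,
                                       std::size_t offset) {
    if (offset > heap.size() || heap.size() - offset < sizeof(std::uint64_t)) {
        return std::nullopt;
    }
    std::uint64_t value = 0;
    // memcpy avoids unaligned-access and strict-aliasing problems.
    std::memcpy(&value, heap.data() + offset, sizeof(value));
    return value;
}

// Hypothetical helper: true if the heap bytes at `offset` equal `expected`.
// Assumption: the descriptor begins with the raw device address.
bool HeapHoldsAddress(const std::vector<std::uint8_t>& heap,
                      std::size_t offset,
                      std::uint64_t expected) {
    const std::optional<std::uint64_t> found = ReadU64At(heap, offset);
    return found.has_value() && *found == expected;
}
```

A check like this only confirms that the bytes in the heap match the address the engine intended to write. It says nothing about how the driver interprets them, which is consistent with the conclusion that the fault lies on the driver side.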