Investigated the VK_ERROR_DEVICE_LOST on the native VulkanTriangle (#7).
Verified the engine side is correct and validation-clean: the BLAS/TLAS
build finishes before render (FinishInit waits), the built instance is
well-formed (identity transform, mask=0xFF, correct BLAS ref), and
vkWriteResourceDescriptorsEXT stores the TLAS device address at the
expected heap offset (confirmed by dumping the heap bytes). Khronos
validation 1.4.350 reports zero errors.
The fault is isolated to reading the acceleration structure through
VK_EXT_descriptor_heap:
- images/buffers via the same heap render fine (trace disabled -> the
raygen imageStore path renders a full gradient);
- both traceRayEXT and inline rayQueryEXT (no SBT) fault identically on
the AS read;
- reproduces with the AS descriptor at heap byte 0 / shader index 0 (no
offset/stride ambiguity) and regardless of pAddressRange size.
NVIDIA 610.43.02 is the only descriptor_heap implementation available
(llvmpipe lacks the extension), so there is no second implementation to
cross-check. Conclusion: driver-side fault in NVIDIA's brand-new
VK_EXT_descriptor_heap acceleration-structure path; should be reported to
NVIDIA. The traceRayEXT call is left active so the example stays a
faithful reproducer. Documented in both READMEs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restructure Sponza for the wavefront model: raygen emits the primary ray;
closesthit (in SHADE) gathers albedo/normal, accumulates ambient, and
emits a shadow ray carrying the pending direct term; miss adds the sky
(primary) or the direct term (shadow miss). resolve.wgsl applies the same
Reinhard+gamma the megakernel raygen did inline. User bindings moved to
group 3 (groups 0..2 reserved). RTPass maxDepth=2.
Renders the atrium correctly through the wavefront pipeline (textures,
two-sided shading, sun+ambient, shadows, tonemap).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:
GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE
- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
the indirect-args buffer lives in its own bind group so it is never
bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroups
PerDimension so the W*H-sized work buffers fit past the 128MB baseline.
VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>