Reading an acceleration structure through VK_EXT_descriptor_heap aborts
with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 — a brand-new-extension
driver fault isolated in #7 (engine setup is correct and validation-clean;
images/buffers through the same heap work, and both traceRayEXT and inline
rayQuery fault identically on the AS read).
An acceleration structure can equally be reached by its device address via
OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never
touches the faulting heap path. glslang has no GLSL spelling for that
conversion, so VulkanShader rewrites the compiled SPIR-V at module-load
time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS
device address from a synthesized push-constant block followed by the
convert. RTPass pushes the active frame's TLAS address into that push
constant. User GLSL and example code are unchanged; acceleration structures
still bind into the heap normally.
The workaround is gated on Device::workaroundDescriptorHeapAS (true only on
the NVIDIA proprietary driver) and confined to one fenced block in
Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64
feature toggle — delete those once a fixed NVIDIA driver ships and the heap
AS read becomes the direct path again.
Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090),
validation-layer-clean, no device loss. The SPIR-V rewrite was independently
validated with spirv-val on both the VulkanTriangle and Sponza raygen
modules.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PipelineRTWebGPU emits a runIntersection mega-switch and the
RT_HAS_ANYHIT / RT_HAS_INTERSECTION consts (+ the @CRAFTER_RT_TRACE_USER
marker) that gate the library's new TRACE-stage user callbacks, so an
opaque triangle-only scene still const-folds them away. Mesh-WebGPU
builds a SAH BVH2 over AABB primitives and uploads them in primitive
order for the intersection shader to fetch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront
TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on
Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the
WebGPU spec such a dispatch is silently dropped — no validation error —
so at 4K the world is never traced and the accumulator stays black while
non-RT passes survive.
_wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535,
y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the
linear ray index from (global_invocation_id, num_workgroups). The existing
`i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE
already use a 2-D tile dispatch and are unchanged.
Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M
rays, 93716 workgroups — well over the 65535 cap): renders the full cube
grid where master shows a black screen.
Resolves#11
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two Vulkan validation errors fired on startup of every native (Vulkan)
example, reported in #5:
1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated
and ignored since Vulkan 1.0; passing them is a spec violation
(VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer
enumeration/match block in Device::Initialize is removed and
enabledLayerCount is pinned to 0 — layers are enabled at the instance
only.
2. vkQueueSubmit layout transition on a presentable image that "has not
been acquired". StartInit() and RecreateSwapchainAndImages() eagerly
transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before
any vkAcquireNextImageKHR, which the spec forbids (a presentable image
may only be touched after acquire). Those pre-transitions are removed.
Each image's first layout transition now happens lazily in Render(),
after acquire, from UNDEFINED; subsequent frames transition from
PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in
CreateSwapchain) selects the correct oldLayout.
Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation:
the original code reproduces both errors on HelloUI; the fixed build emits
zero validation messages across initial render and swapchain recreation.
Resolves#5
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:
GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE
- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
the indirect-args buffer lives in its own bind group so it is never
bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroups
PerDimension so the W*H-sized work buffers fit past the 128MB baseline.
VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>