PipelineRTWebGPU emits a runIntersection mega-switch and the
RT_HAS_ANYHIT / RT_HAS_INTERSECTION consts (+ the @CRAFTER_RT_TRACE_USER
marker) that gate the library's new TRACE-stage user callbacks, so an
opaque triangle-only scene still const-folds them away. Mesh-WebGPU
builds a SAH BVH2 over AABB primitives and uploads them in primitive
order for the intersection shader to fetch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront
TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on
Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the
WebGPU spec such a dispatch is silently dropped — no validation error —
so at 4K the world is never traced and the accumulator stays black while
non-RT passes survive.
_wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535,
y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the
linear ray index from (global_invocation_id, num_workgroups). The existing
`i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE
already use a 2-D tile dispatch and are unchanged.
Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M
rays, 93716 workgroups — well over the 65535 cap): renders the full cube
grid where master shows a black screen.
Resolves#11
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two Vulkan validation errors fired on startup of every native (Vulkan)
example, reported in #5:
1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated
and ignored since Vulkan 1.0; passing them is a spec violation
(VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer
enumeration/match block in Device::Initialize is removed and
enabledLayerCount is pinned to 0 — layers are enabled at the instance
only.
2. vkQueueSubmit layout transition on a presentable image that "has not
been acquired". StartInit() and RecreateSwapchainAndImages() eagerly
transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before
any vkAcquireNextImageKHR, which the spec forbids (a presentable image
may only be touched after acquire). Those pre-transitions are removed.
Each image's first layout transition now happens lazily in Render(),
after acquire, from UNDEFINED; subsequent frames transition from
PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in
CreateSwapchain) selects the correct oldLayout.
Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation:
the original code reproduces both errors on HelloUI; the fixed build emits
zero validation messages across initial render and swapchain recreation.
Resolves#5
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:
GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE
- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
the indirect-args buffer lives in its own bind group so it is never
bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroups
PerDimension so the W*H-sized work buffers fit past the 128MB baseline.
VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>