137 commits

Author SHA1 Message Date
catbot
1c310762a7 fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21)
Two gaps in the Vulkan RT path that fault the device on the NVIDIA
proprietary driver with a non-trivial pipeline (simple VulkanTriangle
never hit them):

1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit
   shader that traces a secondary ray (shadow ray — a very common
   pattern) recursed past the pipeline limit (UB → device fault).
   PipelineRTVulkan::Init now takes a maxRecursionDepth parameter
   (default 1, clamped to the device's maxRayRecursionDepth).

2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader
   that reads an accelerationStructureEXT from the heap — including
   compute shaders — to read the TLAS device address from a push
   constant, but only RTPass pushed that address. A compute shader that
   ray-queries the TLAS (rayQueryEXT) therefore ran against an unwritten
   push slot → garbage AS handle → VK_ERROR_DEVICE_LOST.

   WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult
   {patched, tlasPushOffset} instead of writing the clobber-prone global
   Device::workaroundTlasPushOffset (removed). VulkanShader stores it;
   ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and
   ComputeShader tracks its own offset and pushes the caller-supplied
   TLAS address in Dispatch (new defaulted tlasAddress parameter),
   mirroring RTPass::Record.
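
The clamp described in item 1 can be sketched as follows. This is a minimal illustration, not the engine's code: `ClampRecursionDepth` is a hypothetical helper, and `deviceLimit` stands in for the `maxRayRecursionDepth` value the device reports.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from the repository): chooses the value that would
// be stored in maxPipelineRayRecursionDepth.
// `requested`   - depth the caller asks for (1 = primary rays only,
//                 2 = primary ray plus a shadow ray traced from closest-hit).
// `deviceLimit` - stand-in for the device's reported maxRayRecursionDepth.
constexpr std::uint32_t ClampRecursionDepth(std::uint32_t requested,
                                            std::uint32_t deviceLimit) {
    // Never ask the pipeline for more recursion than the device supports.
    return std::min(requested, deviceLimit);
}
```

Under this sketch a closest-hit shader that traces one shadow ray would request a depth of 2, and the request is only reduced when the device limit is lower.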

The PushConstantRewrite regression test now asserts Patch's returned
patched/offset and adds two ray-querying compute-shader cases, proving
the rewrite is stage-agnostic and the per-shader offset is correct.
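
As a rough sketch of why a per-shader result is safer than a shared global, the snippet below models the returned `{patched, tlasPushOffset}` pair and the decision a dispatch makes from it. The field names follow the message above; `TlasPush` and `TlasPushFor` are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>
#include <optional>

// Assumed shape of the per-shader result described above.
struct PatchResult {
    bool patched = false;              // true if the SPIR-V module was rewritten
    std::uint32_t tlasPushOffset = 0;  // byte offset of the TLAS address push constant
};

// Hypothetical description of one push-constant write.
struct TlasPush {
    std::uint32_t offset;   // where in the push-constant range to write
    std::uint64_t address;  // 64-bit TLAS device address
};

// Hypothetical helper: decides what a dispatch should push for one shader.
// An unpatched shader needs no push; a patched one gets the address at its
// own offset, so two shaders with different offsets cannot clobber each other.
inline std::optional<TlasPush> TlasPushFor(const PatchResult& result,
                                           std::uint64_t tlasAddress) {
    if (!result.patched) {
        return std::nullopt;
    }
    return TlasPush{result.tlasPushOffset, tlasAddress};
}
```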

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 18:35:39 +00:00
catbot
e7469133e8 feat(vulkan): re-enable GPU-Assisted Validation
The GPU-AV enable list was removed to dodge a crash in SDK 1.4.341,
whose GPU-AV null-deref'd on descriptor_heap pipelines
(VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT, layout = VK_NULL_HANDLE)
in PipelineSubState::GetPipelineLayoutUnion:
  https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/12103

That was fixed in the next SDK release. The validation layer is now
1.4.350 (> 1.4.341), so restore VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT
in the VkValidationFeaturesEXT enable list.

Verified by running the HelloUI example (which draws through the
descriptor_heap compute pipelines) with the layer active: it renders the
full UI for the entire run with GPU-AV reporting "Both GPU Assisted
Validation and Normal Core Check Validation are enabled" and no
descriptor-heap null-deref or VUID errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 02:10:23 +00:00
catbot
950059c86e fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)
Reading an acceleration structure through VK_EXT_descriptor_heap aborts
with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02. This is a driver fault in the
brand-new extension, isolated in #7: engine setup is correct and
validation-clean, images/buffers through the same heap work, and both
traceRayEXT and inline rayQuery fault identically on the AS read.

An acceleration structure can equally be reached by its device address via
OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never
touches the faulting heap path. glslang has no GLSL spelling for that
conversion, so VulkanShader rewrites the compiled SPIR-V at module-load
time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS
device address from a synthesized push-constant block followed by the
convert. RTPass pushes the active frame's TLAS address into that push
constant. User GLSL and example code are unchanged; acceleration structures
still bind into the heap normally.

The workaround is gated on Device::workaroundDescriptorHeapAS (true only on
the NVIDIA proprietary driver) and confined to one fenced block in
Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64
feature toggle — delete those once a fixed NVIDIA driver ships and the heap
AS read becomes the direct path again.

Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090),
validation-layer-clean, no device loss. The SPIR-V rewrite was independently
validated with spirv-val on both the VulkanTriangle and Sponza raygen
modules.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 01:59:54 +00:00
catbot
a91603c70b feat(webgpu-rt): emit intersection/any-hit dispatch + build AABB BVH
PipelineRTWebGPU emits a runIntersection mega-switch and the
RT_HAS_ANYHIT / RT_HAS_INTERSECTION consts (+ the @CRAFTER_RT_TRACE_USER
marker) that gate the library's new TRACE-stage user callbacks, so an
opaque triangle-only scene still const-folds them away. Mesh-WebGPU
builds a SAH BVH2 over AABB primitives and uploads them in primitive
order for the intersection shader to fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 22:09:20 +00:00
catbot
1e749818ef fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays
A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront
TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on
Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the
WebGPU spec such a dispatch is silently dropped — no validation error —
so at 4K the world is never traced and the accumulator stays black while
non-RT passes survive.

_wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535,
y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the
linear ray index from (global_invocation_id, num_workgroups). The existing
`i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE
already use a 2-D tile dispatch and are unchanged.
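
The grid arithmetic above can be sketched on the host side as follows. This is an illustration under stated assumptions, not the shader code itself: `MakeGrid` and `LinearRayIndex` are hypothetical names, the workgroup size is taken as 64x1x1, and the 65535 limit is the value quoted above.

```cpp
#include <cstdint>

constexpr std::uint32_t kMaxWorkgroupsPerDim = 65535;  // limit quoted above
constexpr std::uint32_t kWorkgroupSize = 64;           // assumed 64x1x1 workgroups

struct Grid {
    std::uint32_t x;
    std::uint32_t y;
};

// Spread ceil(rayCount / 64) workgroups over a 2-D grid:
// x is clamped to the per-dimension limit, y = ceil(wg / limit).
constexpr Grid MakeGrid(std::uint64_t rayCount) {
    const std::uint64_t wg = (rayCount + kWorkgroupSize - 1) / kWorkgroupSize;
    const std::uint64_t x = wg < kMaxWorkgroupsPerDim ? wg : kMaxWorkgroupsPerDim;
    const std::uint64_t y = (wg + kMaxWorkgroupsPerDim - 1) / kMaxWorkgroupsPerDim;
    return Grid{static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(y)};
}

// Rebuild the linear ray index from the invocation id and the grid width.
// With 64x1x1 workgroups, gidX already includes the workgroup's x offset and
// gidY equals the workgroup's row, so each row covers numWorkgroupsX * 64 rays.
constexpr std::uint64_t LinearRayIndex(std::uint32_t gidX, std::uint32_t gidY,
                                       std::uint32_t numWorkgroupsX) {
    return std::uint64_t{gidY} * numWorkgroupsX * kWorkgroupSize + gidX;
}
```

For the 3449x1739 surface mentioned below (5,997,811 rays, 93716 workgroups) this yields a 65535 x 2 grid. The second row is only partly used, which is the overshoot the `i >= _wfCurCount()` guard discards.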

Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M
rays, 93716 workgroups — well over the 65535 cap): renders the full cube
grid where master shows a black screen.

Resolves #11

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 11:09:15 +00:00
catbot
cac433ee09 fix(vulkan): clear startup validation errors on native triangle
Two Vulkan validation errors fired on startup of every native (Vulkan)
example, reported in #5:

1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated
   and ignored since Vulkan 1.0; passing them is a spec violation
   (VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer
   enumeration/match block in Device::Initialize is removed and
   enabledLayerCount is pinned to 0 — layers are enabled at the instance
   only.

2. vkQueueSubmit layout transition on a presentable image that "has not
   been acquired". StartInit() and RecreateSwapchainAndImages() eagerly
   transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before
   any vkAcquireNextImageKHR, which the spec forbids (a presentable image
   may only be touched after acquire). Those pre-transitions are removed.
   Each image's first layout transition now happens lazily in Render(),
   after acquire, from UNDEFINED; subsequent frames transition from
   PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in
   CreateSwapchain) selects the correct oldLayout.
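
The lazy first-transition logic in item 2 can be sketched like this. The types are stand-ins rather than the engine's: `Layout` replaces the two Vulkan image layouts involved, and `SwapchainImages`, `Reset`, and `OldLayoutFor` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for VK_IMAGE_LAYOUT_UNDEFINED and VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
enum class Layout { Undefined, PresentSrc };

// Hypothetical per-swapchain bookkeeping: one flag per presentable image.
struct SwapchainImages {
    std::vector<bool> imageInitialised;

    // Called when the swapchain is (re)created: every image starts untouched.
    void Reset(std::size_t imageCount) {
        imageInitialised.assign(imageCount, false);
    }

    // Called after the image at `index` has been acquired. Returns the
    // oldLayout to use for this frame's transition and marks the image as
    // initialised so later frames transition from the present layout.
    Layout OldLayoutFor(std::size_t index) {
        const Layout old =
            imageInitialised[index] ? Layout::PresentSrc : Layout::Undefined;
        imageInitialised[index] = true;
        return old;
    }
};
```

Because the flag is only consulted after acquire, no image is transitioned before it has been acquired, which is the condition the validation error was reporting.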

Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation:
the original code reproduces both errors on HelloUI; the fixed build emits
zero validation messages across initial render and swapchain recreation.

Resolves #5

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:59:10 +00:00
catbot
4e42d663a6 WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE)
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:

  GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE

- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
  the indirect-args buffer lives in its own bind group so it is never
  bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
  plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
  between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
  user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize /
  maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past the
  128MB baseline.

VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.
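
The pass order in the diagram above can be sketched as a simple host-side schedule. This is only an illustration of the sequence: `BuildPassSchedule` is a hypothetical function, and it assumes the host records a fixed number of bounce iterations while the indirect arguments decide how much work each one does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the kernels in the order
// GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE.
std::vector<std::string> BuildPassSchedule(unsigned maxDepth) {
    std::vector<std::string> passes;
    passes.push_back("GENERATE");  // emit primary rays
    for (unsigned bounce = 0; bounce < maxDepth; ++bounce) {
        passes.push_back("PREP");   // publish indirect args from the live ray count
        passes.push_back("TRACE");  // traversal only, no user code
        passes.push_back("SHADE");  // user shading; may emit rays for the next bounce
    }
    passes.push_back("RESOLVE");   // optional user resolve stage
    return passes;
}
```

With `maxDepth = 1` the schedule has five passes, and each additional bounce adds three.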

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 16:24:41 +00:00
909a9b46d2 wasm fixes 2026-05-26 22:50:49 +02:00
8347467e1e webgpu improvements 2026-05-24 13:32:08 +02:00
850ef7bfb3 clipboard 2026-05-19 00:45:22 +02:00
b5d0f52da0 webgpu sponza 2026-05-19 00:27:09 +02:00
5553ded476 webgpu triangle 2026-05-18 18:43:30 +02:00
64116cd980 custom shader webgpu 2026-05-18 05:39:17 +02:00
dedf6b0467 webgpu support 2026-05-18 04:58:52 +02:00
5352ef69a2 browser DOM support 2026-05-18 02:07:48 +02:00
ac2eb7fb0a new input system 2026-05-12 00:24:48 +02:00
b3db40ebec update 2026-05-05 23:49:29 +02:00
825da78f7f descriptor heap leak fix 2026-05-05 00:02:04 +02:00
c054f1e0b3 update 2026-05-03 02:45:38 +02:00
1f5697326c UI rewrite 3rd attempt 2026-05-02 21:08:20 +02:00
c9fd1b1585 animated example 2026-05-02 00:03:24 +02:00
216972e73a new UI system 2026-05-01 23:35:37 +02:00
d840a81448 bugfixes 2026-04-30 23:15:43 +02:00
c9ebd448f9 update 2026-04-16 23:03:24 +02:00
5ffe1404fc vulkan2d fixes 2026-04-13 18:36:07 +02:00
4c93c5535e typo 2026-04-11 23:22:52 +02:00
ea18f32300 vulkan2d fixes 2026-04-11 23:18:41 +02:00
1c1a142f52 rendertargetvulkan 2026-04-11 18:48:00 +02:00
3fcea6a3d7 writing ui descriptors 2026-04-10 22:26:15 +02:00
177f873639 vulkan UI 2026-04-09 00:15:09 +02:00
f8e142fb06 descriptor heap rewrite 2026-04-05 22:53:59 +02:00
b4bd0c03c5 fix 2026-04-03 03:29:51 +02:00
22b8af7bfc update 2026-04-02 16:52:10 +02:00
477b7dd087 F16 rendering 2026-04-01 18:43:18 +02:00
c895c266fb vector rendering 2026-03-31 15:22:55 +02:00
80bb04f84a revert 2026-03-24 05:25:53 +01:00
bc97c13a0b rendertarget multi frame rewrite complete 2026-03-13 01:06:55 +01:00
2b22c16ce7 rendertarget multi frame rewrite 2026-03-12 21:13:53 +01:00
7f46ac13fa rendering improvements 2026-03-12 01:07:46 +01:00
789bb307d5 render target improvements 2026-03-10 22:32:50 +01:00
LancenShield
38c8d57146 Added VK_CONTROL for same reason as VK_SHIFT. 2026-03-09 22:34:14 -05:00
103a35ee40 vk shift crash 2026-03-10 03:24:01 +01:00
123bb31f50 initialization 2026-03-10 03:05:10 +01:00
accc9ef8e7 typo 2026-03-10 02:47:28 +01:00
6effe88733 color format fix 2026-03-09 23:21:24 +01:00
cfebc896db color format fix 2026-03-09 23:21:10 +01:00
486775925b color format fix 2026-03-09 23:12:31 +01:00
1ff646ca01 color format fix 2026-03-09 23:05:32 +01:00
dd319b30c3 win32 fix 2026-03-09 22:53:26 +01:00
5fc7423387 time fix 2026-03-09 22:15:36 +01:00