Commit graph

116 commits

Author SHA1 Message Date
catbot
1c310762a7 fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21)
Two gaps in the Vulkan RT path that fault the device on the NVIDIA
proprietary driver with a non-trivial pipeline (simple VulkanTriangle
never hit them):

1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit
   shader that traces a secondary ray (shadow ray — a very common
   pattern) recursed past the pipeline limit (UB → device fault).
   PipelineRTVulkan::Init now takes a maxRecursionDepth parameter
   (default 1, clamped to the device's maxRayRecursionDepth).

2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader
   that reads an accelerationStructureEXT from the heap — including
   compute shaders — to read the TLAS device address from a push
   constant, but only RTPass pushed that address. A compute shader that
   ray-queries the TLAS (rayQueryEXT) therefore ran against an unwritten
   push slot → garbage AS handle → VK_ERROR_DEVICE_LOST.

   WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult
   {patched, tlasPushOffset} instead of writing the clobber-prone global
   Device::workaroundTlasPushOffset (removed). VulkanShader stores it;
   ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and
   ComputeShader tracks its own offset and pushes the caller-supplied
   TLAS address in Dispatch (new defaulted tlasAddress parameter),
   mirroring RTPass::Record.
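
The clamp in item 1 can be sketched roughly as below. This is an illustration, not the engine's code: `ClampRecursionDepth` is a hypothetical helper name, and `deviceMax` stands in for the device's reported maxRayRecursionDepth. Only the default of 1 and the clamp itself come from the message above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (name and signature are assumptions for illustration).
// requested: the maxRecursionDepth argument passed by the caller (default 1).
// deviceMax: the ray-tracing pipeline limit reported by the physical device.
constexpr std::uint32_t ClampRecursionDepth(std::uint32_t requested,
                                            std::uint32_t deviceMax) {
    // Never ask the pipeline for more recursion than the device supports.
    return std::min(requested, deviceMax);
}
```

A closest-hit shader that fires one shadow ray would request a depth of 2 under this scheme; on a device whose limit is lower, the clamped value is what ends up in the pipeline create info.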

The PushConstantRewrite regression test now asserts Patch's returned
patched/offset and adds two ray-querying compute-shader cases, proving
the rewrite is stage-agnostic and the per-shader offset is correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 18:35:39 +00:00
catbot
45ecc91424 fix(vulkan-rt): merge TLAS push constant into existing block (#18)
The NVIDIA descriptor-heap AS-read workaround (#15) rewrote heap
acceleration-structure reads into a load of the TLAS device address from
a push-constant block. It always *synthesized a new* push-constant block,
so any ray-tracing shader that already declared one ended up with two —
which SPIR-V forbids ("at most one push constant block statically used per
entry point"), and vkCreateShaderModule's spirv-val check rejected:

    Entry point id '4' uses more than one PushConstant interface.

WorkaroundNvidiaAS::Patch now detects an existing PushConstant variable and,
when present, appends a single ulong member (the TLAS address) to that
block instead of adding a second one, reading the address through the
shader's own push-constant variable. The append offset is the end of the
user's block, computed from the members' explicit Offset/ArrayStride/
MatrixStride decorations (correct under both scalar and std140 layout) and
rounded up to 8. Shaders with no push constant of their own keep getting a
freshly synthesized single-member block at offset 0, exactly as before.
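
The append-offset rule above (end of the user's block, rounded up to 8) might look like the following sketch. `MemberLayout`, `AlignUp8` and `TlasAppendOffset` are hypothetical names, and the sketch assumes each member's byte size has already been derived from its type and its ArrayStride/MatrixStride decorations.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical summary of one push-constant block member: the byte offset
// from its Offset decoration and the number of bytes its type occupies.
struct MemberLayout {
    std::uint32_t offset;
    std::uint32_t size;
};

// Round up to the next multiple of 8, the alignment of a 64-bit address.
constexpr std::uint32_t AlignUp8(std::uint32_t value) {
    return (value + 7u) & ~7u;
}

// End of the existing block, aligned for the appended ulong member.
// Taking the maximum avoids assuming members are declared in offset order.
constexpr std::uint32_t TlasAppendOffset(const MemberLayout* members,
                                         std::size_t count) {
    std::uint32_t end = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t memberEnd = members[i].offset + members[i].size;
        if (memberEnd > end) end = memberEnd;
    }
    return AlignUp8(end);
}
```

With no members at all the function returns 0, which matches the synthesized single-member case described above.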

That offset is published via Device::workaroundTlasPushOffset and RTPass
feeds it to vkCmdPushDataEXT so the address lands where the rewritten load
reads it (0 for the synthesized case, preserving prior behaviour).

Verified on the affected driver (NVIDIA 610.43.02, RTX 4090): VulkanTriangle
ray-traces correctly and validation-clean both with and without a
user-declared raygen push constant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 02:28:02 +00:00
catbot
950059c86e fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)
Reading an acceleration structure through VK_EXT_descriptor_heap aborts
with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 — a brand-new-extension
driver fault isolated in #7 (engine setup is correct and validation-clean;
images/buffers through the same heap work, and both traceRayEXT and inline
rayQuery fault identically on the AS read).

An acceleration structure can equally be reached by its device address via
OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never
touches the faulting heap path. glslang has no GLSL spelling for that
conversion, so VulkanShader rewrites the compiled SPIR-V at module-load
time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS
device address from a synthesized push-constant block followed by the
convert. RTPass pushes the active frame's TLAS address into that push
constant. User GLSL and example code are unchanged; acceleration structures
still bind into the heap normally.

The workaround is gated on Device::workaroundDescriptorHeapAS (true only on
the NVIDIA proprietary driver) and confined to one fenced block in
Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64
feature toggle — delete those once a fixed NVIDIA driver ships and the heap
AS read becomes the direct path again.

Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090),
validation-layer-clean, no device loss. The SPIR-V rewrite was independently
validated with spirv-val on both the VulkanTriangle and Sponza raygen
modules.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 01:59:54 +00:00
catbot
321fe596a7 feat(webgpu-rt): add intersection stage, procedural hit group, AABB BLAS API
Extends the cross-backend RT type surface for procedural geometry +
any-hit on the WebGPU path:

- RTShaderGroupType::ProceduralHitGroup + RTShaderGroup::intersectionShader
  (mirror VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR).
- WebGPURTStage::Intersection for AABB intersection shaders.
- Mesh::BuildProcedural(span<RTAabb>, opaque) — the WebGPU analog of a
  VK_GEOMETRY_TYPE_AABBS_KHR geometry.
- wgpuRegisterMeshBLAS gains geomType / opaqueFlag / primCount.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 22:09:14 +00:00
catbot
cac433ee09 fix(vulkan): clear startup validation errors on native triangle
Two Vulkan validation errors fired on startup of every native (Vulkan)
example, reported in #5:

1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated
   and ignored since Vulkan 1.0; passing them is a spec violation
   (VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer
   enumeration/match block in Device::Initialize is removed and
   enabledLayerCount is pinned to 0 — layers are enabled at the instance
   only.

2. vkQueueSubmit layout transition on a presentable image that "has not
   been acquired". StartInit() and RecreateSwapchainAndImages() eagerly
   transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before
   any vkAcquireNextImageKHR, which the spec forbids (a presentable image
   may only be touched after acquire). Those pre-transitions are removed.
   Each image's first layout transition now happens lazily in Render(),
   after acquire, from UNDEFINED; subsequent frames transition from
   PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in
   CreateSwapchain) selects the correct oldLayout.
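
The lazy first transition in item 2 could be modelled along these lines. This is a sketch under assumptions: `Layout` stands in for the two VkImageLayout values involved so the block compiles without the Vulkan headers, and `SwapchainState` and its methods are hypothetical. Only the `imageInitialised` flag name is taken from the message above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for VK_IMAGE_LAYOUT_UNDEFINED and VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
enum class Layout { Undefined, PresentSrc };

// Hypothetical per-swapchain bookkeeping.
struct SwapchainState {
    std::vector<bool> imageInitialised;

    // Called when the swapchain is (re)created: nothing has been acquired yet.
    void Reset(std::size_t imageCount) {
        imageInitialised.assign(imageCount, false);
    }

    // Called after image `index` has been acquired. Returns the oldLayout to
    // use for this frame's first barrier and marks the image as initialised.
    Layout OldLayoutAfterAcquire(std::size_t index) {
        assert(index < imageInitialised.size());
        const bool seenBefore = imageInitialised[index];
        imageInitialised[index] = true;
        return seenBefore ? Layout::PresentSrc : Layout::Undefined;
    }
};
```

Because the flag is only consulted after acquire, no barrier ever touches an image the application does not own, and recreating the swapchain simply restarts every image from the undefined layout.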

Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation:
the original code reproduces both errors on HelloUI; the fixed build emits
zero validation messages across initial render and swapchain recreation.

Resolves #5

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:59:10 +00:00
catbot
4e42d663a6 WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE)
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:

  GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE

- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
  the indirect-args buffer lives in its own bind group so it is never
  bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
  plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
  between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
  user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize /
  maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past
  the 128MB baseline.
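
As a rough host-side model of the pass sequence and of the indirect arguments PREP publishes, consider the sketch below. All names are hypothetical, and the one-dimensional dispatch and the workgroup size parameter are assumptions; the message above only states that the arguments are derived from the live ray count.

```cpp
#include <cstdint>

// Pass kinds named in the message above.
enum class Pass { Generate, Prep, Trace, Shade, Resolve };

// Length of GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE.
constexpr std::uint32_t PassCount(std::uint32_t maxDepth) {
    return 2u + 3u * maxDepth;
}

// Pass that runs at 0-based position `i` of that sequence.
constexpr Pass PassAt(std::uint32_t i, std::uint32_t maxDepth) {
    return i == 0                        ? Pass::Generate
           : i == PassCount(maxDepth) - 1 ? Pass::Resolve
           : (i - 1u) % 3u == 0u          ? Pass::Prep
           : (i - 1u) % 3u == 1u          ? Pass::Trace
                                          : Pass::Shade;
}

// Workgroup count for the next indirect dispatch: ceil(liveRays / groupSize),
// written without the usual "+ groupSize - 1" to avoid overflow.
constexpr std::uint32_t IndirectGroupCount(std::uint32_t liveRays,
                                           std::uint32_t groupSize) {
    return liveRays / groupSize + (liveRays % groupSize != 0u ? 1u : 0u);
}
```

In this model a bounce with no live rays yields a group count of 0, so the remaining TRACE and SHADE dispatches of the loop do no work.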

VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 16:24:41 +00:00
8347467e1e webgpu improvements 2026-05-24 13:32:08 +02:00
850ef7bfb3 clipboard 2026-05-19 00:45:22 +02:00
b5d0f52da0 webgpu sponza 2026-05-19 00:27:09 +02:00
5553ded476 webgpu triangle 2026-05-18 18:43:30 +02:00
64116cd980 custom shader webgpu 2026-05-18 05:39:17 +02:00
dedf6b0467 webgpu support 2026-05-18 04:58:52 +02:00
5352ef69a2 browser DOM support 2026-05-18 02:07:48 +02:00
ac2eb7fb0a new input system 2026-05-12 00:24:48 +02:00
b3db40ebec update 2026-05-05 23:49:29 +02:00
825da78f7f descriptor heap leak fix 2026-05-05 00:02:04 +02:00
c054f1e0b3 update 2026-05-03 02:45:38 +02:00
1f5697326c UI rewrite 3rd attempt 2026-05-02 21:08:20 +02:00
c9fd1b1585 animated example 2026-05-02 00:03:24 +02:00
216972e73a new UI system 2026-05-01 23:35:37 +02:00
d840a81448 bugfixes 2026-04-30 23:15:43 +02:00
d29f5609cd fix 2026-04-30 02:05:16 +02:00
c9ebd448f9 update 2026-04-16 23:03:24 +02:00
ef8d623525 text rendering fixes 2026-04-15 19:30:21 +02:00
5ffe1404fc vulkan2d fixes 2026-04-13 18:36:07 +02:00
ea18f32300 vulkan2d fixes 2026-04-11 23:18:41 +02:00
1c1a142f52 rendertargetvulkan 2026-04-11 18:48:00 +02:00
8b12dc39b3 renderingelement2dvulkan load from asset 2026-04-11 13:54:17 +02:00
f4a48b20c6 renderingelement2dvulkan auto buffer size 2026-04-10 22:57:53 +02:00
3fcea6a3d7 writing ui descriptors 2026-04-10 22:26:15 +02:00
9d43f2e44b const sbt 2026-04-10 20:53:17 +02:00
92dfe16dac pipeline destructor 2026-04-10 20:51:16 +02:00
5427867fff descriptor heap static offset method 2026-04-10 20:30:58 +02:00
177f873639 vulkan UI 2026-04-09 00:15:09 +02:00
f8e142fb06 descriptor heap rewrite 2026-04-05 22:53:59 +02:00
22b8af7bfc update 2026-04-02 16:52:10 +02:00
477b7dd087 F16 rendering 2026-04-01 18:43:18 +02:00
c895c266fb vector rendering 2026-03-31 15:22:55 +02:00
80bb04f84a revert 2026-03-24 05:25:53 +01:00
44a2960415 typo 2026-03-24 05:24:46 +01:00
7fdab4f62b rendering update 2026-03-22 21:08:02 +01:00
bc97c13a0b rendertarget multi frame rewrite complete 2026-03-13 01:06:55 +01:00
2b22c16ce7 rendertarget multi frame rewrite 2026-03-12 21:13:53 +01:00
7f46ac13fa rendering improvements 2026-03-12 01:07:46 +01:00
789bb307d5 render target improvements 2026-03-10 22:32:50 +01:00
1bc6293610 imagevulkan destroy 2026-03-10 20:27:46 +01:00
bd1a6313dd mips level fix 2026-03-10 20:14:33 +01:00
c84593952a vulkanimage customizable mip levels 2026-03-10 19:43:11 +01:00
6eb23b88bd renderelement2d texture loading 2026-03-10 19:03:39 +01:00
d661c88ee2 cleaned up renderer 2026-03-09 20:10:19 +01:00