Commit graph

232 commits

Author SHA1 Message Date
d7b9a41b4f Merge pull request 'fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays' (#12) from claude/issue-11 into master 2026-06-01 13:10:05 +02:00
catbot
1e749818ef fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays
A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront
TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on
Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the
WebGPU spec such a dispatch is silently dropped — no validation error —
so at 4K the world is never traced and the accumulator stays black while
non-RT passes survive.

_wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535,
y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the
linear ray index from (global_invocation_id, num_workgroups). The existing
`i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE
already use a 2-D tile dispatch and are unchanged.
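
A minimal Python sketch of the reshaping described above, for illustration only. The function names and the row-major index formula are assumptions, not the engine's actual _wfPrep or WGSL code; the workgroup size of 64 comes from the ceil(W*H/64) expression and the 65535 cap from the limit cited above.

```python
WG_SIZE = 64        # threads per workgroup, from ceil(W*H/64)
MAX_WG_DIM = 65535  # maxComputeWorkgroupsPerDimension cited above


def dispatch_grid(ray_count, wg_size=WG_SIZE, max_dim=MAX_WG_DIM):
    """Spread a 1-D workgroup count over (x, y), clamping x to max_dim."""
    wg = (ray_count + wg_size - 1) // wg_size   # ceil(ray_count / wg_size)
    x = min(wg, max_dim)
    y = (wg + max_dim - 1) // max_dim           # ceil(wg / max_dim)
    return x, y


def linear_ray_index(gid_x, gid_y, num_wg_x, wg_size=WG_SIZE):
    """Rebuild the linear ray index from a 2-D global invocation id.

    Each grid row covers num_wg_x * wg_size threads; indices past the live
    ray count are expected to be rejected by a bounds guard in the kernel.
    """
    return gid_y * (num_wg_x * wg_size) + gid_x


# 3449x1739 surface: 93716 workgroups no longer fit in one dimension.
print(dispatch_grid(3449 * 1739))
```

Under these assumptions the 3449x1739 case becomes a 65535 x 2 grid, so the grid launches more threads than there are rays and relies on the existing bounds guard to discard the excess.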

Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M
rays, 93716 workgroups — well over the 65535 cap): renders the full cube
grid where master shows a black screen.

Resolves #11

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 11:09:15 +00:00
afb9e320e1 Merge pull request 'docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7)' (#10) from claude/issue-7 into master 2026-06-01 00:22:52 +02:00
catbot
464cb66063 docs(vulkan-rt): record native descriptor-heap AS read as a driver fault
Investigated the VK_ERROR_DEVICE_LOST on the native VulkanTriangle (#7).
Verified the engine side is correct and validation-clean: the BLAS/TLAS
build finishes before render (FinishInit waits), the built instance is
well-formed (identity transform, mask=0xFF, correct BLAS ref), and
vkWriteResourceDescriptorsEXT stores the TLAS device address at the
expected heap offset (confirmed by dumping the heap bytes). Khronos
validation 1.4.350 reports zero errors.

The fault is isolated to reading the acceleration structure through
VK_EXT_descriptor_heap:
- images/buffers via the same heap render fine (trace disabled -> the
  raygen imageStore path renders a full gradient);
- both traceRayEXT and inline rayQueryEXT (no SBT) fault identically on
  the AS read;
- reproduces with the AS descriptor at heap byte 0 / shader index 0 (no
  offset/stride ambiguity) and regardless of pAddressRange size.

NVIDIA 610.43.02 is the only descriptor_heap implementation available
(llvmpipe lacks the extension), so there is no second implementation to
cross-check. Conclusion: driver-side fault in NVIDIA's brand-new
VK_EXT_descriptor_heap acceleration-structure path; should be reported to
NVIDIA. The traceRayEXT call is left active so the example stays a
faithful reproducer. Documented in both READMEs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 22:21:57 +00:00
6470c12db5 Merge pull request 'fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16' (#9) from claude/issue-8 into master 2026-05-31 23:58:19 +02:00
catbot
23780d83a8 fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16
dom-webgpu.js capped maxStorageBuffersPerShaderStage at 16 even when the
adapter reports far more (64 in our test env). The wavefront SHADE kernel
already binds ~16 storage buffers before any user binding, so any RT
pipeline declaring 2+ user storage buffers at @group(3) overflowed the
limit and failed to build with "Too many bindings of type StorageBuffers".

Request the adapter's reported maxStorageBuffersPerShaderStage /
maxStorageBuffersInPipelineLayout instead of a fixed 16. `clamp` already
mins against the adapter cap, so baseline-only devices still get a valid
request, and the `|| 16` fallback + the `typeof cap === "number"` guard
handle limit names a browser doesn't expose (Firefox returns null for
maxStorageBuffersInPipelineLayout).

Verified in-browser: a 17-storage-buffer compute pipeline fails with the
exact reported error on a device clamped to 16, and builds cleanly on a
device requesting the adapter's 64. RTStress renders correctly.

Resolves #8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 21:55:42 +00:00
26a41ac528 Merge pull request 'fix(vulkan): clear startup validation errors on native triangle' (#6) from claude/issue-5 into master 2026-05-31 22:59:47 +02:00
catbot
cac433ee09 fix(vulkan): clear startup validation errors on native triangle
Two Vulkan validation errors fired on startup of every native (Vulkan)
example, reported in #5:

1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated
   and ignored since Vulkan 1.0; passing them is a spec violation
   (VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer
   enumeration/match block in Device::Initialize is removed and
   enabledLayerCount is pinned to 0 — layers are enabled at the instance
   only.

2. vkQueueSubmit layout transition on a presentable image that "has not
   been acquired". StartInit() and RecreateSwapchainAndImages() eagerly
   transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before
   any vkAcquireNextImageKHR, which the spec forbids (a presentable image
   may only be touched after acquire). Those pre-transitions are removed.
   Each image's first layout transition now happens lazily in Render(),
   after acquire, from UNDEFINED; subsequent frames transition from
   PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in
   CreateSwapchain) selects the correct oldLayout.
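
The per-image flag logic in item 2 can be sketched roughly as follows. This is a Python model for illustration, not the engine's C++: the class and method names are hypothetical, and the layouts are plain strings standing in for the Vulkan enum values.

```python
UNDEFINED = "UNDEFINED"
PRESENT_SRC = "PRESENT_SRC_KHR"


class SwapchainLayoutTracker:
    """Tracks whether each swapchain image has had its first transition."""

    def __init__(self, image_count):
        self.image_initialised = [False] * image_count

    def recreate(self, image_count):
        # A new swapchain means new images: every flag starts over.
        self.image_initialised = [False] * image_count

    def old_layout_after_acquire(self, image_index):
        """Pick oldLayout for the barrier recorded after the image is acquired."""
        if self.image_initialised[image_index]:
            return PRESENT_SRC
        self.image_initialised[image_index] = True
        return UNDEFINED


tracker = SwapchainLayoutTracker(3)
print(tracker.old_layout_after_acquire(1))  # first use of image 1
print(tracker.old_layout_after_acquire(1))  # later frame, same image
```

The point of the sketch is that no image is touched before it has been acquired: the first barrier for an image starts from UNDEFINED, and only later frames start from PRESENT_SRC_KHR.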

Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation:
the original code reproduces both errors on HelloUI; the fixed build emits
zero validation messages across initial render and swapchain recreation.

Resolves #5

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:59:10 +00:00
6a54c3c4ca Merge pull request 'WebGPU RT: wavefront/streaming tracer (replaces megakernel)' (#4) from claude/issue-3 into master 2026-05-31 22:31:35 +02:00
catbot
358084185a docs: wavefront RT in README + design-doc status; add RTStress to examples 2026-05-31 20:29:12 +00:00
catbot
afc0292fab WebGPU RT: dynamic TLAS sweep-tree depth (next_pow2 instances)
The LBVH bitonic sort still runs over the full 16384 (sentinels sink to
the tail), but the sweep tree is now built and traced at depth
log2(next_pow2(nReal)) instead of a fixed 14. Add nPadded to LbvhPC; leaf
init + bottom-up refit use it; the host passes the same next_pow2 to the
trace via WfParams.tlasNPadded. Renders correctly at 512 instances
(depth 9). The fragile sort phases are untouched.
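
For reference, the depth arithmetic above works out as in this small Python sketch. The helper names are hypothetical and this is not the engine's host code.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def sweep_tree_depth(n_real):
    """log2(next_pow2(n_real)), computed with integer arithmetic."""
    return next_pow2(n_real).bit_length() - 1


for n in (512, 513, 16384):
    print(n, next_pow2(n), sweep_tree_depth(n))
```

With 512 instances this gives depth 9, and the fixed depth of 14 is reached only at the full 16384.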

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:28:12 +00:00
catbot
82e5e867d4 WebGPU RT: remove dead megakernel WGSL (no dual path)
The RT pipeline now only builds the wavefront kernels, so the old
single-megakernel traversal/traceRay block (rtWgslMegakernelHelpers) and
the unused rtWgslPrelude alias are dead. Remove them. The rayQuery compute
path keeps rtWgslMegakernelBindings (its own _rq* traversal uses it).
RTStress still renders correctly with the trimmed prelude.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:24:04 +00:00
catbot
dd4122f2ba WebGPU RT: ordered (nearest-child-first) traversal
Add _rtAabbT (AABB test returning entry-t); in both _rtwTraverseBlas and
_rtwTraverseTlas descend the nearer child first and push the farther only
when it hits, re-culling it against the (tightened) bestT when popped.
Render is identical (same closest hit) on VulkanTriangle, RTStress
(512/4096), and Sponza; cuts node visits on dense scenes.
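
A simplified Python sketch of nearest-child-first traversal, to illustrate why it cuts node visits. It is not the WGSL in _rtwTraverseBlas/_rtwTraverseTlas: the node layout, the function names, and the use of a leaf box's entry distance in place of a real triangle hit are all assumptions, and ray directions are assumed to have no zero components.

```python
import math


def aabb_entry_t(origin, inv_dir, lo, hi, t_max):
    """Slab test returning the entry distance, or math.inf on a miss."""
    t_near, t_far = 0.0, t_max
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t0, t1 = (a - o) * inv, (b - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near if t_near <= t_far else math.inf


def closest_hit(nodes, origin, direction):
    """nodes[i] = (lo, hi, left, right, leaf_id); leaf_id is None for inner nodes.

    Returns (best_t, leaf_id, visited_node_count).
    """
    inv_dir = tuple(1.0 / d for d in direction)
    best_t, best_leaf, visits = math.inf, None, 0
    root_t = aabb_entry_t(origin, inv_dir, nodes[0][0], nodes[0][1], best_t)
    stack = [] if root_t == math.inf else [(0, root_t)]
    while stack:
        index, entry_t = stack.pop()
        if entry_t >= best_t:        # re-cull against the tightened best_t
            continue
        visits += 1
        _, _, left, right, leaf_id = nodes[index]
        if leaf_id is not None:      # leaf: box entry stands in for a hit
            best_t, best_leaf = entry_t, leaf_id
            continue
        hits = []
        for child in (left, right):
            t = aabb_entry_t(origin, inv_dir, nodes[child][0], nodes[child][1], best_t)
            if t != math.inf:        # children that miss are never pushed
                hits.append((t, child))
        # Push the farther child first so the nearer one is popped next.
        for t, child in sorted(hits, reverse=True):
            stack.append((child, t))
    return best_t, best_leaf, visits
```

Because the nearer subtree is resolved first, the farther child is usually discarded by the re-cull check when it is popped, without visiting any of its descendants.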

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:21:44 +00:00
catbot
376e66aeed WebGPU RT: port Sponza to wavefront (shadow ray in SHADE)
Restructure Sponza for the wavefront model: raygen emits the primary ray;
closesthit (in SHADE) gathers albedo/normal, accumulates ambient, and
emits a shadow ray carrying the pending direct term; miss adds the sky
(primary) or the direct term (shadow miss). resolve.wgsl applies the same
Reinhard+gamma the megakernel raygen did inline. User bindings moved to
group 3 (groups 0..2 reserved). RTPass maxDepth=2.

Renders the atrium correctly through the wavefront pipeline (textures,
two-sided shading, sun+ambient, shadows, tonemap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:16:04 +00:00
catbot
1d2e12dbc9 WebGPU RT: GPU timestamp-query per-pass harness
Request the timestamp-query feature; write begin/end timestamps around
each wavefront pass via timestampWrites; resolve + read back (deferred to
after submit) and print a per-pass microsecond breakdown ~1x/sec. RTStress @ 512
instances, 1920x995: TRACE dominates, total ~1.8-3.0ms/frame.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 20:08:39 +00:00
catbot
f4d6493d91 wip: uncommitted changes from claude run on issue #3 2026-05-31 16:28:38 +00:00
catbot
4e42d663a6 WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE)
Replace the megakernel @compute entry with five wavefront kernels sharing
one module, connected by GPU ray/hit/payload buffers and a GPU-driven
indirect bounce loop:

  GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE

- TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only).
- PREP publishes dispatchWorkgroupsIndirect args from the live ray count;
  the indirect-args buffer lives in its own bind group so it is never
  bound read-write in the same dispatch that consumes it as INDIRECT.
- New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate,
  plus an optional user Resolve stage (tonemap hook; identity by default).
- Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary
  between passes within one submit).
- Payload-typed wfPayload binding emitted in the codegen region after the
  user's struct Payload; payload travels with each ray (2*W*H slots).
- Request maxBufferSize / maxStorageBufferBindingSize /
  maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past
  the 128MB baseline.

VulkanTriangle ported to the new API and renders bit-identical to the
megakernel baseline at maxDepth=1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 16:24:41 +00:00
e0d72f57f2 Merge pull request 'WebGPU RT: enable TLAS spatial sort via bitonic network (plan phase 3)' (#2) from claude/issue-1 into master 2026-05-31 17:49:38 +02:00
catbot
14091dcdca WebGPU RT: enable TLAS spatial sort via bitonic network
Replace the disabled LSD radix sort in lbvhBuildMain with a data-oblivious
workgroup bitonic sorting network and enable it. The radix scatter was gated
behind `if (false)` because it produced count/distribution-dependent
corruption (TODO-lbvh-sort.md) — a memory-ordering bug in the Hillis-Steele
scan / parallel scatter that surfaced only for certain Morton distributions
(a small object beside a tight cluster), making geometry flicker.

A bitonic network's compare-exchange schedule depends only on N_PADDED, never
on key values, so it sidesteps that entire class of distribution-dependent
races (TODO strategy #5). 105 sub-stages over 2^14 keys, single workgroup of
1024 threads, 8 compare-exchanges/thread/sub-stage, operating in-place on
sortA with a storageBarrier between sub-stages. Sentinel keys (0xFFFFFFFF)
compare largest and settle at the tail, exactly where Phase 4 expects them.
Restores Morton (Z-order) spatial coherence to TLAS BVH leaves, which the
many-instance case needs. Removes the now-dead radix histogram/scan workgroup
memory and constants.
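
A sequential Python sketch of a bitonic sorting network with sentinel padding, to illustrate the data-oblivious schedule described above. It is not the lbvhBuildMain kernel: the function name is hypothetical, and the inner loop over i stands in for what the GPU version would run in parallel between barriers. The sub-stage count is log2(N) * (log2(N) + 1) / 2, which for N = 2^14 is 14 * 15 / 2 = 105.

```python
SENTINEL = 0xFFFFFFFF  # compares largest, so padding settles at the tail


def bitonic_sort_padded(keys, n_padded):
    """Sort keys ascending with a bitonic network over n_padded slots.

    Returns (sorted_slots, sub_stage_count). The (k, j) schedule depends
    only on n_padded, never on the key values.
    """
    assert n_padded >= 1 and n_padded & (n_padded - 1) == 0
    assert len(keys) <= n_padded
    data = list(keys) + [SENTINEL] * (n_padded - len(keys))
    sub_stages = 0
    k = 2
    while k <= n_padded:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                 # one sub-stage: n_padded / 2 compare-exchanges
            for i in range(n_padded):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending:
                        out_of_order = data[i] > data[partner]
                    else:
                        out_of_order = data[i] < data[partner]
                    if out_of_order:
                        data[i], data[partner] = data[partner], data[i]
            sub_stages += 1
            j //= 2
        k *= 2
    return data, sub_stages


slots, stages = bitonic_sort_padded([5, 1, 4, 1, 9], 8)
print(slots, stages)
```

Each sub-stage performs N/2 compare-exchanges, so 2^14 keys spread over 1024 threads comes to 8 per thread per sub-stage, consistent with the figures above.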

Verified on the Firefox/Dawn WebGPU stack: a GPU unit test diffs the kernel
output against a CPU oracle across all three required distributions
(all-uniform, all-one-bucket, small-object-next-to-cluster) plus random,
reverse, and empty inputs — all match bit-for-bit with a valid index
permutation. Sponza renders correctly with the sort live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 15:48:29 +00:00
162d98cf5b got rid of --local 2026-05-27 04:38:30 +02:00
909a9b46d2 wasm fixes 2026-05-26 22:50:49 +02:00
8347467e1e webgpu improvements 2026-05-24 13:32:08 +02:00
5a75571ffd readme update 2026-05-19 01:43:46 +02:00
850ef7bfb3 clipboard 2026-05-19 00:45:22 +02:00
b5d0f52da0 webgpu sponza 2026-05-19 00:27:09 +02:00
5553ded476 webgpu triangle 2026-05-18 18:43:30 +02:00
64116cd980 custom shader webgpu 2026-05-18 05:39:17 +02:00
dedf6b0467 webgpu support 2026-05-18 04:58:52 +02:00
5352ef69a2 browser DOM support 2026-05-18 02:07:48 +02:00
3859c43ce3 compression example 2026-05-12 00:27:55 +02:00
ac2eb7fb0a new input system 2026-05-12 00:24:48 +02:00
b3db40ebec update 2026-05-05 23:49:29 +02:00
825da78f7f descriptor heap leak fix 2026-05-05 00:02:04 +02:00
c054f1e0b3 update 2026-05-03 02:45:38 +02:00
1f5697326c UI rewrite 3rd attempt 2026-05-02 21:08:20 +02:00
c9fd1b1585 animated example 2026-05-02 00:03:24 +02:00
216972e73a new UI system 2026-05-01 23:35:37 +02:00
d840a81448 bugfixes 2026-04-30 23:15:43 +02:00
d29f5609cd fix 2026-04-30 02:05:16 +02:00
7f5297ca57 add Rendertarget and Shm implementations to project.cpp
Both were on disk but missing from the V2 port's implementations list;
Rendertarget is required for RendertargetVulkan linkage, Shm is the
Wayland shared-memory helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 02:00:17 +02:00
bc669b5e05 crafter-build V2 2026-04-30 01:30:08 +02:00
8a2fd33efc crafter-build V2 2026-04-30 01:29:17 +02:00
c9ebd448f9 update 2026-04-16 23:03:24 +02:00
ef8d623525 text rendering fixes 2026-04-15 19:30:21 +02:00
5ffe1404fc vulkan2d fixes 2026-04-13 18:36:07 +02:00
4c93c5535e typo 2026-04-11 23:22:52 +02:00
ea18f32300 vulkan2d fixes 2026-04-11 23:18:41 +02:00
1c1a142f52 rendertargetvulkan 2026-04-11 18:48:00 +02:00
8b12dc39b3 renderingelement2dvulkan load from asset 2026-04-11 13:54:17 +02:00
f4a48b20c6 renderingelement2dvulkan auto buffer size 2026-04-10 22:57:53 +02:00