Catcrafts/Crafter.Graphics

Author	SHA1	Message	Date
catbot	1628e1a58c	feat(webgpu-rt): wire any-hit + AABB intersection into wavefront traversal The TRACE-stage BLAS descent now threads the payload through, runs the any-hit shader for non-opaque candidates (DXR/VK opacity resolution: ray FORCE flags > instance FORCE flags > geometry opaque bit), and handles AABB leaves via the intersection shader. MeshRecord grows to 64 bytes with geomType + opaque. When any-hit/intersection are present the TRACE pipeline takes the user bind-group layout so those shaders can sample @group(3+) resources; otherwise TRACE keeps its zero-user-code path unchanged. rayQuery stays triangle-only (skips AABB leaves). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:25 +00:00
catbot	a91603c70b	feat(webgpu-rt): emit intersection/any-hit dispatch + build AABB BVH PipelineRTWebGPU emits a runIntersection mega-switch and the RT_HAS_ANYHIT / RT_HAS_INTERSECTION consts (+ the @CRAFTER_RT_TRACE_USER marker) that gate the library's new TRACE-stage user callbacks, so an opaque triangle-only scene still const-folds them away. Mesh-WebGPU builds a SAH BVH2 over AABB primitives and uploads them in primitive order for the intersection shader to fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:20 +00:00
catbot	321fe596a7	feat(webgpu-rt): add intersection stage, procedural hit group, AABB BLAS API Extends the cross-backend RT type surface for procedural geometry + any-hit on the WebGPU path: - RTShaderGroupType::ProceduralHitGroup + RTShaderGroup::intersectionShader (mirror VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR). - WebGPURTStage::Intersection for AABB intersection shaders. - Mesh::BuildProcedural(span<RTAabb>, opaque) — the WebGPU analog of a VK_GEOMETRY_TYPE_AABBS_KHR geometry. - wgpuRegisterMeshBLAS gains geomType / opaqueFlag / primCount. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:14 +00:00
catbot	d7b9a41b4f	Merge pull request 'fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays' (#12) from claude/issue-11 into master	2026-06-01 13:10:05 +02:00
catbot	1e749818ef	fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the WebGPU spec such a dispatch is silently dropped — no validation error — so at 4K the world is never traced and the accumulator stays black while non-RT passes survive. _wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535, y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the linear ray index from (global_invocation_id, num_workgroups). The existing `i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE already use a 2-D tile dispatch and are unchanged. Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M rays, 93716 workgroups — well over the 65535 cap): renders the full cube grid where master shows a black screen. Resolves #11 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 11:09:15 +00:00
catbot	afb9e320e1	Merge pull request 'docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7)' (#10) from claude/issue-7 into master	2026-06-01 00:22:52 +02:00
catbot	464cb66063	docs(vulkan-rt): record native descriptor-heap AS read as a driver fault Investigated the VK_ERROR_DEVICE_LOST on the native VulkanTriangle (#7). Verified the engine side is correct and validation-clean: the BLAS/TLAS build finishes before render (FinishInit waits), the built instance is well-formed (identity transform, mask=0xFF, correct BLAS ref), and vkWriteResourceDescriptorsEXT stores the TLAS device address at the expected heap offset (confirmed by dumping the heap bytes). Khronos validation 1.4.350 reports zero errors. The fault is isolated to reading the acceleration structure through VK_EXT_descriptor_heap: - images/buffers via the same heap render fine (trace disabled -> the raygen imageStore path renders a full gradient); - both traceRayEXT and inline rayQueryEXT (no SBT) fault identically on the AS read; - reproduces with the AS descriptor at heap byte 0 / shader index 0 (no offset/stride ambiguity) and regardless of pAddressRange size. NVIDIA 610.43.02 is the only descriptor_heap implementation available (llvmpipe lacks the extension), so there is no second implementation to cross-check. Conclusion: driver-side fault in NVIDIA's brand-new VK_EXT_descriptor_heap acceleration-structure path; should be reported to NVIDIA. The traceRayEXT call is left active so the example stays a faithful reproducer. Documented in both READMEs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 22:21:57 +00:00
catbot	6470c12db5	Merge pull request 'fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16' (#9) from claude/issue-8 into master	2026-05-31 23:58:19 +02:00
catbot	23780d83a8	fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16 dom-webgpu.js capped maxStorageBuffersPerShaderStage at 16 even when the adapter reports far more (64 in our test env). The wavefront SHADE kernel already binds ~16 storage buffers before any user binding, so any RT pipeline declaring 2+ user storage buffers at @group(3) overflowed the limit and failed to build with "Too many bindings of type StorageBuffers". Request the adapter's reported maxStorageBuffersPerShaderStage / maxStorageBuffersInPipelineLayout instead of a fixed 16. `clamp` already mins against the adapter cap, so baseline-only devices still get a valid request, and the `\|\| 16` fallback + the `typeof cap === "number"` guard handle limit names a browser doesn't expose (Firefox returns null for maxStorageBuffersInPipelineLayout). Verified in-browser: a 17-storage-buffer compute pipeline fails with the exact reported error on a device clamped to 16, and builds cleanly on a device requesting the adapter's 64. RTStress renders correctly. Resolves #8 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 21:55:42 +00:00
catbot	26a41ac528	Merge pull request 'fix(vulkan): clear startup validation errors on native triangle' (#6) from claude/issue-5 into master	2026-05-31 22:59:47 +02:00
catbot	cac433ee09	fix(vulkan): clear startup validation errors on native triangle Two Vulkan validation errors fired on startup of every native (Vulkan) example, reported in #5: 1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated and ignored since Vulkan 1.0; passing them is a spec violation (VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer enumeration/match block in Device::Initialize is removed and enabledLayerCount is pinned to 0 — layers are enabled at the instance only. 2. vkQueueSubmit layout transition on a presentable image that "has not been acquired". StartInit() and RecreateSwapchainAndImages() eagerly transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before any vkAcquireNextImageKHR, which the spec forbids (a presentable image may only be touched after acquire). Those pre-transitions are removed. Each image's first layout transition now happens lazily in Render(), after acquire, from UNDEFINED; subsequent frames transition from PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in CreateSwapchain) selects the correct oldLayout. Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation: the original code reproduces both errors on HelloUI; the fixed build emits zero validation messages across initial render and swapchain recreation. Resolves #5 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:59:10 +00:00
catbot	6a54c3c4ca	Merge pull request 'WebGPU RT: wavefront/streaming tracer (replaces megakernel)' (#4) from claude/issue-3 into master	2026-05-31 22:31:35 +02:00
catbot	358084185a	docs: wavefront RT in README + design-doc status; add RTStress to examples	2026-05-31 20:29:12 +00:00
catbot	afc0292fab	WebGPU RT: dynamic TLAS sweep-tree depth (next_pow2 instances) The LBVH bitonic sort still runs over the full 16384 (sentinels sink to the tail), but the sweep tree is now built and traced at depth log2(next_pow2(nReal)) instead of a fixed 14. Add nPadded to LbvhPC; leaf init + bottom-up refit use it; the host passes the same next_pow2 to the trace via WfParams.tlasNPadded. Renders correctly at 512 instances (depth 9). The fragile sort phases are untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:28:12 +00:00
catbot	82e5e867d4	WebGPU RT: remove dead megakernel WGSL (no dual path) The RT pipeline now only builds the wavefront kernels, so the old single-megakernel traversal/traceRay block (rtWgslMegakernelHelpers) and the unused rtWgslPrelude alias are dead. Remove them. The rayQuery compute path keeps rtWgslMegakernelBindings (its own _rq* traversal uses it). RTStress still renders correctly with the trimmed prelude. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:24:04 +00:00
catbot	dd4122f2ba	WebGPU RT: ordered (nearest-child-first) traversal Add _rtAabbT (AABB test returning entry-t); in both _rtwTraverseBlas and _rtwTraverseTlas descend the nearer child first and push the farther only when it hits, re-culling it against the (tightened) bestT when popped. Render is identical (same closest hit) on VulkanTriangle, RTStress (512/4096), and Sponza; cuts node visits on dense scenes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:21:44 +00:00
catbot	376e66aeed	WebGPU RT: port Sponza to wavefront (shadow ray in SHADE) Restructure Sponza for the wavefront model: raygen emits the primary ray; closesthit (in SHADE) gathers albedo/normal, accumulates ambient, and emits a shadow ray carrying the pending direct term; miss adds the sky (primary) or the direct term (shadow miss). resolve.wgsl applies the same Reinhard+gamma the megakernel raygen did inline. User bindings moved to group 3 (groups 0..2 reserved). RTPass maxDepth=2. Renders the atrium correctly through the wavefront pipeline (textures, two-sided shading, sun+ambient, shadows, tonemap). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:16:04 +00:00
catbot	1d2e12dbc9	WebGPU RT: GPU timestamp-query per-pass harness Request the timestamp-query feature; write begin/end timestamps around each wavefront pass via timestampWrites; resolve + read back (deferred to after submit) and print a per-pass us breakdown ~1x/sec. RTStress @ 512 instances, 1920x995: TRACE dominates, total ~1.8-3.0ms/frame. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:08:39 +00:00
catbot	f4d6493d91	wip: uncommitted changes from claude run on issue #3	2026-05-31 16:28:38 +00:00
catbot	4e42d663a6	WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE) Replace the megakernel @compute entry with five wavefront kernels sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop: GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE - TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only). - PREP publishes dispatchWorkgroupsIndirect args from the live ray count; the indirect-args buffer lives in its own bind group so it is never bound read-write in the same dispatch that consumes it as INDIRECT. - New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate, plus an optional user Resolve stage (tonemap hook; identity by default). - Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary between passes within one submit). - Payload-typed wfPayload binding emitted in the codegen region after the user's struct Payload; payload travels with each ray (2WH slots). - Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroups PerDimension so the W*H-sized work buffers fit past the 128MB baseline. VulkanTriangle ported to the new API and renders bit-identical to the megakernel baseline at maxDepth=1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 16:24:41 +00:00
catbot	e0d72f57f2	Merge pull request 'WebGPU RT: enable TLAS spatial sort via bitonic network (plan phase 3)' (#2) from claude/issue-1 into master	2026-05-31 17:49:38 +02:00
catbot	14091dcdca	WebGPU RT: enable TLAS spatial sort via bitonic network Replace the disabled LSD radix sort in lbvhBuildMain with a data-oblivious workgroup bitonic sorting network and enable it. The radix scatter was gated behind `if (false)` because it produced count/distribution-dependent corruption (TODO-lbvh-sort.md) — a memory-ordering bug in the Hillis-Steele scan / parallel scatter that surfaced only for certain Morton distributions (a small object beside a tight cluster), making geometry flicker. A bitonic network's compare-exchange schedule depends only on N_PADDED, never on key values, so it sidesteps that entire class of distribution-dependent races (TODO strategy #5). 105 sub-stages over 2^14 keys, single workgroup of 1024 threads, 8 compare-exchanges/thread/sub-stage, operating in-place on sortA with a storageBarrier between sub-stages. Sentinel keys (0xFFFFFFFF) compare largest and settle at the tail, exactly where Phase 4 expects them. Restores Morton (Z-order) spatial coherence to TLAS BVH leaves, which the many-instance case needs. Removes the now-dead radix histogram/scan workgroup memory and constants. Verified on the Firefox/Dawn WebGPU stack: a GPU unit test diffs the kernel output against a CPU oracle across all three required distributions (all-uniform, all-one-bucket, small-object-next-to-cluster) plus random, reverse, and empty inputs — all match bit-for-bit with a valid index permutation. Sponza renders correctly with the sort live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 15:48:29 +00:00
Jorijn van der Graaf	162d98cf5b	got rid of --local	2026-05-27 04:38:30 +02:00
Jorijn van der Graaf	909a9b46d2	wasm fixes	2026-05-26 22:50:49 +02:00
Jorijn van der Graaf	8347467e1e	webgpu improvements	2026-05-24 13:32:08 +02:00
Jorijn van der Graaf	5a75571ffd	readme update	2026-05-19 01:43:46 +02:00
Jorijn van der Graaf	850ef7bfb3	clipboard	2026-05-19 00:45:22 +02:00
Jorijn van der Graaf	b5d0f52da0	webgpu sponza	2026-05-19 00:27:09 +02:00
Jorijn van der Graaf	5553ded476	webgpu triangle	2026-05-18 18:43:30 +02:00
Jorijn van der Graaf	64116cd980	custom shader webgpu	2026-05-18 05:39:17 +02:00
Jorijn van der Graaf	dedf6b0467	webgpu support	2026-05-18 04:58:52 +02:00
Jorijn van der Graaf	5352ef69a2	browser DOM support	2026-05-18 02:07:48 +02:00
Jorijn van der Graaf	3859c43ce3	compression example	2026-05-12 00:27:55 +02:00
Jorijn van der Graaf	ac2eb7fb0a	new input system	2026-05-12 00:24:48 +02:00
Jorijn van der Graaf	b3db40ebec	update	2026-05-05 23:49:29 +02:00
Jorijn van der Graaf	825da78f7f	descriptor heap leak fix	2026-05-05 00:02:04 +02:00
Jorijn van der Graaf	c054f1e0b3	update	2026-05-03 02:45:38 +02:00
Jorijn van der Graaf	1f5697326c	UI rewrite 3rd attempt	2026-05-02 21:08:20 +02:00
Jorijn van der Graaf	c9fd1b1585	animated example	2026-05-02 00:03:24 +02:00
Jorijn van der Graaf	216972e73a	new UI system	2026-05-01 23:35:37 +02:00
Jorijn van der Graaf	d840a81448	bugfixes	2026-04-30 23:15:43 +02:00
Jorijn van der Graaf	d29f5609cd	fix	2026-04-30 02:05:16 +02:00
Jorijn van der Graaf	7f5297ca57	add Rendertarget and Shm implementations to project.cpp Both were on disk but missing from the V2 port's implementations list; Rendertarget is required for RendertargetVulkan linkage, Shm is the Wayland shared-memory helper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 02:00:17 +02:00
Jorijn van der Graaf	bc669b5e05	crafter-build V2	2026-04-30 01:30:08 +02:00
Jorijn van der Graaf	8a2fd33efc	crafter-build V2	2026-04-30 01:29:17 +02:00
Jorijn van der Graaf	c9ebd448f9	update	2026-04-16 23:03:24 +02:00
Jorijn van der Graaf	ef8d623525	text rendering fixes	2026-04-15 19:30:21 +02:00
Jorijn van der Graaf	5ffe1404fc	vulkan2d fixes	2026-04-13 18:36:07 +02:00
Jorijn van der Graaf	4c93c5535e	typo	2026-04-11 23:22:52 +02:00
Jorijn van der Graaf	ea18f32300	vulkan2d fixes	2026-04-11 23:18:41 +02:00
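
The arithmetic behind commit 1e749818ef (2-D reshape of the wavefront TRACE/SHADE dispatch) can be sketched as follows. This is a minimal illustration in Python, not the repository's WGSL or host code: the names `split_dispatch` and `linear_ray_index` are hypothetical, and the workgroup size of 64 along x is taken from the `ceil(W*H/64)` figure in the commit message.

```python
# Sketch only: hypothetical helper names, not the engine's actual code.
MAX_WG_PER_DIM = 65535  # maxComputeWorkgroupsPerDimension cited in the commit
WG_SIZE = 64            # threads per workgroup along x (assumed from ceil(W*H/64))


def split_dispatch(ray_count, max_per_dim=MAX_WG_PER_DIM):
    """Spread ceil(ray_count / 64) workgroups over a 2-D grid (x, y)."""
    wg = (ray_count + WG_SIZE - 1) // WG_SIZE        # total workgroups needed
    x = max(1, min(wg, max_per_dim))                 # x clamped to the cap
    y = max(1, (wg + max_per_dim - 1) // max_per_dim)  # rows needed to cover wg
    return x, y


def linear_ray_index(gid_x, gid_y, num_wg_x):
    """Rebuild the linear ray index from the invocation id and grid width."""
    return gid_y * (num_wg_x * WG_SIZE) + gid_x


def covered_rays(ray_count, max_per_dim=MAX_WG_PER_DIM):
    """Enumerate every index the grid touches, applying the overshoot guard."""
    x, y = split_dispatch(ray_count, max_per_dim)
    seen = []
    for gid_y in range(y):
        for gid_x in range(x * WG_SIZE):
            i = linear_ray_index(gid_x, gid_y, x)
            if i >= ray_count:  # analogous to the `i >= _wfCurCount()` guard
                continue
            seen.append(i)
    return seen
```

For the 3449x1739 surface mentioned in the commit, `split_dispatch(3449 * 1739)` gives a 65535 x 2 grid, which covers the 93716 workgroups required; the guard discards the invocations in the overshoot.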


235 commits
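
The figures quoted in commit 14091dcdca (105 sub-stages over 2^14 keys, 8 compare-exchanges per thread with 1024 threads) follow from the shape of a bitonic sorting network. The sketch below is a plain CPU illustration in Python under that assumption; it is not the repository's `lbvhBuildMain` kernel, and `bitonic_schedule`, `bitonic_sort`, and `pad_with_sentinels` are hypothetical names.

```python
# Sketch only: a sequential bitonic network, not the engine's WGSL kernel.
SENTINEL = 0xFFFFFFFF  # sentinel key value cited in the commit; sorts to the tail


def bitonic_schedule(n):
    """Yield the (k, j) sub-stages for a power-of-two n.

    The schedule depends only on n, never on the key values.
    """
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            yield k, j
            j //= 2
        k *= 2


def pad_with_sentinels(keys, n):
    """Pad keys up to length n with sentinels (n must be a power of two)."""
    return list(keys) + [SENTINEL] * (n - len(keys))


def bitonic_sort(keys):
    """Sort a power-of-two-length list ascending using the fixed schedule."""
    a = list(keys)
    n = len(a)
    for k, j in bitonic_schedule(n):
        for i in range(n):
            p = i ^ j                      # partner index for this sub-stage
            if p > i:                      # handle each pair once
                ascending = (i & k) == 0   # sort direction of this block
                if (a[i] > a[p]) == ascending:
                    a[i], a[p] = a[p], a[i]
    return a
```

With n = 2^14 the schedule has 14 * 15 / 2 = 105 sub-stages, and each sub-stage performs n / 2 = 8192 compare-exchanges, which is 8 per thread when split across 1024 threads.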