Catcrafts/Crafter.Graphics

Author	SHA1	Message	Date
catbot	8f6a52a460	fix(webgpu-rt): derive rayQuery TLAS leaf-start from dynamic nPadded (#25) The software rayQuery shim's _rqTraverseTlas detected BVH leaves with a compile-time constant TLAS_BVH_LEAVES_START = 16384 - 1, while the actual TLAS sweep tree is built at depth log2(next_pow2(instanceCount)). For any scene with fewer than 8193 instances the padded leaf count is far below 16384, so no node index ever reached 16383: every node looked internal, the descent walked into zeroed out-of-tree AABBs, and the pick reported a permanent miss. This broke every rayQuery=true compute shader (builder picking, splash queries) on the WebGPU backend. Pass the per-build padded leaf count to the shim the same way the megakernel _rtwTraverseTlas reads wfParams.tlasNPadded: a small uniform (RqTlasMeta.nPadded) at @group(1) @binding(10), written each wgpuBuildTLAS from wfNextPow2(instanceCount), and bound by both rayQuery dispatch paths. _rqTraverseTlas now computes leavesStart = nPadded - 1 dynamically. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 13:32:58 +00:00
catbot	f9d23cd1f9	Merge pull request 'docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23)' (#24) from claude/issue-23 into master	2026-06-03 22:05:45 +02:00
catbot	d08c7cea11	docs(vulkan-rt): document dynamic descriptor_heap-index hit-shader fault (#23 ) Indexing a `layout(descriptor_heap)` array with a runtime (non-constant) index inside a ray-tracing hit shader device-losts on NVIDIA 610.43.02, for both SSBO and sampled-image descriptors. A constant/spec-constant index is fine, and the same dynamic pattern works in fragment shaders, so it's an RT-stage-specific driver fault — the same family as #7/#15 (descriptor-heap AS reads) and #21/#22 (RT recursion + compute TLAS push). Unlike the AS-read fault, this cannot be worked around transparently: a sampled image has no device-address escape hatch the way an acceleration structure does (OpConvertUToAccelerationStructureKHR), and a buffer-only buffer_reference rewrite would need a whole address-table architecture while still leaving the texture half broken. So the resolution is the documented-limitation path (the precedent set by #7). Records the fault and its isolation in README's Native RT status and in the Sponza example README (the textured-closest-hit example, which already reads its albedo through a spec-constant slot for exactly this reason). Documents the recommended consumer pattern: bind one resource and index within it dynamically (single geometry SSBO / buffer_reference at a spec-constant slot; one texture2DArray indexed by layer) rather than selecting a descriptor dynamically — what the WebGPU path already does. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 20:05:12 +00:00
catbot	5358aee2f6	Merge pull request 'fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21)' (#22) from claude/issue-21 into master	2026-06-03 20:36:22 +02:00
catbot	1c310762a7	fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for compute (#21 ) Two gaps in the Vulkan RT path that fault the device on the NVIDIA proprietary driver with a non-trivial pipeline (simple VulkanTriangle never hit them): 1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit shader that traces a secondary ray (shadow ray — a very common pattern) recursed past the pipeline limit (UB → device fault). PipelineRTVulkan::Init now takes a maxRecursionDepth parameter (default 1, clamped to the device's maxRayRecursionDepth). 2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader that reads an accelerationStructureEXT from the heap — including compute shaders — to read the TLAS device address from a push constant, but only RTPass pushed that address. A compute shader that ray-queries the TLAS (rayQueryEXT) therefore ran against an unwritten push slot → garbage AS handle → VK_ERROR_DEVICE_LOST. WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult {patched, tlasPushOffset} instead of writing the clobber-prone global Device::workaroundTlasPushOffset (removed). VulkanShader stores it; ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and ComputeShader tracks its own offset and pushes the caller-supplied TLAS address in Dispatch (new defaulted tlasAddress parameter), mirroring RTPass::Record. The PushConstantRewrite regression test now asserts Patch's returned patched/offset and adds two ray-querying compute-shader cases, proving the rewrite is stage-agnostic and the per-shader offset is correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 18:35:39 +00:00
catbot	2790bbd576	Merge pull request 'fix(vulkan-rt): merge TLAS push constant into existing block (#18)' (#20) from claude/issue-18 into master	2026-06-03 04:29:00 +02:00
catbot	471f480c5d	test(vulkan-rt): spirv-val coverage for the push-constant rewrite (#18 ) Adds tests/PushConstantRewrite, a host test that compiles representative ray-generation shaders with glslang, runs the real WorkaroundNvidiaAS::Patch over them, and asserts with spirv-val (the same invocation vkCreateShaderModule uses) that the result is valid and contains exactly one push-constant block — covering both the merge path (shaders that already declare a push constant, including mat4/vec3/uint, a lone uint, and an array layout) and the synthesize path, plus a no-op case (push constant but no AS read). It also checks the published TLAS push offset for each layout. The workaround namespace is exported so the test can drive Patch directly; both go away with the rest of the workaround. project.cpp wires the test as an executable that recompiles the module and requires glslang + spirv-val. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 02:28:09 +00:00
catbot	45ecc91424	fix(vulkan-rt): merge TLAS push constant into existing block (#18) The NVIDIA descriptor-heap AS-read workaround (#15) rewrote heap acceleration-structure reads into a load of the TLAS device address from a push-constant block. It always synthesized a new push-constant block, so any ray-tracing shader that already declared one ended up with two, which SPIR-V forbids ("at most one push constant block statically used per entry point"), and vkCreateShaderModule's spirv-val check rejected: Entry point id '4' uses more than one PushConstant interface. WorkaroundNvidiaAS::Patch now detects an existing PushConstant variable and, when present, appends a single ulong member (the TLAS address) to that block instead of adding a second one, reading the address through the shader's own push-constant variable. The append offset is the end of the user's block, computed from the members' explicit Offset/ArrayStride/MatrixStride decorations (correct under both scalar and std140 layout) and rounded up to 8. Shaders with no push constant of their own keep getting a freshly synthesized single-member block at offset 0, exactly as before. That offset is published via Device::workaroundTlasPushOffset and RTPass feeds it to vkCmdPushDataEXT so the address lands where the rewritten load reads it (0 for the synthesized case, preserving prior behaviour). Verified on the affected driver (NVIDIA 610.43.02, RTX 4090): VulkanTriangle ray-traces correctly and validation-clean both with and without a user-declared raygen push constant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 02:28:02 +00:00
catbot	7c00ddd474	Merge pull request 'feat(vulkan): re-enable GPU-Assisted Validation' (#19) from claude/issue-17 into master	2026-06-03 04:11:00 +02:00
catbot	e7469133e8	feat(vulkan): re-enable GPU-Assisted Validation The GPU-AV enable list was removed to dodge a crash in SDK 1.4.341, whose GPU-AV null-deref'd on descriptor_heap pipelines (VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT, layout = VK_NULL_HANDLE) in PipelineSubState::GetPipelineLayoutUnion: https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/12103 That was fixed in the next SDK release. The validation layer is now 1.4.350 (> 1.4.341), so restore VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT in the VkValidationFeaturesEXT enable list. Verified by running the HelloUI example (which draws through the descriptor_heap compute pipelines) with the layer active: it renders the full UI for the entire run with GPU-AV reporting "Both GPU Assisted Validation and Normal Core Check Validation are enabled" and no descriptor-heap null-deref or VUID errors. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 02:10:23 +00:00
catbot	f24107264d	Merge pull request 'fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15)' (#16) from claude/issue-15 into master	2026-06-03 04:00:38 +02:00
catbot	950059c86e	fix(vulkan-rt): work around NVIDIA descriptor-heap AS-read device-loss (#15 ) Reading an acceleration structure through VK_EXT_descriptor_heap aborts with VK_ERROR_DEVICE_LOST on NVIDIA 610.43.02 — a brand-new-extension driver fault isolated in #7 (engine setup is correct and validation-clean; images/buffers through the same heap work, and both traceRayEXT and inline rayQuery fault identically on the AS read). An acceleration structure can equally be reached by its device address via OpConvertUToAccelerationStructureKHR, which reads no descriptor and so never touches the faulting heap path. glslang has no GLSL spelling for that conversion, so VulkanShader rewrites the compiled SPIR-V at module-load time: every `OpLoad %accelStruct <heap-ptr>` becomes a load of the TLAS device address from a synthesized push-constant block followed by the convert. RTPass pushes the active frame's TLAS address into that push constant. User GLSL and example code are unchanged; acceleration structures still bind into the heap normally. The workaround is gated on Device::workaroundDescriptorHeapAS (true only on the NVIDIA proprietary driver) and confined to one fenced block in Crafter.Graphics-ShaderVulkan.cppm plus the RTPass push and the shaderInt64 feature toggle — delete those once a fixed NVIDIA driver ships and the heap AS read becomes the direct path again. Verified: VulkanTriangle ray-traces correctly on native NVIDIA (RTX 4090), validation-layer-clean, no device loss. The SPIR-V rewrite was independently validated with spirv-val on both the VulkanTriangle and Sponza raygen modules. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-03 01:59:54 +00:00
catbot	b9f65f5273	Merge pull request 'feat(webgpu-rt): any-hit + AABB (procedural) geometry support' (#14) from claude/issue-13 into master	2026-06-03 00:10:17 +02:00
catbot	5dd1086f08	docs(webgpu-rt): add RTVolume example (procedural spheres + any-hit cut-out) A 3x3x3 grid of AABB-geometry spheres rendered through an analytic ray-sphere intersection shader, with an any-hit spherical-checkerboard cut-out so the background shows through. Exercises both features end to end on the WebGPU wavefront tracer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:30 +00:00
catbot	1628e1a58c	feat(webgpu-rt): wire any-hit + AABB intersection into wavefront traversal The TRACE-stage BLAS descent now threads the payload through, runs the any-hit shader for non-opaque candidates (DXR/VK opacity resolution: ray FORCE flags > instance FORCE flags > geometry opaque bit), and handles AABB leaves via the intersection shader. MeshRecord grows to 64 bytes with geomType + opaque. When any-hit/intersection are present the TRACE pipeline takes the user bind-group layout so those shaders can sample @group(3+) resources; otherwise TRACE keeps its zero-user-code path unchanged. rayQuery stays triangle-only (skips AABB leaves). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:25 +00:00
catbot	a91603c70b	feat(webgpu-rt): emit intersection/any-hit dispatch + build AABB BVH PipelineRTWebGPU emits a runIntersection mega-switch and the RT_HAS_ANYHIT / RT_HAS_INTERSECTION consts (+ the @CRAFTER_RT_TRACE_USER marker) that gate the library's new TRACE-stage user callbacks, so an opaque triangle-only scene still const-folds them away. Mesh-WebGPU builds a SAH BVH2 over AABB primitives and uploads them in primitive order for the intersection shader to fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:20 +00:00
catbot	321fe596a7	feat(webgpu-rt): add intersection stage, procedural hit group, AABB BLAS API Extends the cross-backend RT type surface for procedural geometry + any-hit on the WebGPU path: - RTShaderGroupType::ProceduralHitGroup + RTShaderGroup::intersectionShader (mirror VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_KHR). - WebGPURTStage::Intersection for AABB intersection shaders. - Mesh::BuildProcedural(span<RTAabb>, opaque) — the WebGPU analog of a VK_GEOMETRY_TYPE_AABBS_KHR geometry. - wgpuRegisterMeshBLAS gains geomType / opaqueFlag / primCount. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 22:09:14 +00:00
catbot	d7b9a41b4f	Merge pull request 'fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays' (#12) from claude/issue-11 into master	2026-06-01 13:10:05 +02:00
catbot	1e749818ef	fix(webgpu): reshape wavefront TRACE/SHADE to 2-D to survive >4.19M rays A 1-D indirect dispatch of ceil(W*H/64) workgroups for the wavefront TRACE/SHADE stages overflows maxComputeWorkgroupsPerDimension (65535 on Dawn/Firefox) once the surface exceeds ~4.19M rays (~2560x1640). Per the WebGPU spec such a dispatch is silently dropped — no validation error — so at 4K the world is never traced and the accumulator stays black while non-RT passes survive. _wfPrep now spreads the workgroups across a 2-D grid (x clamped to 65535, y = ceil(wg/65535)), and the wfTrace/wfShade entry points rebuild the linear ray index from (global_invocation_id, num_workgroups). The existing `i >= _wfCurCount()` guard absorbs the grid overshoot. GENERATE/RESOLVE already use a 2-D tile dispatch and are unchanged. Verified in Firefox/WebGPU with RTStress at a 3449x1739 surface (5.99M rays, 93716 workgroups — well over the 65535 cap): renders the full cube grid where master shows a black screen. Resolves #11 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 11:09:15 +00:00
catbot	afb9e320e1	Merge pull request 'docs(vulkan-rt): native descriptor-heap AS read is an NVIDIA driver fault (#7)' (#10) from claude/issue-7 into master	2026-06-01 00:22:52 +02:00
catbot	464cb66063	docs(vulkan-rt): record native descriptor-heap AS read as a driver fault Investigated the VK_ERROR_DEVICE_LOST on the native VulkanTriangle (#7). Verified the engine side is correct and validation-clean: the BLAS/TLAS build finishes before render (FinishInit waits), the built instance is well-formed (identity transform, mask=0xFF, correct BLAS ref), and vkWriteResourceDescriptorsEXT stores the TLAS device address at the expected heap offset (confirmed by dumping the heap bytes). Khronos validation 1.4.350 reports zero errors. The fault is isolated to reading the acceleration structure through VK_EXT_descriptor_heap: - images/buffers via the same heap render fine (trace disabled -> the raygen imageStore path renders a full gradient); - both traceRayEXT and inline rayQueryEXT (no SBT) fault identically on the AS read; - reproduces with the AS descriptor at heap byte 0 / shader index 0 (no offset/stride ambiguity) and regardless of pAddressRange size. NVIDIA 610.43.02 is the only descriptor_heap implementation available (llvmpipe lacks the extension), so there is no second implementation to cross-check. Conclusion: driver-side fault in NVIDIA's brand-new VK_EXT_descriptor_heap acceleration-structure path; should be reported to NVIDIA. The traceRayEXT call is left active so the example stays a faithful reproducer. Documented in both READMEs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 22:21:57 +00:00
catbot	6470c12db5	Merge pull request 'fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16' (#9) from claude/issue-8 into master	2026-05-31 23:58:19 +02:00
catbot	23780d83a8	fix(webgpu): request adapter's storage-buffer limit, not hardcoded 16 dom-webgpu.js capped maxStorageBuffersPerShaderStage at 16 even when the adapter reports far more (64 in our test env). The wavefront SHADE kernel already binds ~16 storage buffers before any user binding, so any RT pipeline declaring 2+ user storage buffers at @group(3) overflowed the limit and failed to build with "Too many bindings of type StorageBuffers". Request the adapter's reported maxStorageBuffersPerShaderStage / maxStorageBuffersInPipelineLayout instead of a fixed 16. `clamp` already mins against the adapter cap, so baseline-only devices still get a valid request, and the `|| 16` fallback + the `typeof cap === "number"` guard handle limit names a browser doesn't expose (Firefox returns null for maxStorageBuffersInPipelineLayout). Verified in-browser: a 17-storage-buffer compute pipeline fails with the exact reported error on a device clamped to 16, and builds cleanly on a device requesting the adapter's 64. RTStress renders correctly. Resolves #8 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 21:55:42 +00:00
catbot	26a41ac528	Merge pull request 'fix(vulkan): clear startup validation errors on native triangle' (#6) from claude/issue-5 into master	2026-05-31 22:59:47 +02:00
catbot	cac433ee09	fix(vulkan): clear startup validation errors on native triangle Two Vulkan validation errors fired on startup of every native (Vulkan) example, reported in #5: 1. vkCreateDevice enabledLayerCount != 0. Device layers are deprecated and ignored since Vulkan 1.0; passing them is a spec violation (VUID-VkDeviceCreateInfo-enabledLayerCount-12384). The device-layer enumeration/match block in Device::Initialize is removed and enabledLayerCount is pinned to 0 — layers are enabled at the instance only. 2. vkQueueSubmit layout transition on a presentable image that "has not been acquired". StartInit() and RecreateSwapchainAndImages() eagerly transitioned every swapchain image UNDEFINED -> PRESENT_SRC_KHR before any vkAcquireNextImageKHR, which the spec forbids (a presentable image may only be touched after acquire). Those pre-transitions are removed. Each image's first layout transition now happens lazily in Render(), after acquire, from UNDEFINED; subsequent frames transition from PRESENT_SRC_KHR. A per-image `imageInitialised` flag (reset in CreateSwapchain) selects the correct oldLayout. Verified under sway (headless, GPU renderer) + VK_LAYER_KHRONOS_validation: the original code reproduces both errors on HelloUI; the fixed build emits zero validation messages across initial render and swapchain recreation. Resolves #5 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:59:10 +00:00
catbot	6a54c3c4ca	Merge pull request 'WebGPU RT: wavefront/streaming tracer (replaces megakernel)' (#4) from claude/issue-3 into master	2026-05-31 22:31:35 +02:00
catbot	358084185a	docs: wavefront RT in README + design-doc status; add RTStress to examples	2026-05-31 20:29:12 +00:00
catbot	afc0292fab	WebGPU RT: dynamic TLAS sweep-tree depth (next_pow2 instances) The LBVH bitonic sort still runs over the full 16384 (sentinels sink to the tail), but the sweep tree is now built and traced at depth log2(next_pow2(nReal)) instead of a fixed 14. Add nPadded to LbvhPC; leaf init + bottom-up refit use it; the host passes the same next_pow2 to the trace via WfParams.tlasNPadded. Renders correctly at 512 instances (depth 9). The fragile sort phases are untouched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:28:12 +00:00
catbot	82e5e867d4	WebGPU RT: remove dead megakernel WGSL (no dual path) The RT pipeline now only builds the wavefront kernels, so the old single-megakernel traversal/traceRay block (rtWgslMegakernelHelpers) and the unused rtWgslPrelude alias are dead. Remove them. The rayQuery compute path keeps rtWgslMegakernelBindings (its own _rq* traversal uses it). RTStress still renders correctly with the trimmed prelude. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:24:04 +00:00
catbot	dd4122f2ba	WebGPU RT: ordered (nearest-child-first) traversal Add _rtAabbT (AABB test returning entry-t); in both _rtwTraverseBlas and _rtwTraverseTlas descend the nearer child first and push the farther only when it hits, re-culling it against the (tightened) bestT when popped. Render is identical (same closest hit) on VulkanTriangle, RTStress (512/4096), and Sponza; cuts node visits on dense scenes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:21:44 +00:00
catbot	376e66aeed	WebGPU RT: port Sponza to wavefront (shadow ray in SHADE) Restructure Sponza for the wavefront model: raygen emits the primary ray; closesthit (in SHADE) gathers albedo/normal, accumulates ambient, and emits a shadow ray carrying the pending direct term; miss adds the sky (primary) or the direct term (shadow miss). resolve.wgsl applies the same Reinhard+gamma the megakernel raygen did inline. User bindings moved to group 3 (groups 0..2 reserved). RTPass maxDepth=2. Renders the atrium correctly through the wavefront pipeline (textures, two-sided shading, sun+ambient, shadows, tonemap). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:16:04 +00:00
catbot	1d2e12dbc9	WebGPU RT: GPU timestamp-query per-pass harness Request the timestamp-query feature; write begin/end timestamps around each wavefront pass via timestampWrites; resolve + read back (deferred to after submit) and print a per-pass us breakdown ~1x/sec. RTStress @ 512 instances, 1920x995: TRACE dominates, total ~1.8-3.0ms/frame. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 20:08:39 +00:00
catbot	f4d6493d91	wip: uncommitted changes from claude run on issue #3	2026-05-31 16:28:38 +00:00
catbot	4e42d663a6	WebGPU RT: wavefront tracer core (GENERATE/PREP/TRACE/SHADE/RESOLVE) Replace the megakernel @compute entry with five wavefront kernels sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop: GENERATE -> (PREP -> TRACE -> SHADE) x maxDepth -> RESOLVE - TRACE contains zero user code (pure _rtwTraverseTlas/Blas, opaque-only). - PREP publishes dispatchWorkgroupsIndirect args from the live ray count; the indirect-args buffer lives in its own bind group so it is never bound read-write in the same dispatch that consumes it as INDIRECT. - New emit/accumulate API: rtEmitPrimaryRay / rtEmitRay / rtAccumulate, plus an optional user Resolve stage (tonemap hook; identity by default). - Per-pass WfParams via a dynamic-offset uniform ring (curIsA/bounce vary between passes within one submit). - Payload-typed wfPayload binding emitted in the codegen region after the user's struct Payload; payload travels with each ray (2WH slots). - Request maxBufferSize / maxStorageBufferBindingSize / maxComputeWorkgroupsPerDimension so the W*H-sized work buffers fit past the 128MB baseline. VulkanTriangle ported to the new API and renders bit-identical to the megakernel baseline at maxDepth=1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 16:24:41 +00:00
catbot	e0d72f57f2	Merge pull request 'WebGPU RT: enable TLAS spatial sort via bitonic network (plan phase 3)' (#2) from claude/issue-1 into master	2026-05-31 17:49:38 +02:00
catbot	14091dcdca	WebGPU RT: enable TLAS spatial sort via bitonic network Replace the disabled LSD radix sort in lbvhBuildMain with a data-oblivious workgroup bitonic sorting network and enable it. The radix scatter was gated behind `if (false)` because it produced count/distribution-dependent corruption (TODO-lbvh-sort.md) — a memory-ordering bug in the Hillis-Steele scan / parallel scatter that surfaced only for certain Morton distributions (a small object beside a tight cluster), making geometry flicker. A bitonic network's compare-exchange schedule depends only on N_PADDED, never on key values, so it sidesteps that entire class of distribution-dependent races (TODO strategy #5). 105 sub-stages over 2^14 keys, single workgroup of 1024 threads, 8 compare-exchanges/thread/sub-stage, operating in-place on sortA with a storageBarrier between sub-stages. Sentinel keys (0xFFFFFFFF) compare largest and settle at the tail, exactly where Phase 4 expects them. Restores Morton (Z-order) spatial coherence to TLAS BVH leaves, which the many-instance case needs. Removes the now-dead radix histogram/scan workgroup memory and constants. Verified on the Firefox/Dawn WebGPU stack: a GPU unit test diffs the kernel output against a CPU oracle across all three required distributions (all-uniform, all-one-bucket, small-object-next-to-cluster) plus random, reverse, and empty inputs — all match bit-for-bit with a valid index permutation. Sponza renders correctly with the sort live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 15:48:29 +00:00
Jorijn van der Graaf	162d98cf5b	got rid of --local	2026-05-27 04:38:30 +02:00
Jorijn van der Graaf	909a9b46d2	wasm fixes	2026-05-26 22:50:49 +02:00
Jorijn van der Graaf	8347467e1e	webgpu improvements	2026-05-24 13:32:08 +02:00
Jorijn van der Graaf	5a75571ffd	readme update	2026-05-19 01:43:46 +02:00
Jorijn van der Graaf	850ef7bfb3	clipboard	2026-05-19 00:45:22 +02:00
Jorijn van der Graaf	b5d0f52da0	webgpu sponza	2026-05-19 00:27:09 +02:00
Jorijn van der Graaf	5553ded476	webgpu triangle	2026-05-18 18:43:30 +02:00
Jorijn van der Graaf	64116cd980	custom shader webgpu	2026-05-18 05:39:17 +02:00
Jorijn van der Graaf	dedf6b0467	webgpu support	2026-05-18 04:58:52 +02:00
Jorijn van der Graaf	5352ef69a2	browser DOM support	2026-05-18 02:07:48 +02:00
Jorijn van der Graaf	3859c43ce3	compression example	2026-05-12 00:27:55 +02:00
Jorijn van der Graaf	ac2eb7fb0a	new input system	2026-05-12 00:24:48 +02:00
Jorijn van der Graaf	b3db40ebec	update	2026-05-05 23:49:29 +02:00
Jorijn van der Graaf	825da78f7f	descriptor heap leak fix	2026-05-05 00:02:04 +02:00
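
The 2-D dispatch reshape described in commit 1e749818ef can be illustrated with a little host-side arithmetic. The sketch below is not the repository's code: the names `MakeGrid`, `LinearRayIndex`, and `DispatchGrid` are hypothetical, and the exact way the shader rebuilds the ray index is an assumption (row-major over workgroups, 64 threads each). It only shows why a surface of 3449x1739 needs two rows of workgroups once the per-dimension cap of 65535 is respected.

```cpp
#include <algorithm>
#include <cstdint>

// Per-dimension workgroup cap quoted in the commit message (Dawn/Firefox).
constexpr std::uint32_t kMaxWorkgroupsPerDim = 65535;
// Assumed workgroup size, matching the ceil(W*H/64) figure in the message.
constexpr std::uint32_t kWorkgroupSize = 64;

struct DispatchGrid {
    std::uint32_t x;
    std::uint32_t y;
};

constexpr std::uint32_t CeilDiv(std::uint32_t a, std::uint32_t b) {
    return (a + b - 1) / b;
}

// Hypothetical helper: spread the 1-D workgroup count over a 2-D grid,
// x clamped to the cap and y = ceil(wg / cap).
constexpr DispatchGrid MakeGrid(std::uint32_t rayCount) {
    const std::uint32_t wg = CeilDiv(rayCount, kWorkgroupSize);
    return {std::min(wg, kMaxWorkgroupsPerDim), CeilDiv(wg, kMaxWorkgroupsPerDim)};
}

// Hypothetical helper: rebuild the linear ray index from the workgroup
// coordinates, the local thread id, and the grid width (assumed row-major).
constexpr std::uint32_t LinearRayIndex(std::uint32_t wgX, std::uint32_t wgY,
                                       std::uint32_t localX,
                                       std::uint32_t numWorkgroupsX) {
    return (wgY * numWorkgroupsX + wgX) * kWorkgroupSize + localX;
}

// The 3449x1739 surface from the commit message: 93716 workgroups, two rows.
static_assert(CeilDiv(3449u * 1739u, kWorkgroupSize) == 93716, "workgroup count");
static_assert(MakeGrid(3449u * 1739u).x == 65535, "x is clamped to the cap");
static_assert(MakeGrid(3449u * 1739u).y == 2, "overflow moves into y");
```

The grid covers 131070 workgroups for 93716 needed, so any index at or past the live ray count has to be rejected in the shader; the commit message attributes that to the existing `i >= _wfCurCount()` guard.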

249 commits
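
The leaf-start fix in commit 8f6a52a460 (and the dynamic sweep-tree depth from afc0292fab) comes down to one relation: leavesStart = nPadded - 1, with nPadded the next power of two of the instance count. The sketch below is a hedged illustration, not the shim itself. `NextPow2`, `LeavesStart`, `TreeDepth`, and `IsLeaf` are hypothetical names, and the implicit array layout (internal nodes first, leaves from index nPadded - 1) is an assumption inferred from the commit messages.

```cpp
#include <cstdint>

// Smallest power of two >= n (returns 1 for n == 0).
// Assumes n <= 2^31 so the shift cannot overflow.
constexpr std::uint32_t NextPow2(std::uint32_t n) {
    std::uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Index of the first leaf in a complete binary tree with nPadded leaves,
// assuming internal nodes occupy indices [0, nPadded - 1).
constexpr std::uint32_t LeavesStart(std::uint32_t instanceCount) {
    return NextPow2(instanceCount) - 1;
}

// Tree depth = log2(nPadded), computed by counting halvings.
constexpr std::uint32_t TreeDepth(std::uint32_t instanceCount) {
    std::uint32_t depth = 0;
    for (std::uint32_t p = NextPow2(instanceCount); p > 1; p >>= 1) {
        ++depth;
    }
    return depth;
}

// Leaf test against a per-build leaf start instead of a fixed constant.
constexpr bool IsLeaf(std::uint32_t nodeIndex, std::uint32_t instanceCount) {
    return nodeIndex >= LeavesStart(instanceCount);
}

// 512 instances: depth 9, leaves start at 511, last node index is 1022.
static_assert(TreeDepth(512) == 9, "depth for 512 instances");
static_assert(LeavesStart(512) == 511, "leaf start for 512 instances");
static_assert(IsLeaf(1022, 512), "last node is a leaf");
// With the old constant 16384 - 1, index 1022 would never count as a leaf.
static_assert(1022 < 16384 - 1, "fixed leaf start is out of reach");
```

Under these assumptions the highest node index in a 512-instance tree is 1022, far below 16383, which is consistent with the commit's description of every node looking internal before the fix.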