WebGPU RT: enable TLAS spatial sort via bitonic network
Replace the disabled LSD radix sort in lbvhBuildMain with a data-oblivious workgroup bitonic sorting network and enable it.

The radix scatter was gated behind `if (false)` because it produced count/distribution-dependent corruption (TODO-lbvh-sort.md): a memory-ordering bug in the Hillis-Steele scan / parallel scatter that surfaced only for certain Morton distributions (a small object beside a tight cluster), making geometry flicker. A bitonic network's compare-exchange schedule depends only on N_PADDED, never on key values, so it sidesteps that entire class of distribution-dependent races (TODO strategy #5).

The network runs 105 sub-stages over 2^14 keys (14 * 15 / 2 = 105) in a single workgroup of 1024 threads, with 8 compare-exchanges per thread per sub-stage (2^13 pairs / 1024 threads), operating in place on sortA with a storageBarrier between sub-stages. Sentinel keys (0xFFFFFFFF) compare largest and settle at the tail, exactly where Phase 4 expects them.

Restores Morton (Z-order) spatial coherence to TLAS BVH leaves, which the many-instance case needs. Removes the now-dead radix histogram/scan workgroup memory and constants.

Verified on the Firefox/Dawn WebGPU stack: a GPU unit test diffs the kernel output against a CPU oracle across all three required distributions (all-uniform, all-one-bucket, small-object-next-to-cluster) plus random, reverse, and empty inputs. All match bit-for-bit with a valid index permutation. Sponza renders correctly with the sort live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
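For readers unfamiliar with the technique, the schedule described above can be sketched on the CPU as follows. This is an illustrative JavaScript reference, not the WGSL kernel from this commit: `bitonicSortPairs`, `nextPow2`, and the returned field names are hypothetical, and the inner loop over `i` stands in for work that the kernel would divide among threads, with a barrier after each sub-stage.

```javascript
// CPU sketch of a bitonic sorting network over (key, index) pairs.
// All names here are hypothetical; this is not the commit's WGSL code.
const SENTINEL = 0xFFFFFFFF;

function nextPow2(n) {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

function bitonicSortPairs(inputKeys) {
  const nPadded = nextPow2(Math.max(inputKeys.length, 1));
  // Pad with sentinels so they compare largest and end up at the tail.
  const keys = new Uint32Array(nPadded).fill(SENTINEL);
  keys.set(inputKeys);
  const indices = new Uint32Array(nPadded);
  for (let i = 0; i < nPadded; i++) indices[i] = i;

  let subStages = 0;
  // The (size, stride) sequence depends only on nPadded, never on key values.
  for (let size = 2; size <= nPadded; size *= 2) {
    for (let stride = size >> 1; stride > 0; stride >>= 1) {
      for (let i = 0; i < nPadded; i++) {
        const j = i ^ stride;            // partner element
        if (j <= i) continue;            // handle each pair once
        const ascending = (i & size) === 0;
        const outOfOrder = ascending ? keys[i] > keys[j] : keys[i] < keys[j];
        if (outOfOrder) {
          const tk = keys[i]; keys[i] = keys[j]; keys[j] = tk;
          const ti = indices[i]; indices[i] = indices[j]; indices[j] = ti;
        }
      }
      subStages++; // on the GPU, a barrier would separate sub-stages here
    }
  }
  return { keys, indices, subStages, nPadded };
}
```

Because every sub-stage touches disjoint pairs, the compare-exchanges within one sub-stage are independent of each other, which is what allows them to be spread across threads. A sketch like this can also serve as a CPU oracle, since running the same network on the same input is deterministic.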
parent 162d98cf5b
commit 14091dcdca
2 changed files with 66 additions and 132 deletions
@@ -1,5 +1,20 @@
# LBVH parallel radix sort: count-dependent corruption

> **RESOLVED (strategy #5: bitonic sort).** The LSD radix scatter was
> replaced with a data-oblivious workgroup **bitonic sorting network** in
> `lbvhBuildMain` (`additional/dom-webgpu.js`, Phase 2). Because a bitonic
> network's compare-exchange schedule depends only on N_PADDED, never on
> the key distribution, it cannot exhibit the count-dependent corruption
> documented below. The sort is now enabled (the old `if (false)` guard is
> gone) so TLAS leaves are Morton (Z-order) coherent again.
>
> Verified on the Firefox/Dawn WebGPU stack with a GPU unit test that diffs
> the kernel output against a CPU oracle across all three required
> distributions (all-uniform, all-one-bucket, and the "small object next to
> a tight cluster" repro) plus random/reverse/empty edge cases. All match
> bit-for-bit, with a valid index permutation. Sponza renders correctly with
> the sort live. The historical analysis below is retained for context.

## Summary

The parallel radix sort in `lbvhBuildMain` (additional/dom-webgpu.js) produces