WebGPU RT: wavefront/streaming tracer (replaces megakernel) #4
22 changed files with 1331 additions and 395 deletions
17
README.md
17
README.md
|
|
@ -50,9 +50,16 @@ compute pipeline composed from user-supplied WGSL stages).
|
||||||
bridge. Atlas (`r8unorm`, sub-region writes) is a separate path.
|
bridge. Atlas (`r8unorm`, sub-region writes) is a separate path.
|
||||||
- **PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan /
|
- **PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan /
|
||||||
ShaderBindingTableWebGPU / RTPass** — ray-tracing pipelines. Vulkan
|
ShaderBindingTableWebGPU / RTPass** — ray-tracing pipelines. Vulkan
|
||||||
uses native RT pipelines + SBTs; WebGPU composes one compute
|
uses native RT pipelines + SBTs; WebGPU compiles a **wavefront /
|
||||||
pipeline by stitching the traversal library, a generated hit-group
|
streaming** software tracer — five `@compute` kernels
|
||||||
switch, and the user's raygen / closesthit / miss / anyhit WGSL.
|
(`GENERATE → PREP → TRACE → SHADE → RESOLVE`) sharing one module,
|
||||||
|
connected by GPU ray/hit/payload buffers and a GPU-driven indirect
|
||||||
|
bounce loop (`dispatchWorkgroupsIndirect`). TRACE carries zero user
|
||||||
|
code (traversal + intersection only); user raygen calls
|
||||||
|
`rtEmitPrimaryRay`, and closesthit / miss run in SHADE where they
|
||||||
|
`rtEmitRay` continuation/shadow rays and `rtAccumulate` radiance. An
|
||||||
|
optional Resolve shader tonemaps the linear accumulator. See
|
||||||
|
[WAVEFRONT-DESIGN.md](WAVEFRONT-DESIGN.md).
|
||||||
- **ComputeShader / WebGPUComputeShader** — Tier 1 wrapper used by the
|
- **ComputeShader / WebGPUComputeShader** — Tier 1 wrapper used by the
|
||||||
UI system. Vulkan loads a `.spv` and dispatches with
|
UI system. Vulkan loads a `.spv` and dispatches with
|
||||||
`vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at
|
`vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at
|
||||||
|
|
@ -145,6 +152,10 @@ See [examples/](examples/). Quick map:
|
||||||
- [VulkanTriangle](examples/VulkanTriangle/) — ray-traced triangle on
|
- [VulkanTriangle](examples/VulkanTriangle/) — ray-traced triangle on
|
||||||
both Vulkan and WebGPU. The smallest test of the bindless + RT path
|
both Vulkan and WebGPU. The smallest test of the bindless + RT path
|
||||||
on each backend.
|
on each backend.
|
||||||
|
- [RTStress](examples/RTStress/) — wavefront RT benchmark: an N×N×N grid
|
||||||
|
of a cube mesh (instance-count knob `kGrid`, 512 → 8000) shaded with
|
||||||
|
primary + shadow rays. Prints a GPU timestamp-query per-pass breakdown
|
||||||
|
each second. WebGPU/DOM only.
|
||||||
- [Sponza](examples/Sponza/) — ray-traced Sponza atrium on both
|
- [Sponza](examples/Sponza/) — ray-traced Sponza atrium on both
|
||||||
backends. Exercises `.cmesh` / `.ctex` decompression (GPU
|
backends. Exercises `.cmesh` / `.ctex` decompression (GPU
|
||||||
`VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a
|
`VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a
|
||||||
|
|
|
||||||
87
WAVEFRONT-DESIGN.md
Normal file
87
WAVEFRONT-DESIGN.md
Normal file
|
|
@ -0,0 +1,87 @@
|
||||||
|
# WebGPU wavefront RT rewrite — design & progress (issue #3)
|
||||||
|
|
||||||
|
Replaces the single megakernel (`main`, 8×8 tile, per-pixel
|
||||||
|
raygen→traceRay→CH/miss→store) with a streaming wavefront tracer:
|
||||||
|
`GENERATE → PREP → (TRACE → SHADE → PREP)×maxDepth → RESOLVE`, each its own
|
||||||
|
compute pass, dispatch sizes driven by `dispatchWorkgroupsIndirect`.
|
||||||
|
|
||||||
|
## Kernels (all generated/assembled the same megakernel way, just split)
|
||||||
|
- **GENERATE** (1 thread/pixel, 8×8): runs user `raygen_main(gid)` which calls
|
||||||
|
`rtEmitPrimaryRay(...)`. Clears accum slot + payload slot for the pixel.
|
||||||
|
- **PREP** (1 thread): reads emit counter for the just-filled ray buffer,
|
||||||
|
writes indirect args `[ceil(n/64),1,1]`, publishes `traceCount=n`, swaps
|
||||||
|
cur/next ray buffer, resets next emit counter. One PREP before first TRACE
|
||||||
|
and one after each SHADE.
|
||||||
|
- **TRACE** (1 thread/ray, 64-wide, indirect): ZERO user code. Reads ray i,
|
||||||
|
runs `_rtTraverseTlas`, writes `HitResult` i (t/instanceId/primId/hg/attribs
|
||||||
|
/objToWorld/customIndex/missFlag).
|
||||||
|
- **SHADE** (1 thread/ray, 64-wide, indirect): reads ray i + hit i + payload
|
||||||
|
slot p. miss→`runMiss`, hit→`runClosestHit` (unless SKIP_CLOSEST_HIT). User
|
||||||
|
code calls `rtAccumulate(pixel,rgb)` and `rtEmitRay(...)`.
|
||||||
|
- **RESOLVE** (1 thread/pixel, 8×8): reads accum slot, runs user `resolve_main`
|
||||||
|
if present else passthrough; writes outImage.
|
||||||
|
|
||||||
|
## Buffers (rtState, sized to 2*W*H rays)
|
||||||
|
- `wfRaysA`,`wfRaysB`: array<WfRay>, ping/pong. WfRay = origin,tMin,dir,tMax,
|
||||||
|
pixel,flags,cullMask,missIndex,sbtOffset,payloadSlot,kind,_pad.
|
||||||
|
- `wfHits`: array<HitResult> (sized = ray capacity).
|
||||||
|
- `wfPayload`: array<Payload> — declared in CODEGEN region after user Payload.
|
||||||
|
- `wfAccum`: array<vec4<f32>> per pixel (W*H).
|
||||||
|
- `wfCounters`: atomic counters: emitA, emitB, trace dispatch args, etc.
|
||||||
|
- `wfIndirect`: INDIRECT dispatch-args buffer.
|
||||||
|
|
||||||
|
## API (new, breaking)
|
||||||
|
- raygen: `rtEmitPrimaryRay(origin,tMin,dir,tMax,flags,cullMask,sbtOff,missIdx)`
|
||||||
|
→ allocates payloadSlot=pixel, writes ray to current buffer (atomic bump).
|
||||||
|
- CH/miss: `rtEmitRay(origin,tMin,dir,tMax,flags,cullMask,sbtOff,missIdx,payload)`
|
||||||
|
spawns into NEXT buffer carrying a payload slot; `rtAccumulate(pixel,rgb)`.
|
||||||
|
- `rtGetPayload(slot)` / payload passed by value into CH/miss via slot.
|
||||||
|
|
||||||
|
## Tonemap / resolve
|
||||||
|
Accum buffer is linear. Optional user `WebGPURTStage::Resolve` entry
|
||||||
|
`resolve_main(coord:vec2<u32>, hdr:vec4<f32>)->vec4<f32>`. None → passthrough.
|
||||||
|
VulkanTriangle: no resolve (exact match). Sponza: resolve does Reinhard+gamma.
|
||||||
|
|
||||||
|
## Indirect dispatch (Phase 2 de-risk)
|
||||||
|
Prove `dispatchWorkgroupsIndirect` + cross-pass atomic visibility with a toy
|
||||||
|
"emit N → dispatch N" before wiring real kernels. WebGPU inserts an implicit
|
||||||
|
barrier between compute passes in one submit, so atomics written in PREP are
|
||||||
|
visible to TRACE.
|
||||||
|
|
||||||
|
## maxDepth
|
||||||
|
Compile/runtime knob. JS unrolls the chain to maxDepth. VulkanTriangle
|
||||||
|
maxDepth=1 (primary only). Sponza maxDepth=2 (primary + shadow).
|
||||||
|
|
||||||
|
## Status / progress
|
||||||
|
- [x] baseline VulkanTriangle renders (megakernel)
|
||||||
|
- [x] wavefront prelude + codegen (5 entry points share one module)
|
||||||
|
- [x] VulkanTriangle on wavefront (maxDepth=1) — bit-identical to baseline
|
||||||
|
- [x] indirect-dispatch bounce loop + PREP (cross-pass atomics proven)
|
||||||
|
- [x] RTStress example (N³ cube grid) + GPU timestamp-query per-pass HUD
|
||||||
|
- [x] Sponza port (shadow ray in SHADE) — renders the atrium correctly
|
||||||
|
- [x] ordered (nearest-child-first) traversal
|
||||||
|
- [x] dynamic TLAS sweep-tree depth (next_pow2 instances)
|
||||||
|
- [x] device limits (maxBufferSize / maxStorageBufferBindingSize /
|
||||||
|
maxComputeWorkgroupsPerDimension) + timestamp-query feature
|
||||||
|
- [x] megakernel dead path removed (RT pipeline builds only wavefront)
|
||||||
|
- [~] binding packing (Phase 7): SKIPPED — target device reports 64 storage
|
||||||
|
buffers/stage (≥12), so the merge is unnecessary (issue makes it
|
||||||
|
conditional on <12).
|
||||||
|
|
||||||
|
### Measured (this container's GPU, via timestamp-query; NOT a 4090)
|
||||||
|
Per-pass GPU time, 1920×995, primary+shadow (maxDepth=2):
|
||||||
|
- RTStress 512 inst: GEN ~0.80ms TRACE ~1.63ms SHADE ~1.00ms total ~3.52ms (~280 fps)
|
||||||
|
- RTStress 4096 inst: GEN ~0.80ms TRACE ~1.95ms SHADE ~1.00ms total ~3.85ms (~260 fps)
|
||||||
|
- Sponza: GEN ~0.79ms TRACE ~1.81ms SHADE ~1.00ms total ~3.69ms
|
||||||
|
8× the instances costs only ~16% more TRACE — the spatial TLAS + ordered
|
||||||
|
descent scale sub-linearly. NOTE: a 4090 number and the TRACE-kernel
|
||||||
|
register/occupancy delta require hardware + a profiler not available in
|
||||||
|
this CI container; the architectural win (TRACE carries zero user code, so
|
||||||
|
its register footprint is the traversal loop alone) is structural.
|
||||||
|
|
||||||
|
## Files
|
||||||
|
- `additional/dom-webgpu.js` — prelude (`rtWgsl*`), `wgpuLoadRTPipeline`,
|
||||||
|
`wgpuDispatchRT`, LBVH build, rtState/buffers, device-limit clamp (~L131).
|
||||||
|
- `implementations/Crafter.Graphics-PipelineRTWebGPU.cpp` — assembles user
|
||||||
|
WGSL + entry glue; must emit 5 entry points + payloadStore binding.
|
||||||
|
- examples/{VulkanTriangle,Sponza,RTStress}/*.wgsl + main.cpp.
|
||||||
File diff suppressed because it is too large
Load diff
54
examples/RTStress/closesthit.wgsl
Normal file
54
examples/RTStress/closesthit.wgsl
Normal file
|
|
@ -0,0 +1,54 @@
|
||||||
|
// RTStress closest-hit (runs in SHADE). Computes flat-shaded Lambert from
|
||||||
|
// the hit triangle's geometric normal, accumulates ambient, and — if the
|
||||||
|
// surface faces the sun — emits a shadow ray toward the sun. The shadow
|
||||||
|
// ray's miss (sun visible) adds the direct term; its hit (occluded) adds
|
||||||
|
// nothing because RT_FLAG_SKIP_CLOSEST_HIT suppresses closesthit on hit.
|
||||||
|
//
|
||||||
|
// Payload declared here so the assembler sees it before wfPayload / SHADE.
|
||||||
|
struct Payload {
|
||||||
|
color: vec3<f32>, // shadow ray: pending direct contribution
|
||||||
|
shadowRay: u32, // 0 primary, 1 shadow
|
||||||
|
};
|
||||||
|
|
||||||
|
const SUN_DIR_TO_LIGHT: vec3<f32> = vec3<f32>(0.40, 0.85, 0.35);
|
||||||
|
const SUN_COLOR: vec3<f32> = vec3<f32>(1.15, 1.05, 0.90);
|
||||||
|
const AMBIENT_COLOR: vec3<f32> = vec3<f32>(0.12, 0.13, 0.18);
|
||||||
|
|
||||||
|
// Cheap per-instance albedo so the grid reads as distinct cubes (and any
|
||||||
|
// TLAS flicker as instance count scales is obvious).
|
||||||
|
fn instanceAlbedo(i: u32) -> vec3<f32> {
|
||||||
|
let h = i * 2654435761u;
|
||||||
|
return vec3<f32>(
|
||||||
|
0.35 + 0.6 * f32((h >> 0u) & 255u) / 255.0,
|
||||||
|
0.35 + 0.6 * f32((h >> 8u) & 255u) / 255.0,
|
||||||
|
0.35 + 0.6 * f32((h >> 16u) & 255u) / 255.0);
|
||||||
|
}
|
||||||
|
|
||||||
|
fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>) {
|
||||||
|
let meshRec = meshRecords[tlasEntries[hit.instanceId].blasMeshIdx];
|
||||||
|
let verts = _rtFetchTri(meshRec, hit.primitiveId);
|
||||||
|
let nObj = normalize(cross(verts[1] - verts[0], verts[2] - verts[0]));
|
||||||
|
let nWorld = normalize(vec3<f32>(
|
||||||
|
dot(hit.objectToWorldR0.xyz, nObj),
|
||||||
|
dot(hit.objectToWorldR1.xyz, nObj),
|
||||||
|
dot(hit.objectToWorldR2.xyz, nObj)));
|
||||||
|
|
||||||
|
let albedo = instanceAlbedo(hit.customIndex);
|
||||||
|
let worldPos = ray.origin + ray.direction * hit.t;
|
||||||
|
let viewDir = -ray.direction;
|
||||||
|
let nFacing = select(-nWorld, nWorld, dot(nWorld, viewDir) > 0.0);
|
||||||
|
let sunDir = normalize(SUN_DIR_TO_LIGHT);
|
||||||
|
let nDotL = max(0.0, dot(nFacing, sunDir));
|
||||||
|
|
||||||
|
rtAccumulate(albedo * AMBIENT_COLOR);
|
||||||
|
|
||||||
|
if (nDotL > 0.0) {
|
||||||
|
var sp: Payload;
|
||||||
|
sp.color = albedo * SUN_COLOR * nDotL;
|
||||||
|
sp.shadowRay = 1u;
|
||||||
|
let shadowOrigin = worldPos + nFacing * 0.05;
|
||||||
|
rtEmitRay(shadowOrigin, 0.01, sunDir, 100000.0,
|
||||||
|
RT_FLAG_SKIP_CLOSEST_HIT | RT_FLAG_TERMINATE_ON_FIRST_HIT,
|
||||||
|
0xFFu, 0u, 0u, sp);
|
||||||
|
}
|
||||||
|
}
|
||||||
200
examples/RTStress/main.cpp
Normal file
200
examples/RTStress/main.cpp
Normal file
|
|
@ -0,0 +1,200 @@
|
||||||
|
// RTStress — the standing many-instance wavefront RT benchmark. An
|
||||||
|
// N×N×N grid of a small cube mesh (one BLAS, many TLAS instances), shaded
|
||||||
|
// with primary + shadow rays through the wavefront pipeline. The grid edge
|
||||||
|
// `kGrid` is the instance-count knob: 8 → 512, 16 → 4096, 20 → 8000
|
||||||
|
// (LBVH_MAX = 16384). Frame time is printed to the console each second so
|
||||||
|
// fps-vs-instance-count can be read off without external tooling; the JS
|
||||||
|
// bridge additionally prints a GPU timestamp-query per-pass breakdown.
|
||||||
|
//
|
||||||
|
// WebGPU/DOM only — the wavefront tracer is the WebGPU software RT path.
|
||||||
|
|
||||||
|
#ifndef CRAFTER_GRAPHICS_WINDOW_DOM
|
||||||
|
int main() { return 0; } // native path is hardware RT; out of scope here
|
||||||
|
#else
|
||||||
|
|
||||||
|
import Crafter.Graphics;
|
||||||
|
import Crafter.Math;
|
||||||
|
import Crafter.Event;
|
||||||
|
import std;
|
||||||
|
|
||||||
|
using namespace Crafter;
|
||||||
|
namespace fs = std::filesystem;
|
||||||
|
|
||||||
|
namespace {
|
||||||
|
// Instance-count knob. instances = kGrid³. Bump to 16 (4096) or 20
|
||||||
|
// (8000) to stress the TLAS; the LBVH build caps at 16384.
|
||||||
|
constexpr int kGrid = 8;
|
||||||
|
constexpr float kSpacing = 2.5f;
|
||||||
|
constexpr float kHalf = 0.5f; // cube half-extent
|
||||||
|
|
||||||
|
struct CameraGPU {
|
||||||
|
float origin[3]; float pad0;
|
||||||
|
float right[3]; float tanHalf;
|
||||||
|
float up[3]; float aspect;
|
||||||
|
float forward[3]; float pad1;
|
||||||
|
};
|
||||||
|
static_assert(sizeof(CameraGPU) == 64);
|
||||||
|
}
|
||||||
|
|
||||||
|
int main() {
|
||||||
|
const int instanceCount = kGrid * kGrid * kGrid;
|
||||||
|
std::println("[RTStress] grid {}^3 = {} instances", kGrid, instanceCount);
|
||||||
|
|
||||||
|
Device::Initialize();
|
||||||
|
static Window window(1280, 720, "RTStress");
|
||||||
|
auto cmd = window.StartInit();
|
||||||
|
|
||||||
|
DescriptorHeapWebGPU heap;
|
||||||
|
heap.Initialize(/*images*/ 1, /*buffers*/ 2, /*samplers*/ 1);
|
||||||
|
|
||||||
|
std::array<WebGPUShader, 4> shaders {{
|
||||||
|
WebGPUShader(fs::path("raygen.wgsl"), "raygen_main", WebGPURTStage::Raygen),
|
||||||
|
WebGPUShader(fs::path("miss.wgsl"), "miss_main", WebGPURTStage::Miss),
|
||||||
|
WebGPUShader(fs::path("closesthit.wgsl"), "closesthit_main", WebGPURTStage::ClosestHit),
|
||||||
|
WebGPUShader(fs::path("resolve.wgsl"), "resolve_main", WebGPURTStage::Resolve),
|
||||||
|
}};
|
||||||
|
ShaderBindingTableWebGPU sbt;
|
||||||
|
sbt.Init(shaders);
|
||||||
|
|
||||||
|
std::array<RTShaderGroup, 1> raygenGroups {{ { .type = RTShaderGroupType::General, .generalShader = 0 } }};
|
||||||
|
std::array<RTShaderGroup, 1> missGroups {{ { .type = RTShaderGroupType::General, .generalShader = 1 } }};
|
||||||
|
std::array<RTShaderGroup, 1> hitGroups {{ { .type = RTShaderGroupType::TrianglesHitGroup, .closestHitShader = 2 } }};
|
||||||
|
|
||||||
|
// One user binding: the camera storage buffer at @group(3).
|
||||||
|
std::array<UICustomBinding, 1> bindings {{
|
||||||
|
{ .group = 3, .binding = 0, .kind = UICustomBindingKind::Buffer, ._pad = 0, .pushOffset = 0 },
|
||||||
|
}};
|
||||||
|
|
||||||
|
PipelineRTWebGPU pipeline;
|
||||||
|
pipeline.Init(cmd, raygenGroups, missGroups, hitGroups, sbt, bindings);
|
||||||
|
|
||||||
|
// ── Unit cube mesh (8 verts, 12 tris). ────────────────────────────
|
||||||
|
static std::array<Vector<float, 3, 3>, 8> verts {{
|
||||||
|
{-kHalf, -kHalf, -kHalf}, { kHalf, -kHalf, -kHalf},
|
||||||
|
{ kHalf, kHalf, -kHalf}, {-kHalf, kHalf, -kHalf},
|
||||||
|
{-kHalf, -kHalf, kHalf}, { kHalf, -kHalf, kHalf},
|
||||||
|
{ kHalf, kHalf, kHalf}, {-kHalf, kHalf, kHalf},
|
||||||
|
}};
|
||||||
|
static std::array<std::uint32_t, 36> indices {{
|
||||||
|
0,1,2, 0,2,3, 5,4,7, 5,7,6, 4,0,3, 4,3,7,
|
||||||
|
1,5,6, 1,6,2, 4,5,1, 4,1,0, 3,2,6, 3,6,7,
|
||||||
|
}};
|
||||||
|
static Mesh cube;
|
||||||
|
cube.Build(verts, indices, cmd);
|
||||||
|
|
||||||
|
// ── Camera buffer + handle array. ─────────────────────────────────
|
||||||
|
WebGPUBuffer<CameraGPU, true> cameraBuf;
|
||||||
|
cameraBuf.Create(1);
|
||||||
|
static std::array<std::uint32_t, 1> userHandles { cameraBuf.handle };
|
||||||
|
|
||||||
|
// ── Instance grid. Reserve so RenderingElement3D::Add pointers stay
|
||||||
|
// valid across vector growth. ─────────────────────────────────────
|
||||||
|
static std::vector<RenderingElement3D> renderers;
|
||||||
|
renderers.reserve(static_cast<std::size_t>(instanceCount));
|
||||||
|
const float origin0 = -0.5f * static_cast<float>(kGrid - 1) * kSpacing;
|
||||||
|
for (int x = 0; x < kGrid; ++x)
|
||||||
|
for (int y = 0; y < kGrid; ++y)
|
||||||
|
for (int z = 0; z < kGrid; ++z) {
|
||||||
|
renderers.emplace_back();
|
||||||
|
RenderingElement3D& r = renderers.back();
|
||||||
|
auto& tx = r.instance.transform.matrix;
|
||||||
|
tx[0][0] = 1; tx[0][1] = 0; tx[0][2] = 0; tx[0][3] = origin0 + float(x) * kSpacing;
|
||||||
|
tx[1][0] = 0; tx[1][1] = 1; tx[1][2] = 0; tx[1][3] = origin0 + float(y) * kSpacing;
|
||||||
|
tx[2][0] = 0; tx[2][1] = 0; tx[2][2] = 1; tx[2][3] = origin0 + float(z) * kSpacing;
|
||||||
|
r.instance.instanceCustomIndex = static_cast<std::uint32_t>(renderers.size() - 1);
|
||||||
|
r.instance.mask = 0xFF;
|
||||||
|
r.instance.instanceShaderBindingTableRecordOffset = 0;
|
||||||
|
r.instance.flags = kRTGeometryInstanceForceOpaque;
|
||||||
|
r.instance.accelerationStructureReference = cube.blasAddr;
|
||||||
|
RenderingElement3D::Add(&r);
|
||||||
|
}
|
||||||
|
RenderingElement3D::BuildTLAS(cmd, 0);
|
||||||
|
|
||||||
|
window.descriptorHeap = &heap;
|
||||||
|
window.FinishInit();
|
||||||
|
|
||||||
|
RTPass rtPass(&pipeline);
|
||||||
|
rtPass.handlesPtr = userHandles.data();
|
||||||
|
rtPass.handlesCount = static_cast<std::uint32_t>(userHandles.size());
|
||||||
|
rtPass.maxDepth = 2; // primary + shadow
|
||||||
|
window.passes.push_back(&rtPass);
|
||||||
|
|
||||||
|
// ── Free camera framing the grid from a corner. ───────────────────
|
||||||
|
const float ext = float(kGrid - 1) * kSpacing;
|
||||||
|
struct CamState {
|
||||||
|
Vector<float, 3, 4> position;
|
||||||
|
float yaw;
|
||||||
|
float pitch;
|
||||||
|
} cam {
|
||||||
|
Vector<float, 3, 4>{ ext * 1.4f, ext * 1.0f, ext * 1.4f },
|
||||||
|
0.0f, 0.0f,
|
||||||
|
};
|
||||||
|
{
|
||||||
|
// Aim at the grid centre (origin).
|
||||||
|
Vector<float, 3, 4> d { -cam.position.x, -cam.position.y, -cam.position.z };
|
||||||
|
const float len = std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
|
||||||
|
cam.yaw = std::atan2(d.z, d.x);
|
||||||
|
cam.pitch = std::asin(d.y / len);
|
||||||
|
}
|
||||||
|
|
||||||
|
Input::Map inputMap;
|
||||||
|
Input::Action& moveAct = inputMap.AddAction("Move", Input::ActionType::Vector2);
|
||||||
|
Input::Action& lookAct = inputMap.AddAction("Look", Input::ActionType::Vector2);
|
||||||
|
moveAct.bindings = { Input::WASDBind{
|
||||||
|
Key(CrafterKeys::W), Key(CrafterKeys::S), Key(CrafterKeys::A), Key(CrafterKeys::D) } };
|
||||||
|
lookAct.bindings = { Input::MouseDeltaBind{ 1.0f } };
|
||||||
|
inputMap.Attach(window);
|
||||||
|
|
||||||
|
const float kMoveSpeed = ext * 0.8f;
|
||||||
|
const float kLookSens = 0.05f;
|
||||||
|
const float kDt = 1.0f / 60.0f;
|
||||||
|
|
||||||
|
static int frames = 0;
|
||||||
|
static double tAccum = 0.0;
|
||||||
|
EventListener<void> camTick(&window.onBeforeUpdate, [&]() {
|
||||||
|
inputMap.Tick();
|
||||||
|
cam.yaw += lookAct.vector2.x * kLookSens;
|
||||||
|
cam.pitch -= lookAct.vector2.y * kLookSens;
|
||||||
|
cam.pitch = std::clamp(cam.pitch, -1.55f, 1.55f);
|
||||||
|
|
||||||
|
const float cp = std::cos(cam.pitch), sp = std::sin(cam.pitch);
|
||||||
|
const float cy = std::cos(cam.yaw), sy = std::sin(cam.yaw);
|
||||||
|
Vector<float, 3, 4> forward { cp * cy, sp, cp * sy };
|
||||||
|
Vector<float, 3, 4> worldUp { 0.0f, 1.0f, 0.0f };
|
||||||
|
Vector<float, 3, 4> right { forward.y*worldUp.z - forward.z*worldUp.y,
|
||||||
|
forward.z*worldUp.x - forward.x*worldUp.z,
|
||||||
|
forward.x*worldUp.y - forward.y*worldUp.x };
|
||||||
|
const float rLen = std::sqrt(right.x*right.x + right.y*right.y + right.z*right.z);
|
||||||
|
right.x /= rLen; right.y /= rLen; right.z /= rLen;
|
||||||
|
Vector<float, 3, 4> up { right.y*forward.z - right.z*forward.y,
|
||||||
|
right.z*forward.x - right.x*forward.z,
|
||||||
|
right.x*forward.y - right.y*forward.x };
|
||||||
|
|
||||||
|
const float dx = moveAct.vector2.x * kMoveSpeed * kDt;
|
||||||
|
const float dy = moveAct.vector2.y * kMoveSpeed * kDt;
|
||||||
|
cam.position.x += right.x*dx + forward.x*dy;
|
||||||
|
cam.position.y += right.y*dx + forward.y*dy;
|
||||||
|
cam.position.z += right.z*dx + forward.z*dy;
|
||||||
|
|
||||||
|
CameraGPU& g = cameraBuf.value[0];
|
||||||
|
g.origin[0]=cam.position.x; g.origin[1]=cam.position.y; g.origin[2]=cam.position.z; g.pad0=0;
|
||||||
|
g.right[0]=right.x; g.right[1]=right.y; g.right[2]=right.z;
|
||||||
|
g.up[0]=up.x; g.up[1]=up.y; g.up[2]=up.z;
|
||||||
|
g.forward[0]=forward.x; g.forward[1]=forward.y; g.forward[2]=forward.z;
|
||||||
|
g.aspect = float(window.width) / float(window.height);
|
||||||
|
g.tanHalf = std::tan(70.0f * 3.14159265f / 360.0f);
|
||||||
|
g.pad1 = 0;
|
||||||
|
cameraBuf.FlushDevice();
|
||||||
|
|
||||||
|
if (++frames >= 60) {
|
||||||
|
std::println("[RTStress] {} instances @ ~{} frames since last report", instanceCount, frames);
|
||||||
|
frames = 0;
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
window.Render();
|
||||||
|
window.StartUpdate();
|
||||||
|
window.StartSync();
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
#endif
|
||||||
11
examples/RTStress/miss.wgsl
Normal file
11
examples/RTStress/miss.wgsl
Normal file
|
|
@ -0,0 +1,11 @@
|
||||||
|
// RTStress miss (runs in SHADE). Primary miss → sky gradient. Shadow miss
|
||||||
|
// → the sun is unoccluded, so add the pending direct contribution.
|
||||||
|
fn miss_main(ray: RayDesc, payload: ptr<function, Payload>) {
|
||||||
|
if ((*payload).shadowRay == 1u) {
|
||||||
|
rtAccumulate((*payload).color);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
let t = clamp(ray.direction.y * 0.5 + 0.5, 0.0, 1.0);
|
||||||
|
rtAccumulate(mix(vec3<f32>(0.50, 0.62, 0.85),
|
||||||
|
vec3<f32>(0.90, 0.94, 1.00), t));
|
||||||
|
}
|
||||||
46
examples/RTStress/project.cpp
Normal file
46
examples/RTStress/project.cpp
Normal file
|
|
@ -0,0 +1,46 @@
|
||||||
|
import std;
|
||||||
|
import Crafter.Build;
|
||||||
|
namespace fs = std::filesystem;
|
||||||
|
using namespace Crafter;
|
||||||
|
|
||||||
|
extern "C" Configuration CrafterBuildProject(std::span<const std::string_view> args) {
|
||||||
|
bool isWasm = false;
|
||||||
|
for (std::string_view a : args) {
|
||||||
|
if (a.starts_with("--target=") && a.find("wasm") != std::string_view::npos) {
|
||||||
|
isWasm = true;
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
std::vector<std::string> graphicsArgs(args.begin(), args.end());
|
||||||
|
Configuration* graphics = LocalProject({
|
||||||
|
.projectFile = "../../project.cpp",
|
||||||
|
.args = graphicsArgs,
|
||||||
|
});
|
||||||
|
|
||||||
|
Configuration cfg;
|
||||||
|
cfg.path = "./";
|
||||||
|
cfg.name = "RTStress";
|
||||||
|
cfg.outputName = "RTStress";
|
||||||
|
cfg.type = ConfigurationType::Executable;
|
||||||
|
if (isWasm) {
|
||||||
|
cfg.target = "wasm32-wasip1";
|
||||||
|
cfg.defines.push_back({"CRAFTER_GRAPHICS_WINDOW_DOM", ""});
|
||||||
|
cfg.compileFlags.push_back("-msimd128");
|
||||||
|
}
|
||||||
|
ApplyStandardArgs(cfg, args);
|
||||||
|
cfg.dependencies = { graphics };
|
||||||
|
|
||||||
|
std::array<fs::path, 0> ifaces = {};
|
||||||
|
std::array<fs::path, 1> impls = { "main" };
|
||||||
|
cfg.GetInterfacesAndImplementations(ifaces, impls);
|
||||||
|
|
||||||
|
if (isWasm) {
|
||||||
|
cfg.files.emplace_back(fs::path("raygen.wgsl"));
|
||||||
|
cfg.files.emplace_back(fs::path("closesthit.wgsl"));
|
||||||
|
cfg.files.emplace_back(fs::path("miss.wgsl"));
|
||||||
|
cfg.files.emplace_back(fs::path("resolve.wgsl"));
|
||||||
|
EnableWasiBrowserRuntime(cfg);
|
||||||
|
}
|
||||||
|
return cfg;
|
||||||
|
}
|
||||||
35
examples/RTStress/raygen.wgsl
Normal file
35
examples/RTStress/raygen.wgsl
Normal file
|
|
@ -0,0 +1,35 @@
|
||||||
|
// RTStress raygen (runs in GENERATE). Host-driven pinhole camera at
|
||||||
|
// @group(3) (groups 0..2 are reserved by the wavefront pipeline:
|
||||||
|
// 0 = WfParams, 1 = data heaps, 2 = indirect args).
|
||||||
|
struct Camera {
|
||||||
|
origin: vec3<f32>,
|
||||||
|
pad0: f32,
|
||||||
|
right: vec3<f32>,
|
||||||
|
tanHalf: f32,
|
||||||
|
up: vec3<f32>,
|
||||||
|
aspect: f32,
|
||||||
|
forward: vec3<f32>,
|
||||||
|
pad1: f32,
|
||||||
|
};
|
||||||
|
@group(3) @binding(0) var<storage, read> camera : Camera;
|
||||||
|
|
||||||
|
fn raygen_main(gid: vec3<u32>) {
|
||||||
|
if (gid.x >= wfParams.surfaceW || gid.y >= wfParams.surfaceH) { return; }
|
||||||
|
|
||||||
|
let pixelf = vec2<f32>(f32(gid.x), f32(gid.y));
|
||||||
|
let res = vec2<f32>(f32(wfParams.surfaceW), f32(wfParams.surfaceH));
|
||||||
|
let uv = (pixelf + vec2<f32>(0.5)) / res;
|
||||||
|
let ndc = uv * 2.0 - vec2<f32>(1.0);
|
||||||
|
|
||||||
|
let direction = normalize(
|
||||||
|
camera.right * (ndc.x * camera.aspect * camera.tanHalf) +
|
||||||
|
camera.up * (-ndc.y * camera.tanHalf) +
|
||||||
|
camera.forward);
|
||||||
|
|
||||||
|
var p: Payload;
|
||||||
|
p.color = vec3<f32>(0.0);
|
||||||
|
p.shadowRay = 0u;
|
||||||
|
|
||||||
|
rtEmitPrimaryRay(camera.origin, 0.01, direction, 100000.0,
|
||||||
|
0u, 0xFFu, 0u, 0u, p);
|
||||||
|
}
|
||||||
7
examples/RTStress/resolve.wgsl
Normal file
7
examples/RTStress/resolve.wgsl
Normal file
|
|
@ -0,0 +1,7 @@
|
||||||
|
// RTStress RESOLVE-stage tonemap: Reinhard + gamma 2.2 over the linear
|
||||||
|
// accumulator. Registered as a WebGPURTStage::Resolve shader.
|
||||||
|
fn resolve_main(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32> {
|
||||||
|
let mapped = hdr.rgb / (hdr.rgb + vec3<f32>(1.0));
|
||||||
|
let g = pow(mapped, vec3<f32>(1.0 / 2.2));
|
||||||
|
return vec4<f32>(g, 1.0);
|
||||||
|
}
|
||||||
|
|
@ -1,35 +1,25 @@
|
||||||
// Payload declared here so the WGSL assembler sees it before raygen
|
// Sponza closest-hit (runs in SHADE). In the wavefront model the lighting
|
||||||
// (the assembler concatenates closesthit/anyhit/miss BEFORE raygen).
|
// + shadow trace that used to live in raygen happens here: gather surface
|
||||||
|
// data, accumulate ambient, and emit a shadow ray toward the sun carrying
|
||||||
|
// the pending direct contribution. The shadow ray's miss adds that
|
||||||
|
// contribution (sun visible); its hit adds nothing (occluded), since
|
||||||
|
// RT_FLAG_SKIP_CLOSEST_HIT suppresses closesthit on the shadow ray.
|
||||||
//
|
//
|
||||||
// WGSL forbids cycles in the function call graph, so closesthit_main
|
// Payload declared here so the assembler sees it before wfPayload / SHADE.
|
||||||
// CAN'T call traceRay (that would create closesthit → traceRay →
|
|
||||||
// runClosestHit → closesthit). The lighting + shadow trace therefore
|
|
||||||
// happens in raygen; closesthit's job is just to gather surface data
|
|
||||||
// into the payload.
|
|
||||||
//
|
|
||||||
// shadowRay = 0 (primary): closesthit fills albedo/worldPos/normal/hit.
|
|
||||||
// shadowRay = 1 (shadow): closesthit is skipped (RT_FLAG_SKIP_CLOSEST_HIT),
|
|
||||||
// miss flips color to white = "lit".
|
|
||||||
struct Payload {
|
struct Payload {
|
||||||
color: vec3<f32>,
|
color: vec3<f32>, // shadow ray: pending albedo·sun·nDotL
|
||||||
shadowRay: u32,
|
shadowRay: u32, // 0 primary, 1 shadow
|
||||||
worldPos: vec3<f32>,
|
|
||||||
hit: u32,
|
|
||||||
worldNormal: vec3<f32>,
|
|
||||||
_pad: f32,
|
|
||||||
};
|
};
|
||||||
|
|
||||||
// User-bound resources at group(2). Matches the UICustomBinding span the
|
// User resources at @group(3) (0..2 are the wavefront pipeline's reserved
|
||||||
// host hands to PipelineRTWebGPU::Init.
|
// groups). binding 0 albedo array, 1 sampler, 2 camera (raygen only).
|
||||||
// binding 0 — albedo texture_2d_array, one layer per Sponza material
|
@group(3) @binding(0) var albedos : texture_2d_array<f32>;
|
||||||
// binding 1 — sampler (linear clamp)
|
@group(3) @binding(1) var samp : sampler;
|
||||||
// binding 2 — camera storage buffer (read by raygen only)
|
|
||||||
@group(2) @binding(0) var albedos : texture_2d_array<f32>;
|
const SUN_DIR_TO_LIGHT: vec3<f32> = vec3<f32>(-0.35, 1.00, -0.20);
|
||||||
@group(2) @binding(1) var samp : sampler;
|
const SUN_COLOR: vec3<f32> = vec3<f32>( 1.10, 1.00, 0.85);
|
||||||
|
const AMBIENT_COLOR: vec3<f32> = vec3<f32>( 0.18, 0.20, 0.28);
|
||||||
|
|
||||||
// VertexNormalTangentUVPacked is `packed` on the outer struct but each
|
|
||||||
// inner `Vector<float, N, 4>` is SIMD-aligned to a 16-byte stride. So
|
|
||||||
// each vertex is 12 u32 words: normal at 0..2, tangent at 4..6, uv at 8..9.
|
|
||||||
const ATTRIB_STRIDE_U32: u32 = 12u;
|
const ATTRIB_STRIDE_U32: u32 = 12u;
|
||||||
const ATTRIB_NORMAL_OFFSET: u32 = 0u;
|
const ATTRIB_NORMAL_OFFSET: u32 = 0u;
|
||||||
const ATTRIB_UV_OFFSET: u32 = 8u;
|
const ATTRIB_UV_OFFSET: u32 = 8u;
|
||||||
|
|
@ -52,7 +42,6 @@ fn fetchNormal(meshRec: MeshRecord, vertexIdx: u32) -> vec3<f32> {
|
||||||
}
|
}
|
||||||
|
|
||||||
fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>) {
|
fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>) {
|
||||||
// Resolve hit triangle → 3 vertex indices.
|
|
||||||
let meshIdx = tlasEntries[hit.instanceId].blasMeshIdx;
|
let meshIdx = tlasEntries[hit.instanceId].blasMeshIdx;
|
||||||
let meshRec = meshRecords[meshIdx];
|
let meshRec = meshRecords[meshIdx];
|
||||||
let baseIdx = meshRec.indexOffset + hit.primitiveId * 3u;
|
let baseIdx = meshRec.indexOffset + hit.primitiveId * 3u;
|
||||||
|
|
@ -61,19 +50,14 @@ fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>)
|
||||||
let i2 = indices[baseIdx + 2u];
|
let i2 = indices[baseIdx + 2u];
|
||||||
let bary = vec3<f32>(1.0 - hit.attribs.x - hit.attribs.y, hit.attribs.x, hit.attribs.y);
|
let bary = vec3<f32>(1.0 - hit.attribs.x - hit.attribs.y, hit.attribs.x, hit.attribs.y);
|
||||||
|
|
||||||
// Albedo via barycentric UV interpolation.
|
|
||||||
let uv0 = fetchUV(meshRec, i0);
|
let uv0 = fetchUV(meshRec, i0);
|
||||||
let uv1 = fetchUV(meshRec, i1);
|
let uv1 = fetchUV(meshRec, i1);
|
||||||
let uv2 = fetchUV(meshRec, i2);
|
let uv2 = fetchUV(meshRec, i2);
|
||||||
let uv = uv0 * bary.x + uv1 * bary.y + uv2 * bary.z;
|
let uv = uv0 * bary.x + uv1 * bary.y + uv2 * bary.z;
|
||||||
// OBJ V is bottom-up; sampler is top-down. fract for manual tiling.
|
|
||||||
let uvTiled = vec2<f32>(fract(uv.x), fract(1.0 - uv.y));
|
let uvTiled = vec2<f32>(fract(uv.x), fract(1.0 - uv.y));
|
||||||
let layer = i32(hit.customIndex);
|
let layer = i32(hit.customIndex);
|
||||||
let albedo = textureSampleLevel(albedos, samp, uvTiled, layer, 0.0).rgb;
|
let albedo = textureSampleLevel(albedos, samp, uvTiled, layer, 0.0).rgb;
|
||||||
|
|
||||||
// World-space smooth shading normal. Multiply through the
|
|
||||||
// object-to-world rotation so this stays correct if a future scene
|
|
||||||
// rotates instances (Sponza itself is all identities).
|
|
||||||
let n0 = fetchNormal(meshRec, i0);
|
let n0 = fetchNormal(meshRec, i0);
|
||||||
let n1 = fetchNormal(meshRec, i1);
|
let n1 = fetchNormal(meshRec, i1);
|
||||||
let n2 = fetchNormal(meshRec, i2);
|
let n2 = fetchNormal(meshRec, i2);
|
||||||
|
|
@ -83,8 +67,23 @@ fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>)
|
||||||
dot(hit.objectToWorldR1.xyz, nObj),
|
dot(hit.objectToWorldR1.xyz, nObj),
|
||||||
dot(hit.objectToWorldR2.xyz, nObj)));
|
dot(hit.objectToWorldR2.xyz, nObj)));
|
||||||
|
|
||||||
(*payload).color = albedo;
|
// Two-sided: flip the normal toward the camera (Sponza curtains have
|
||||||
(*payload).worldPos = ray.origin + ray.direction * hit.t;
|
// inconsistent winding).
|
||||||
(*payload).worldNormal = nWorld;
|
let nFacing = select(-nWorld, nWorld, dot(nWorld, ray.direction) < 0.0);
|
||||||
(*payload).hit = 1u;
|
let lightDir = normalize(SUN_DIR_TO_LIGHT);
|
||||||
|
let nDotL = max(0.0, dot(nFacing, lightDir));
|
||||||
|
let worldPos = ray.origin + ray.direction * hit.t;
|
||||||
|
|
||||||
|
// Ambient is unconditional; direct light is gated behind the shadow ray.
|
||||||
|
rtAccumulate(albedo * AMBIENT_COLOR);
|
||||||
|
|
||||||
|
if (nDotL > 0.0) {
|
||||||
|
let shadowOrigin = worldPos + nFacing * 0.5;
|
||||||
|
var sp: Payload;
|
||||||
|
sp.color = albedo * SUN_COLOR * nDotL;
|
||||||
|
sp.shadowRay = 1u;
|
||||||
|
rtEmitRay(shadowOrigin, 0.001, lightDir, 10000.0,
|
||||||
|
RT_FLAG_SKIP_CLOSEST_HIT | RT_FLAG_TERMINATE_ON_FIRST_HIT,
|
||||||
|
0xFFu, 0u, 0u, sp);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -253,10 +253,11 @@ int main() {
|
||||||
DescriptorHeapWebGPU heap;
|
DescriptorHeapWebGPU heap;
|
||||||
heap.Initialize(/*images*/ 2, /*buffers*/ 2, /*samplers*/ 2);
|
heap.Initialize(/*images*/ 2, /*buffers*/ 2, /*samplers*/ 2);
|
||||||
|
|
||||||
std::array<WebGPUShader, 3> shaders {{
|
std::array<WebGPUShader, 4> shaders {{
|
||||||
WebGPUShader(fs::path("raygen.wgsl"), "raygen_main", WebGPURTStage::Raygen),
|
WebGPUShader(fs::path("raygen.wgsl"), "raygen_main", WebGPURTStage::Raygen),
|
||||||
WebGPUShader(fs::path("miss.wgsl"), "miss_main", WebGPURTStage::Miss),
|
WebGPUShader(fs::path("miss.wgsl"), "miss_main", WebGPURTStage::Miss),
|
||||||
WebGPUShader(fs::path("closesthit.wgsl"), "closesthit_main", WebGPURTStage::ClosestHit),
|
WebGPUShader(fs::path("closesthit.wgsl"), "closesthit_main", WebGPURTStage::ClosestHit),
|
||||||
|
WebGPUShader(fs::path("resolve.wgsl"), "resolve_main", WebGPURTStage::Resolve),
|
||||||
}};
|
}};
|
||||||
ShaderBindingTableWebGPU sbt;
|
ShaderBindingTableWebGPU sbt;
|
||||||
sbt.Init(shaders);
|
sbt.Init(shaders);
|
||||||
|
|
@ -271,14 +272,15 @@ int main() {
|
||||||
{ .type = RTShaderGroupType::TrianglesHitGroup, .closestHitShader = 2 },
|
{ .type = RTShaderGroupType::TrianglesHitGroup, .closestHitShader = 2 },
|
||||||
}};
|
}};
|
||||||
|
|
||||||
// Three user bindings at @group(2):
|
// Three user bindings at @group(3) (the wavefront pipeline reserves
|
||||||
|
// groups 0..2 for WfParams / data heaps / indirect args):
|
||||||
// binding 0 — albedo texture_2d_array (one layer per material)
|
// binding 0 — albedo texture_2d_array (one layer per material)
|
||||||
// binding 1 — sampler (linear clamp)
|
// binding 1 — sampler (linear clamp)
|
||||||
// binding 2 — Camera storage buffer (host-driven, updated per frame)
|
// binding 2 — Camera storage buffer (host-driven, updated per frame)
|
||||||
std::array<UICustomBinding, 3> bindings {{
|
std::array<UICustomBinding, 3> bindings {{
|
||||||
{ .group = 2, .binding = 0, .kind = UICustomBindingKind::SampledTextureArray, ._pad = 0, .pushOffset = 0 },
|
{ .group = 3, .binding = 0, .kind = UICustomBindingKind::SampledTextureArray, ._pad = 0, .pushOffset = 0 },
|
||||||
{ .group = 2, .binding = 1, .kind = UICustomBindingKind::Sampler, ._pad = 0, .pushOffset = 0 },
|
{ .group = 3, .binding = 1, .kind = UICustomBindingKind::Sampler, ._pad = 0, .pushOffset = 0 },
|
||||||
{ .group = 2, .binding = 2, .kind = UICustomBindingKind::Buffer, ._pad = 0, .pushOffset = 0 },
|
{ .group = 3, .binding = 2, .kind = UICustomBindingKind::Buffer, ._pad = 0, .pushOffset = 0 },
|
||||||
}};
|
}};
|
||||||
|
|
||||||
PipelineRTWebGPU pipeline;
|
PipelineRTWebGPU pipeline;
|
||||||
|
|
@ -367,6 +369,7 @@ int main() {
|
||||||
RTPass rtPass(&pipeline);
|
RTPass rtPass(&pipeline);
|
||||||
rtPass.handlesPtr = userHandles.data();
|
rtPass.handlesPtr = userHandles.data();
|
||||||
rtPass.handlesCount = static_cast<std::uint32_t>(userHandles.size());
|
rtPass.handlesCount = static_cast<std::uint32_t>(userHandles.size());
|
||||||
|
rtPass.maxDepth = 2; // primary + shadow
|
||||||
window.passes.push_back(&rtPass);
|
window.passes.push_back(&rtPass);
|
||||||
|
|
||||||
// ── Free camera: WASD + mouse-delta look ───────────────────────────
|
// ── Free camera: WASD + mouse-delta look ───────────────────────────
|
||||||
|
|
@ -375,9 +378,10 @@ int main() {
|
||||||
// height, looking +X down the long axis (bbox: X[-1921..1800],
|
// height, looking +X down the long axis (bbox: X[-1921..1800],
|
||||||
// Y[-126..1429], Z[-1183..1105]). The user can fine-tune from there.
|
// Y[-126..1429], Z[-1183..1105]). The user can fine-tune from there.
|
||||||
struct CamState {
|
struct CamState {
|
||||||
Vector<float, 3, 4> position{ -1500.0f, 200.0f, 0.0f };
|
// 3/4 view from a corner aimed at the atrium centre.
|
||||||
float yaw = 0.0f; // radians, around world +Y
|
Vector<float, 3, 4> position{ -1400.0f, 700.0f, -600.0f };
|
||||||
float pitch = 0.0f; // radians, +pitch looks up
|
float yaw = 0.405f; // radians, around world +Y
|
||||||
|
float pitch = -0.317f; // radians, +pitch looks up
|
||||||
} cam;
|
} cam;
|
||||||
|
|
||||||
Input::Map inputMap;
|
Input::Map inputMap;
|
||||||
|
|
|
||||||
|
|
@ -1,16 +1,12 @@
|
||||||
|
// Sponza miss (runs in SHADE). Primary miss → two-stop sky gradient.
|
||||||
|
// Shadow miss → the sun is unoccluded, so add the pending direct term.
|
||||||
fn miss_main(ray: RayDesc, payload: ptr<function, Payload>) {
|
fn miss_main(ray: RayDesc, payload: ptr<function, Payload>) {
|
||||||
if ((*payload).shadowRay == 1u) {
|
if ((*payload).shadowRay == 1u) {
|
||||||
// Shadow ray escaped to infinity — the sun is visible from the
|
rtAccumulate((*payload).color);
|
||||||
// origin, so the surface there should pick up full direct light.
|
|
||||||
// raygen reads color.x as the visibility coefficient.
|
|
||||||
(*payload).color = vec3<f32>(1.0);
|
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Primary miss: cheap two-stop sky gradient. (*payload).hit stays 0
|
|
||||||
// so raygen knows to skip the lighting path and just use this color.
|
|
||||||
let t = clamp(ray.direction.y * 0.5 + 0.5, 0.0, 1.0);
|
let t = clamp(ray.direction.y * 0.5 + 0.5, 0.0, 1.0);
|
||||||
let sky = vec3<f32>(0.45, 0.65, 0.95);
|
let sky = vec3<f32>(0.45, 0.65, 0.95);
|
||||||
let zenith = vec3<f32>(0.95, 0.85, 0.65);
|
let zenith = vec3<f32>(0.95, 0.85, 0.65);
|
||||||
(*payload).color = mix(sky, zenith, t);
|
rtAccumulate(mix(sky, zenith, t));
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -82,6 +82,7 @@ extern "C" Configuration CrafterBuildProject(std::span<const std::string_view> a
|
||||||
cfg.files.emplace_back(fs::path("raygen.wgsl"));
|
cfg.files.emplace_back(fs::path("raygen.wgsl"));
|
||||||
cfg.files.emplace_back(fs::path("closesthit.wgsl"));
|
cfg.files.emplace_back(fs::path("closesthit.wgsl"));
|
||||||
cfg.files.emplace_back(fs::path("miss.wgsl"));
|
cfg.files.emplace_back(fs::path("miss.wgsl"));
|
||||||
|
cfg.files.emplace_back(fs::path("resolve.wgsl"));
|
||||||
EnableWasiBrowserRuntime(cfg);
|
EnableWasiBrowserRuntime(cfg);
|
||||||
} else {
|
} else {
|
||||||
cfg.shaders.emplace_back(fs::path("raygen.glsl"), std::string("main"), ShaderType::RayGen);
|
cfg.shaders.emplace_back(fs::path("raygen.glsl"), std::string("main"), ShaderType::RayGen);
|
||||||
|
|
|
||||||
|
|
@ -1,12 +1,8 @@
|
||||||
// WebGPU raygen. Camera state comes from the host every frame via a
|
// Sponza raygen (runs in GENERATE). Emits the pixel's primary ray; all
|
||||||
// storage buffer bound at @group(2) @binding(2); main.cpp drives that
|
// shading + the shadow trace now happen in SHADE (closesthit/miss). Camera
|
||||||
// from WASD + mouse-delta through Crafter::Input.
|
// state comes from the host each frame via a storage buffer at
|
||||||
//
|
// @group(3) @binding(2) (groups 0..2 are reserved by the wavefront
|
||||||
// The shading + shadow trace all happens here because WGSL forbids
|
// pipeline). main.cpp drives it from WASD + mouse-delta.
|
||||||
// recursive function call graphs — closesthit_main can't call traceRay
|
|
||||||
// (that would loop closesthit → traceRay → runClosestHit → closesthit).
|
|
||||||
// Raygen is the entry point and not called by anyone, so it can call
|
|
||||||
// traceRay twice (once primary, once shadow) without forming a cycle.
|
|
||||||
|
|
||||||
struct Camera {
|
struct Camera {
|
||||||
origin: vec3<f32>,
|
origin: vec3<f32>,
|
||||||
|
|
@ -18,92 +14,25 @@ struct Camera {
|
||||||
forward: vec3<f32>,
|
forward: vec3<f32>,
|
||||||
pad1: f32,
|
pad1: f32,
|
||||||
};
|
};
|
||||||
@group(2) @binding(2) var<storage, read> camera : Camera;
|
@group(3) @binding(2) var<storage, read> camera : Camera;
|
||||||
|
|
||||||
// Sun coming through Sponza's open roof. Y is up; this points "down and
|
|
||||||
// slightly along +X" so the light grazes the colonnades on one side.
|
|
||||||
const SUN_DIR_TO_LIGHT: vec3<f32> = vec3<f32>(-0.35, 1.00, -0.20);
|
|
||||||
const SUN_COLOR: vec3<f32> = vec3<f32>( 1.10, 1.00, 0.85);
|
|
||||||
const AMBIENT_COLOR: vec3<f32> = vec3<f32>( 0.18, 0.20, 0.28);
|
|
||||||
|
|
||||||
fn raygen_main(gid: vec3<u32>) {
|
fn raygen_main(gid: vec3<u32>) {
|
||||||
if (gid.x >= hdr.surfaceW || gid.y >= hdr.surfaceH) { return; }
|
if (gid.x >= wfParams.surfaceW || gid.y >= wfParams.surfaceH) { return; }
|
||||||
|
|
||||||
let pixel = vec2<f32>(f32(gid.x), f32(gid.y));
|
let pixel = vec2<f32>(f32(gid.x), f32(gid.y));
|
||||||
let resolution = vec2<f32>(f32(hdr.surfaceW), f32(hdr.surfaceH));
|
let resolution = vec2<f32>(f32(wfParams.surfaceW), f32(wfParams.surfaceH));
|
||||||
let uv = (pixel + vec2<f32>(0.5)) / resolution;
|
let uv = (pixel + vec2<f32>(0.5)) / resolution;
|
||||||
let ndc = uv * 2.0 - vec2<f32>(1.0);
|
let ndc = uv * 2.0 - vec2<f32>(1.0);
|
||||||
|
|
||||||
// Pinhole camera reconstructed from the host basis. ndc.x runs left-
|
|
||||||
// to-right across the screen → +right; ndc.y is top-down so we
|
|
||||||
// negate before applying +up.
|
|
||||||
let direction = normalize(
|
let direction = normalize(
|
||||||
camera.right * (ndc.x * camera.aspect * camera.tanHalf) +
|
camera.right * (ndc.x * camera.aspect * camera.tanHalf) +
|
||||||
camera.up * (-ndc.y * camera.tanHalf) +
|
camera.up * (-ndc.y * camera.tanHalf) +
|
||||||
camera.forward);
|
camera.forward);
|
||||||
|
|
||||||
// ── Primary ray ────────────────────────────────────────────────────
|
|
||||||
var payload: Payload;
|
var payload: Payload;
|
||||||
payload.color = vec3<f32>(0.0);
|
payload.color = vec3<f32>(0.0);
|
||||||
payload.shadowRay = 0u;
|
payload.shadowRay = 0u;
|
||||||
payload.hit = 0u;
|
|
||||||
|
|
||||||
traceRay(
|
rtEmitPrimaryRay(camera.origin, 0.001, direction, 10000.0,
|
||||||
0u, 0u, 0xFFu,
|
0u, 0xFFu, 0u, 0u, payload);
|
||||||
0u, 0u, 0u,
|
|
||||||
camera.origin, 0.001,
|
|
||||||
direction, 10000.0,
|
|
||||||
&payload);
|
|
||||||
|
|
||||||
var finalColor: vec3<f32>;
|
|
||||||
if (payload.hit == 1u) {
|
|
||||||
// Closesthit filled albedo/worldPos/worldNormal. Two-sided
|
|
||||||
// shading: flip the normal toward the camera if we hit the back
|
|
||||||
// face — Sponza's curtains in particular have inconsistent
|
|
||||||
// winding, and without this half the surface would go black.
|
|
||||||
let albedo = payload.color;
|
|
||||||
let nFacing = select(-payload.worldNormal,
|
|
||||||
payload.worldNormal,
|
|
||||||
dot(payload.worldNormal, direction) < 0.0);
|
|
||||||
let lightDir = normalize(SUN_DIR_TO_LIGHT);
|
|
||||||
let nDotL = max(0.0, dot(nFacing, lightDir));
|
|
||||||
|
|
||||||
// ── Shadow ray ────────────────────────────────────────────────
|
|
||||||
// Only worth tracing if the surface faces the sun at all.
|
|
||||||
var visibility = 0.0;
|
|
||||||
if (nDotL > 0.0) {
|
|
||||||
// Normal-offset bias on Sponza's units (~3700 wide atrium)
|
|
||||||
// is hefty; 0.5 keeps the shadow ray clear of the originating
|
|
||||||
// triangle without producing visible "floating" shadows.
|
|
||||||
let shadowOrigin = payload.worldPos + nFacing * 0.5;
|
|
||||||
|
|
||||||
var shadowPayload: Payload;
|
|
||||||
shadowPayload.color = vec3<f32>(0.0); // default: blocked
|
|
||||||
shadowPayload.shadowRay = 1u;
|
|
||||||
shadowPayload.hit = 0u;
|
|
||||||
traceRay(
|
|
||||||
0u,
|
|
||||||
RT_FLAG_SKIP_CLOSEST_HIT | RT_FLAG_TERMINATE_ON_FIRST_HIT,
|
|
||||||
0xFFu,
|
|
||||||
0u, 0u, 0u,
|
|
||||||
shadowOrigin, 0.001,
|
|
||||||
lightDir, 10000.0,
|
|
||||||
&shadowPayload);
|
|
||||||
visibility = shadowPayload.color.x;
|
|
||||||
}
|
|
||||||
|
|
||||||
let lit = AMBIENT_COLOR + SUN_COLOR * (nDotL * visibility);
|
|
||||||
finalColor = albedo * lit;
|
|
||||||
} else {
|
|
||||||
// Sky color was filled by miss_main.
|
|
||||||
finalColor = payload.color;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Reinhard tonemap + gamma 2.2 so sun-lit albedos don't clip and
|
|
||||||
// shadow detail stays readable.
|
|
||||||
let mapped = finalColor / (finalColor + vec3<f32>(1.0));
|
|
||||||
let gamma = pow(mapped, vec3<f32>(1.0 / 2.2));
|
|
||||||
textureStore(outImage,
|
|
||||||
vec2<i32>(i32(gid.x), i32(gid.y)),
|
|
||||||
vec4<f32>(gamma, 1.0));
|
|
||||||
}
|
}
|
||||||
|
|
|
||||||
7
examples/Sponza/resolve.wgsl
Normal file
7
examples/Sponza/resolve.wgsl
Normal file
|
|
@ -0,0 +1,7 @@
|
||||||
|
// Sponza RESOLVE-stage tonemap: Reinhard + gamma 2.2 over the linear
|
||||||
|
// accumulator — matches the tonemap the megakernel raygen applied inline.
|
||||||
|
fn resolve_main(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32> {
|
||||||
|
let mapped = hdr.rgb / (hdr.rgb + vec3<f32>(1.0));
|
||||||
|
let g = pow(mapped, vec3<f32>(1.0 / 2.2));
|
||||||
|
return vec4<f32>(g, 1.0);
|
||||||
|
}
|
||||||
|
|
@ -1,6 +1,9 @@
|
||||||
// WebGPU port of closesthit.glsl. Library concatenates this BEFORE the
|
// Payload declared here so the WGSL assembler sees it before the wfPayload
|
||||||
// library helpers, so `Payload` declared here is visible to traceRay,
|
// binding, the SHADE dispatch, and the raygen source.
|
||||||
// runClosestHit, the mega-switch, and the user's raygen source.
|
//
|
||||||
|
// Wavefront model: closesthit_main runs in SHADE and accumulates the
|
||||||
|
// pixel's color directly (rtAccumulate) instead of writing a payload that
|
||||||
|
// raygen reads back.
|
||||||
|
|
||||||
struct Payload {
|
struct Payload {
|
||||||
color: vec3<f32>,
|
color: vec3<f32>,
|
||||||
|
|
@ -8,5 +11,5 @@ struct Payload {
|
||||||
|
|
||||||
fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>) {
|
fn closesthit_main(ray: RayDesc, hit: HitInfo, payload: ptr<function, Payload>) {
|
||||||
let bary = vec3<f32>(1.0 - hit.attribs.x - hit.attribs.y, hit.attribs.x, hit.attribs.y);
|
let bary = vec3<f32>(1.0 - hit.attribs.x - hit.attribs.y, hit.attribs.x, hit.attribs.y);
|
||||||
(*payload).color = bary;
|
rtAccumulate(bary);
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,6 @@
|
||||||
// WebGPU port of miss.glsl.
|
// Wavefront miss: runs in SHADE for rays that hit nothing. Accumulate the
|
||||||
|
// white background directly.
|
||||||
|
|
||||||
fn miss_main(ray: RayDesc, payload: ptr<function, Payload>) {
|
fn miss_main(ray: RayDesc, payload: ptr<function, Payload>) {
|
||||||
(*payload).color = vec3<f32>(1.0, 1.0, 1.0);
|
rtAccumulate(vec3<f32>(1.0, 1.0, 1.0));
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -1,11 +1,12 @@
|
||||||
// WebGPU port of raygen.glsl. Mirrors the pinhole camera setup — the
|
// WebGPU wavefront raygen. Runs in GENERATE: compute the pinhole camera
|
||||||
// Payload type is declared in closesthit.wgsl (concatenated earlier).
|
// ray and emit it as the pixel's primary ray. Shading happens later in
|
||||||
|
// SHADE (closesthit/miss). The Payload type is declared in closesthit.wgsl.
|
||||||
|
|
||||||
fn raygen_main(gid: vec3<u32>) {
|
fn raygen_main(gid: vec3<u32>) {
|
||||||
if (gid.x >= hdr.surfaceW || gid.y >= hdr.surfaceH) { return; }
|
if (gid.x >= wfParams.surfaceW || gid.y >= wfParams.surfaceH) { return; }
|
||||||
|
|
||||||
let pixel = vec2<f32>(f32(gid.x), f32(gid.y));
|
let pixel = vec2<f32>(f32(gid.x), f32(gid.y));
|
||||||
let resolution = vec2<f32>(f32(hdr.surfaceW), f32(hdr.surfaceH));
|
let resolution = vec2<f32>(f32(wfParams.surfaceW), f32(wfParams.surfaceH));
|
||||||
let uv = (pixel + vec2<f32>(0.5)) / resolution;
|
let uv = (pixel + vec2<f32>(0.5)) / resolution;
|
||||||
let ndc = uv * 2.0 - vec2<f32>(1.0);
|
let ndc = uv * 2.0 - vec2<f32>(1.0);
|
||||||
|
|
||||||
|
|
@ -23,17 +24,11 @@ fn raygen_main(gid: vec3<u32>) {
|
||||||
var payload: Payload;
|
var payload: Payload;
|
||||||
payload.color = vec3<f32>(0.0);
|
payload.color = vec3<f32>(0.0);
|
||||||
|
|
||||||
traceRay(
|
rtEmitPrimaryRay(
|
||||||
0u, // tlasIdx (unused)
|
|
||||||
0u, // ray flags
|
|
||||||
0xFFu, // cull mask
|
|
||||||
0u, 0u, 0u, // sbtRecordOffset, sbtRecordStride, missIndex
|
|
||||||
origin, 0.001,
|
origin, 0.001,
|
||||||
direction, 10000.0,
|
direction, 10000.0,
|
||||||
&payload,
|
0u, // ray flags
|
||||||
);
|
0xFFu, // cull mask
|
||||||
|
0u, 0u, // sbtRecordOffset, missIndex
|
||||||
textureStore(outImage,
|
payload);
|
||||||
vec2<i32>(i32(gid.x), i32(gid.y)),
|
|
||||||
vec4<f32>(payload.color, 1.0));
|
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -78,13 +78,22 @@ void PipelineRTWebGPU::Init(WebGPUCommandEncoderRef /*cmd*/,
|
||||||
// shaders by stage. Concatenating *all* non-raygen sources here lets
|
// shaders by stage. Concatenating *all* non-raygen sources here lets
|
||||||
// them declare shared helpers, `struct Payload`, etc., in any order.
|
// them declare shared helpers, `struct Payload`, etc., in any order.
|
||||||
|
|
||||||
wgsl += "// ── user closesthit / anyhit / miss sources ───────────────\n";
|
wgsl += "// ── user closesthit / anyhit / miss / resolve sources ─────\n";
|
||||||
for (const auto& shader : sbt.shaders) {
|
for (const auto& shader : sbt.shaders) {
|
||||||
if (shader.stage == WebGPURTStage::Raygen) continue;
|
if (shader.stage == WebGPURTStage::Raygen) continue;
|
||||||
wgsl += shader.source;
|
wgsl += shader.source;
|
||||||
wgsl += "\n";
|
wgsl += "\n";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ── Payload-typed wavefront storage binding ────────────────────────
|
||||||
|
//
|
||||||
|
// Emitted *after* the user sources so it can name the user's `Payload`
|
||||||
|
// type. Holds one Payload per in-flight ray slot across both ping/pong
|
||||||
|
// ray buffers (capacity = 2·W·H). SHADE loads ray.payloadSlot here;
|
||||||
|
// emit helpers (rtEmitPrimaryRay / rtEmitRay) store into it.
|
||||||
|
wgsl += "\n@group(1) @binding(15) var<storage, read_write> "
|
||||||
|
"wfPayload : array<Payload>;\n";
|
||||||
|
|
||||||
// ── Section 2: mega-switch dispatchers ─────────────────────────────
|
// ── Section 2: mega-switch dispatchers ─────────────────────────────
|
||||||
//
|
//
|
||||||
// runClosestHit, runAnyHit, runMiss each dispatch on the per-hit /
|
// runClosestHit, runAnyHit, runMiss each dispatch on the per-hit /
|
||||||
|
|
@ -141,6 +150,24 @@ void PipelineRTWebGPU::Init(WebGPUCommandEncoderRef /*cmd*/,
|
||||||
wgsl += " }\n";
|
wgsl += " }\n";
|
||||||
wgsl += "}\n";
|
wgsl += "}\n";
|
||||||
|
|
||||||
|
// runResolve — RESOLVE-stage tonemap hook. The first registered
|
||||||
|
// Resolve shader wins; with none, identity passthrough (alpha forced
|
||||||
|
// to 1) so the wavefront output matches a megakernel that wrote raw
|
||||||
|
// colors.
|
||||||
|
std::string resolveEntryFn;
|
||||||
|
for (const auto& shader : sbt.shaders) {
|
||||||
|
if (shader.stage == WebGPURTStage::Resolve) { resolveEntryFn = shader.entryFn; break; }
|
||||||
|
}
|
||||||
|
wgsl += "\nfn runResolve(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32> {\n";
|
||||||
|
if (!resolveEntryFn.empty()) {
|
||||||
|
wgsl += " return ";
|
||||||
|
wgsl += resolveEntryFn;
|
||||||
|
wgsl += "(coord, hdr);\n";
|
||||||
|
} else {
|
||||||
|
wgsl += " return vec4<f32>(hdr.rgb, 1.0);\n";
|
||||||
|
}
|
||||||
|
wgsl += "}\n";
|
||||||
|
|
||||||
// Marker — JS-side prelude/post-amble searches for this token to know
|
// Marker — JS-side prelude/post-amble searches for this token to know
|
||||||
// where the library helpers (traverseBlas/traverseTlas/traceRay) get
|
// where the library helpers (traverseBlas/traverseTlas/traceRay) get
|
||||||
// injected, followed by raygen sources and the @compute entry point.
|
// injected, followed by raygen sources and the @compute entry point.
|
||||||
|
|
@ -173,17 +200,55 @@ void PipelineRTWebGPU::Init(WebGPUCommandEncoderRef /*cmd*/,
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// ── Section 4: @compute entry point ────────────────────────────────
|
// ── Section 4: wavefront @compute entry points ─────────────────────
|
||||||
//
|
//
|
||||||
// 8x8 tile workgroup matching the rest of the WebGPU backend.
|
// Five kernels share this one module; createComputePipeline selects
|
||||||
|
// each by entryPoint name. GENERATE/RESOLVE are 8x8 screen tiles;
|
||||||
|
// TRACE/SHADE are 64-wide 1-D over the compacted ray list (dispatched
|
||||||
|
// indirectly from PREP); PREP is a single thread. The library helper
|
||||||
|
// bodies (_rtwTraverseTlas, rtEmit*, rtAccumulate, _wfCurCount, …) are
|
||||||
|
// injected JS-side at the marker above.
|
||||||
|
|
||||||
|
// GENERATE — one thread per pixel; clears the pixel's accumulator and
|
||||||
|
// runs the user raygen, which calls rtEmitPrimaryRay.
|
||||||
wgsl += "\n@compute @workgroup_size(8, 8, 1)\n";
|
wgsl += "\n@compute @workgroup_size(8, 8, 1)\n";
|
||||||
wgsl += "fn main(@builtin(global_invocation_id) gid: vec3<u32>) {\n";
|
wgsl += "fn wfGenerate(@builtin(global_invocation_id) gid: vec3<u32>) {\n";
|
||||||
|
wgsl += " if (gid.x >= wfParams.surfaceW || gid.y >= wfParams.surfaceH) { return; }\n";
|
||||||
|
wgsl += " let pixel = gid.y * wfParams.surfaceW + gid.x;\n";
|
||||||
|
wgsl += " wfAccum[pixel] = vec4<f32>(0.0, 0.0, 0.0, 0.0);\n";
|
||||||
|
wgsl += " _wfPixel = pixel;\n";
|
||||||
wgsl += " ";
|
wgsl += " ";
|
||||||
wgsl += raygenEntryFn;
|
wgsl += raygenEntryFn;
|
||||||
wgsl += "(gid);\n";
|
wgsl += "(gid);\n";
|
||||||
wgsl += "}\n";
|
wgsl += "}\n";
|
||||||
|
|
||||||
|
// PREP — single thread; reads the live ray count and publishes the
|
||||||
|
// indirect dispatch args for the upcoming TRACE/SHADE, then zeroes the
|
||||||
|
// next buffer's emit counter so SHADE starts compacting from 0.
|
||||||
|
wgsl += "\n@compute @workgroup_size(1)\n";
|
||||||
|
wgsl += "fn wfPrep() { _wfPrep(); }\n";
|
||||||
|
|
||||||
|
// TRACE — zero user code: pure traversal + intersection. One thread
|
||||||
|
// per live ray; writes a HitResult into wfHits[i].
|
||||||
|
wgsl += "\n@compute @workgroup_size(64)\n";
|
||||||
|
wgsl += "fn wfTrace(@builtin(global_invocation_id) gid: vec3<u32>) { _wfTrace(gid.x); }\n";
|
||||||
|
|
||||||
|
// SHADE — one thread per live ray; loads the ray + its hit + payload,
|
||||||
|
// dispatches to runMiss / runClosestHit, which may rtAccumulate and
|
||||||
|
// rtEmitRay continuation/shadow rays into the next buffer.
|
||||||
|
wgsl += "\n@compute @workgroup_size(64)\n";
|
||||||
|
wgsl += "fn wfShade(@builtin(global_invocation_id) gid: vec3<u32>) { _wfShade(gid.x); }\n";
|
||||||
|
|
||||||
|
// RESOLVE — one thread per pixel; runs the user resolve (or identity)
|
||||||
|
// over the linear accumulator and stores to the output image.
|
||||||
|
wgsl += "\n@compute @workgroup_size(8, 8, 1)\n";
|
||||||
|
wgsl += "fn wfResolve(@builtin(global_invocation_id) gid: vec3<u32>) {\n";
|
||||||
|
wgsl += " if (gid.x >= wfParams.surfaceW || gid.y >= wfParams.surfaceH) { return; }\n";
|
||||||
|
wgsl += " let pixel = gid.y * wfParams.surfaceW + gid.x;\n";
|
||||||
|
wgsl += " let outc = runResolve(gid.xy, wfAccum[pixel]);\n";
|
||||||
|
wgsl += " textureStore(outImage, vec2<i32>(i32(gid.x), i32(gid.y)), outc);\n";
|
||||||
|
wgsl += "}\n";
|
||||||
|
|
||||||
pipelineHandle = WebGPU::wgpuLoadRTPipeline(
|
pipelineHandle = WebGPU::wgpuLoadRTPipeline(
|
||||||
wgsl.data(),
|
wgsl.data(),
|
||||||
static_cast<std::int32_t>(wgsl.size()),
|
static_cast<std::int32_t>(wgsl.size()),
|
||||||
|
|
|
||||||
|
|
@ -72,6 +72,11 @@ export namespace Crafter {
|
||||||
// 0 means "no user bindings".
|
// 0 means "no user bindings".
|
||||||
const void* handlesPtr = nullptr;
|
const void* handlesPtr = nullptr;
|
||||||
std::uint32_t handlesCount = 0;
|
std::uint32_t handlesCount = 0;
|
||||||
|
// Wavefront bounce budget: number of (TRACE; SHADE) iterations.
|
||||||
|
// 1 = primary rays only; 2 = primary + one continuation/shadow
|
||||||
|
// bounce; etc. The library unrolls GENERATE; (PREP; TRACE; SHADE)
|
||||||
|
// ×maxDepth; RESOLVE.
|
||||||
|
std::uint32_t maxDepth = 1;
|
||||||
|
|
||||||
RTPass(PipelineRTWebGPU* p) : pipeline(p) {}
|
RTPass(PipelineRTWebGPU* p) : pipeline(p) {}
|
||||||
|
|
||||||
|
|
@ -88,7 +93,8 @@ export namespace Crafter {
|
||||||
static_cast<std::int32_t>(gx),
|
static_cast<std::int32_t>(gx),
|
||||||
static_cast<std::int32_t>(gy),
|
static_cast<std::int32_t>(gy),
|
||||||
handlesPtr,
|
handlesPtr,
|
||||||
static_cast<std::int32_t>(handlesCount));
|
static_cast<std::int32_t>(handlesCount),
|
||||||
|
static_cast<std::int32_t>(maxDepth));
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -18,6 +18,11 @@ export namespace Crafter {
|
||||||
Miss = 1,
|
Miss = 1,
|
||||||
ClosestHit = 2,
|
ClosestHit = 2,
|
||||||
AnyHit = 3,
|
AnyHit = 3,
|
||||||
|
// Wavefront RESOLVE-stage tonemap/output hook. Optional: if no
|
||||||
|
// Resolve shader is registered, RESOLVE writes the linear accum
|
||||||
|
// buffer through unchanged. Signature:
|
||||||
|
// fn <entryFn>(coord: vec2<u32>, hdr: vec4<f32>) -> vec4<f32>
|
||||||
|
Resolve = 4,
|
||||||
};
|
};
|
||||||
|
|
||||||
// One WGSL shader source + the function name PipelineRTWebGPU should
|
// One WGSL shader source + the function name PipelineRTWebGPU should
|
||||||
|
|
|
||||||
|
|
@ -201,7 +201,8 @@ namespace Crafter::WebGPU {
|
||||||
std::uint32_t tlasBufHandle,
|
std::uint32_t tlasBufHandle,
|
||||||
std::int32_t instanceCount,
|
std::int32_t instanceCount,
|
||||||
std::int32_t gx, std::int32_t gy,
|
std::int32_t gx, std::int32_t gy,
|
||||||
const void* handlesPtr, std::int32_t handlesCount);
|
const void* handlesPtr, std::int32_t handlesCount,
|
||||||
|
std::int32_t maxDepth);
|
||||||
|
|
||||||
// GPU TLAS-build dispatch. Two sequential compute passes:
|
// GPU TLAS-build dispatch. Two sequential compute passes:
|
||||||
// 1. tlasBuildMain — per-instance world AABB + identity permutation
|
// 1. tlasBuildMain — per-instance world AABB + identity permutation
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue