Indexing a `layout(descriptor_heap)` array with a runtime (non-constant) index inside a ray-tracing hit shader device-losts on NVIDIA 610.43.02, for both SSBO and sampled-image descriptors. A constant/spec-constant index is fine, and the same dynamic pattern works in fragment shaders, so it's an RT-stage-specific driver fault — the same family as #7/#15 (descriptor-heap AS reads) and #21/#22 (RT recursion + compute TLAS push). Unlike the AS-read fault, this cannot be worked around transparently: a sampled image has no device-address escape hatch the way an acceleration structure does (OpConvertUToAccelerationStructureKHR), and a buffer-only buffer_reference rewrite would need a whole address-table architecture while still leaving the texture half broken. So the resolution is the documented-limitation path (the precedent set by #7). Records the fault and its isolation in README's Native RT status and in the Sponza example README (the textured-closest-hit example, which already reads its albedo through a spec-constant slot for exactly this reason). Documents the recommended consumer pattern: bind one resource and index *within* it dynamically (single geometry SSBO / buffer_reference at a spec-constant slot; one texture2DArray indexed by layer) rather than selecting a descriptor dynamically — what the WebGPU path already does. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Crafter.Graphics

Vulkan + WebGPU graphics library built around C++20 modules and
bindless heaps. Provides window management, ray tracing, and a
compute-shader-driven UI on a single, opinionated stack. Native
builds use Vulkan with `VK_EXT_descriptor_heap`; `wasm32-*` builds
target the browser via WebGPU and a DOM window backend.

## Backends

Backends are chosen at build time by the target triple:

| Target | Window | Renderer | Shaders |
|---------------------|------------------------|---------------------|---------|
| native Linux | Wayland | Vulkan (heap-bound) | GLSL → SPIR-V |
| native Windows | Win32 | Vulkan (heap-bound) | GLSL → SPIR-V |
| `wasm32-*` (any) | DOM (canvas + JS env) | WebGPU | WGSL (loaded at runtime) |

The two backends share the same C++ surface for the high-level pieces
(`UIRenderer`, `Mesh`, `RenderingElement3D`, `RTPass`, item structs,
`FontAtlas`, `Image2D`, `ComputeShader`). Backend-typed pieces
(`*Vulkan` vs `*WebGPU`) live behind `#ifdef CRAFTER_GRAPHICS_WINDOW_DOM`.
Vulkan ray tracing is hardware (`VK_KHR_ray_tracing_pipeline`); WebGPU
ray tracing is a library-built software path (BVH + traceRay in a
compute pipeline composed from user-supplied WGSL stages). The WebGPU
path supports triangle and AABB (procedural, `VK_GEOMETRY_TYPE_AABBS_KHR`)
geometry and closest-hit / miss / any-hit / intersection shaders; see
[examples/RTVolume](examples/RTVolume/README.md) for procedural spheres
shaded through an intersection shader with an any-hit cut-out.

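
As a rough illustration of how a backend-typed piece can sit behind `CRAFTER_GRAPHICS_WINDOW_DOM`, the sketch below picks one of two stand-in types at compile time. Only the macro name comes from this README; `SketchHeapVulkan`, `SketchHeapWebGPU`, `SketchHeap`, and `ActiveBackendName` are hypothetical names for this example and are not the library's declarations.

```cpp
#include <string_view>

// Hypothetical stand-ins for a backend-typed pair (*Vulkan vs *WebGPU).
struct SketchHeapVulkan {
    static constexpr std::string_view kName = "Vulkan";
};
struct SketchHeapWebGPU {
    static constexpr std::string_view kName = "WebGPU";
};

// Assumption: the build defines CRAFTER_GRAPHICS_WINDOW_DOM for wasm32-* targets.
#ifdef CRAFTER_GRAPHICS_WINDOW_DOM
using SketchHeap = SketchHeapWebGPU;
#else
using SketchHeap = SketchHeapVulkan;
#endif

// Shared code refers to the alias only and never names a backend directly.
constexpr std::string_view ActiveBackendName() { return SketchHeap::kName; }
```

Code written against the alias compiles unchanged for either target, which is the property the shared high-level surface relies on.
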

> **Native RT status:** reading an acceleration structure through
> `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA
> driver `610.43.02` — a driver-side fault in the brand-new descriptor-heap
> acceleration-structure path, not an engine bug (the setup is correct and
> validation-clean; images/buffers through the same heap work). The engine
> **works around it transparently** (issue #15): on the NVIDIA driver only,
> `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a
> TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which
> reads no descriptor), and `RTPass` supplies the address as push data.
> Shaders and example code are unchanged, and the workaround is a single
> fenced block gated on `Device::workaroundDescriptorHeapAS`, removable
> once a fixed driver ships. See
> [examples/VulkanTriangle/README.md](examples/VulkanTriangle/README.md)
> for the full investigation. WebGPU RT is unaffected.

> **Native RT limitation — dynamic `descriptor_heap` indexing in hit
> shaders:** on the same NVIDIA driver, indexing a `descriptor_heap`
> array with a **runtime (non-constant)** index inside a ray-tracing
> **hit** shader also fails with `VK_ERROR_DEVICE_LOST`, for plain
> SSBO **and** sampled-image descriptors. A **constant / spec-constant**
> index is fine (that's why [Sponza](examples/Sponza/README.md)'s
> closest-hit reads `albedo[albedoSlot]` through a spec constant), and
> the identical dynamic pattern works in fragment shaders (the UI
> renderer indexes `uiTextures[]` by per-item runtime slots) — so this
> is **RT-stage-specific**, not a general heap problem. Unlike the
> AS-read fault above, this **cannot** be worked around transparently:
> sampled images have no device-address escape hatch the way an
> acceleration structure does (`OpConvertUToAccelerationStructureKHR`).
> The recommended pattern for bindless per-mesh geometry/material is to
> **bind one resource and index *within* it dynamically** rather than
> selecting a descriptor dynamically: pack geometry into a single SSBO
> (or reach it via `buffer_reference`) at a spec-constant slot and index
> by element offset, and put materials in one `texture2DArray` indexed
> by layer. Dynamic addressing *inside* a bound resource is ordinary
> memory/layer addressing and is unaffected; only dynamic selection of a
> *descriptor* faults. This is exactly what the WebGPU path already does
> (bucketed texture arrays + a single buffer). The full investigation and
> GLSL are in [examples/Sponza/README.md](examples/Sponza/README.md)
> (issue #23). WebGPU RT is unaffected.

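
The "bind one resource and index within it" pattern described in the note above can be sketched on the host side as packing every mesh into one vertex array plus a small range table. This is a minimal illustration under stated assumptions, not the engine's code: `Vertex`, `MeshRange`, `PackedGeometry`, `AppendMesh`, and `FetchVertex` are hypothetical names, and the real vertex layout is not shown in this README.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical vertex layout

// Hypothetical per-mesh record: where one mesh starts inside the shared array.
struct MeshRange {
    std::uint32_t firstVertex;  // element offset into PackedGeometry::vertices
    std::uint32_t vertexCount;
};

struct PackedGeometry {
    std::vector<Vertex> vertices;   // would be uploaded as the single geometry SSBO
    std::vector<MeshRange> ranges;  // looked up by mesh id at runtime
};

// Appends one mesh and returns its id (the index of its MeshRange).
inline std::uint32_t AppendMesh(PackedGeometry& packed, const std::vector<Vertex>& verts) {
    assert(!verts.empty());
    const auto id = static_cast<std::uint32_t>(packed.ranges.size());
    packed.ranges.push_back({static_cast<std::uint32_t>(packed.vertices.size()),
                             static_cast<std::uint32_t>(verts.size())});
    packed.vertices.insert(packed.vertices.end(), verts.begin(), verts.end());
    return id;
}

// CPU mirror of the shader-side lookup: the resource is fixed and only the
// element offset inside it is computed dynamically.
inline const Vertex& FetchVertex(const PackedGeometry& packed,
                                 std::uint32_t meshId, std::uint32_t localIndex) {
    const MeshRange& range = packed.ranges.at(meshId);
    assert(localIndex < range.vertexCount);
    return packed.vertices[range.firstVertex + localIndex];
}
```

A hit shader following this layout would read the range for its mesh id and then add the local index, so the only runtime-varying value is an offset into memory that is already bound.
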
## What's in here

- **Window** — Wayland, Win32, and DOM backends, swapchain ring / canvas
  framing, input events. Pick a backend at build time via the target
  triple. The DOM backend routes every dynamic symbol through
  [additional/dom-env.js](additional/dom-env.js) and
  [additional/dom-webgpu.js](additional/dom-webgpu.js).
- **Device** *(Vulkan only)* — single-instance bring-up targeting
  `VK_EXT_descriptor_heap`; pipelines are created with
  `VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT` so there are no
  descriptor-set layouts and push constants travel via
  `vkCmdPushDataEXT`.
- **DescriptorHeapVulkan / DescriptorHeapWebGPU** — bindless slot
  allocators. Vulkan side allocates image/buffer/sampler slots in a
  `VK_EXT_descriptor_heap`; WebGPU side resolves slots to JS-side
  handle-table cookies that the dispatch bridge binds per pass.
- **VulkanBuffer\<T, Mapped\> / WebGPUBuffer\<T\>** — typed buffer.
  Vulkan variant has optional host mapping and a `FlushDevice` that
  issues the right host-write barrier; WebGPU variant goes through
  `queue.writeBuffer` over the JS bridge.
- **ImageVulkan\<Pixel\> / Image2D\<Pixel\> / Image2DArray\<Pixel\>** —
  image + staging buffer with mip-chain support on Vulkan; on WebGPU,
  `rgba8unorm` 2D / 2D-array textures created and written via the
  bridge. Atlas (`r8unorm`, sub-region writes) is a separate path.
- **PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan /
  ShaderBindingTableWebGPU / RTPass** — ray-tracing pipelines. Vulkan
  uses native RT pipelines + SBTs; WebGPU compiles a **wavefront /
  streaming** software tracer — five `@compute` kernels
  (`GENERATE → PREP → TRACE → SHADE → RESOLVE`) sharing one module,
  connected by GPU ray/hit/payload buffers and a GPU-driven indirect
  bounce loop (`dispatchWorkgroupsIndirect`). TRACE carries zero user
  code (traversal + intersection only); user raygen calls
  `rtEmitPrimaryRay`, and closesthit / miss run in SHADE where they
  `rtEmitRay` continuation/shadow rays and `rtAccumulate` radiance. An
  optional Resolve shader tonemaps the linear accumulator. See
  [WAVEFRONT-DESIGN.md](WAVEFRONT-DESIGN.md).
- **ComputeShader / WebGPUComputeShader** — Tier 1 wrapper used by the
  UI system. Vulkan loads a `.spv` and dispatches with
  `vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at
  runtime via `wgpuLoadCustomShader`. Use it directly for any custom
  compute.
- **UI** — three-tier UI system; see below. The standard shaders ship
  as four `.spv` blobs on native and four WGSL strings baked into the
  WebGPU dispatcher.
- **FontAtlas** — single-channel SDF atlas (1024×1024, 32pt base,
  shelf-packed, lazy `Ensure` per codepoint, dirty-flush via `Update`).
  Backend-agnostic.
- **Mesh / RenderingElement3D / Animation** — BLAS/TLAS construction
  and 3D scene plumbing. Vulkan calls `vkCmdBuildAccelerationStructures`;
  WebGPU registers BLAS data (verts, idx, BVH nodes, primRemap, optional
  per-vertex attribs) into global mesh heaps and builds the TLAS in a
  library compute pass.
- **Clipboard / Input / Gamepad / Router / Dom** — input plumbing.
  Gamepad uses libudev+libevdev on Linux and WGI on Windows; the DOM
  backend exposes the host page DOM (`Dom::HtmlElement`) and a router
  for hash-routed wasm apps.

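
The `FontAtlas` entry above mentions shelf packing. For readers unfamiliar with the technique, here is a minimal, generic shelf packer. It is an illustration of the idea only and is not the library's implementation; `ShelfPacker`, `AtlasSlot`, and `Place` are hypothetical names.

```cpp
#include <cassert>
#include <optional>

struct AtlasSlot { int x, y; };  // top-left corner of the placed rectangle

// Rectangles fill the current row (the "shelf") left to right. When one does
// not fit, a new shelf opens below the tallest rectangle of the current shelf.
class ShelfPacker {
public:
    ShelfPacker(int width, int height) : width_(width), height_(height) {
        assert(width > 0 && height > 0);
    }

    // Returns the slot, or nullopt when the rectangle cannot be placed.
    std::optional<AtlasSlot> Place(int w, int h) {
        if (w <= 0 || h <= 0 || w > width_) return std::nullopt;
        if (cursorX_ + w > width_) {  // current shelf is full: open the next one
            shelfY_ += shelfHeight_;
            cursorX_ = 0;
            shelfHeight_ = 0;
        }
        if (shelfY_ + h > height_) return std::nullopt;  // no vertical room left
        const AtlasSlot slot{cursorX_, shelfY_};
        cursorX_ += w;
        if (h > shelfHeight_) shelfHeight_ = h;
        return slot;
    }

private:
    int width_;
    int height_;
    int cursorX_ = 0;      // next free x on the current shelf
    int shelfY_ = 0;       // top edge of the current shelf
    int shelfHeight_ = 0;  // tallest rectangle placed on the current shelf
};
```

Shelf packing wastes some space under short glyphs, but insertion is constant-time and needs no repacking, which suits an atlas that grows lazily one codepoint at a time.
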
## UI system (three tiers)

The UI is *deliberately* layered to balance no-boilerplate against
no-lock-in:

- **Tier 1 — `ComputeShader`.** Load any `.spv`, dispatch with push
  constants, library inserts inter-dispatch barriers. The escape hatch:
  if the standard shaders don't fit, write your own compute and
  dispatch it next to them.
- **Tier 2 — `UIRenderer` + standard shaders.** Four shipped compute
  shaders (`drawQuads`, `drawCircles`, `drawImages`, `drawText`), POD
  item structs (`QuadItem`, `CircleItem`, `ImageItem`, `GlyphItem`), a
  shared GLSL contract in [shaders/ui-shared.glsl](shaders/ui-shared.glsl),
  and helpers (`RegisterBuffer`, `RegisterImage`, `RegisterSampler`,
  `FillHeader`, `Dispatch*`, `ShapeText`). You build your own per-shader
  SSBOs (manual batching) and call one `Dispatch*` per shader type per
  frame. Item array order = draw order.
- **Tier 3 — stateless presentation functions.** `DrawButton`,
  `DrawCheckbox`, `DrawSlider`, `DrawProgressBar`. Each is a small
  function that *appends* items to your buffers — they don't dispatch.
  Colors come in as small inline `*Colors` aggregates, no library
  `Theme` type. **The source is the customization API**: if a
  component doesn't fit, copy its body and edit it. No virtual hooks,
  no extension points.

What's *not* in the UI: widget tree, layout engine (just a `Rect::SubRect`
carving helper), theming, hit-testing, focus management. State for
interactive components (hover, drag, focus) lives in user-owned POD
structs, not the library.

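
To make the Tier 3 contract concrete, the sketch below shows what a stateless "append, don't dispatch" function can look like. It assumes a simplified item layout: `SketchQuad`, `SketchRect`, `SketchColor`, `SketchProgressColors`, and `AppendProgressBar` are hypothetical stand-ins, and the real `QuadItem` fields and `DrawProgressBar` signature are not shown in this README.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct SketchColor { float r, g, b, a; };
struct SketchRect { float x, y, w, h; };

// Hypothetical POD item; one entry in the user-owned quad buffer.
struct SketchQuad {
    float x, y, w, h;
    SketchColor color;
};

// Small inline colour aggregate, in the spirit of the `*Colors` arguments.
struct SketchProgressColors {
    SketchColor track;
    SketchColor fill;
};

// Stateless: appends two quads (track, then fill) and dispatches nothing.
// Array order is draw order, so the fill is drawn on top of the track.
inline void AppendProgressBar(std::vector<SketchQuad>& quads, const SketchRect& rect,
                              float fraction, const SketchProgressColors& colors) {
    assert(rect.w >= 0.0f && rect.h >= 0.0f);
    const float t = std::clamp(fraction, 0.0f, 1.0f);
    quads.push_back({rect.x, rect.y, rect.w, rect.h, colors.track});
    quads.push_back({rect.x, rect.y, rect.w * t, rect.h, colors.fill});
}
```

Because the function only touches the caller's vector, customizing it means copying these few lines and editing them, which is the workflow the tier is designed around.
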
### UI dispatch model

Standard shaders dispatch one workgroup per 8×8 *screen tile*. Each
thread iterates every item in the SSBO in array order, accumulating
into a local `dst`, and stores once. Total cost is `O(W·H·N)`, which
works well up to a few hundred items at 1080p. Splitting one buffer into
multiple dispatches doesn't help: it is the same total work plus barrier
overhead. If you need to render thousands of UI items, you want a
different shader (tile binning, per-item-list resolve), not more
dispatches.

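
The arithmetic behind the dispatch model above can be written down directly. This is a small sketch for estimating cost, assuming one thread per pixel inside each 8×8 tile; `TileDispatch`, `DispatchSize`, and `ItemEvaluations` are hypothetical helper names, not library functions.

```cpp
#include <cstdint>

constexpr std::uint32_t kTile = 8;  // 8×8 screen tile per workgroup

struct DispatchSize {
    std::uint32_t groupsX;
    std::uint32_t groupsY;
};

// One workgroup per tile; partial tiles at the right and bottom edges
// still need a workgroup, hence the round-up division.
constexpr DispatchSize TileDispatch(std::uint32_t width, std::uint32_t height) {
    return {(width + kTile - 1) / kTile, (height + kTile - 1) / kTile};
}

// Every on-screen pixel visits every item once: W * H * N item evaluations.
constexpr std::uint64_t ItemEvaluations(std::uint32_t width, std::uint32_t height,
                                        std::uint32_t items) {
    return std::uint64_t{width} * height * items;
}
```

For 1920×1080 this gives 240 × 135 workgroups, and 300 items cost about 6.2 × 10^8 item evaluations per frame. Splitting those 300 items across several dispatches leaves that product unchanged.
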
## Build

The repository is built with `crafter-build` (a project-config based
build system; the project description lives in `project.cpp`):

```bash
crafter-build # native: Wayland on Linux, Win32 on Windows
crafter-build --target=wasm32-wasip1 # browser: DOM window + WebGPU renderer
crafter-build -r # build and run (in an example directory)
```

The build picks the window + renderer pair automatically from the
target triple: any `wasm32-*` triple flips to DOM + WebGPU (no Vulkan
loader linked), everything else stays on the native Vulkan path. Each
example with both backends ships GLSL *and* WGSL copies of its shaders
side-by-side (e.g. [raygen.glsl](examples/Sponza/raygen.glsl) +
[raygen.wgsl](examples/Sponza/raygen.wgsl)); `project.cpp` selects the
right set per target.

## Examples

See [examples/](examples/). Quick map:

- [HelloWindow](examples/HelloWindow/) — minimal native window, no rendering.
- [HelloDom](examples/HelloDom/) — wasm-only smoke test of the DOM
  partition: page-level events, `HtmlElement::CreateInBody`, and
  `Router::PushState`-driven SPA navigation. No GPU work.
- [VulkanTriangle](examples/VulkanTriangle/) — ray-traced triangle on
  both Vulkan and WebGPU. The smallest test of the bindless + RT path
  on each backend.
- [RTStress](examples/RTStress/) — wavefront RT benchmark: an N×N×N grid
  of a cube mesh (instance-count knob `kGrid`, 512 → 8000) shaded with
  primary + shadow rays. Prints a GPU timestamp-query per-pass breakdown
  each second. WebGPU/DOM only.
- [Sponza](examples/Sponza/) — ray-traced Sponza atrium on both
  backends. Exercises `.cmesh` / `.ctex` decompression (GPU
  `VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a
  textured closest-hit. See [its README](examples/Sponza/README.md)
  for asset provenance.
- [HelloUI](examples/HelloUI/) — UI smoke test using all three tiers
  (background quad, slider, progress bar, button with text label,
  cursor-tracking circle).
- [CustomShader](examples/CustomShader/) — Tier 1 demo: a user-authored
  compute shader inverting RGB under a list of item-circles, dispatched
  alongside the standard `drawQuads`. Shipped as both
  [`.comp.glsl`](examples/CustomShader/inverse-circle.comp.glsl) and
  [`.comp.wgsl`](examples/CustomShader/inverse-circle.comp.wgsl).
- [Decompression](examples/Decompression/) — `Crafter::Compression`
  CPU round-trip smoke test (used by the WebGPU asset path).
- [InputSystem](examples/InputSystem/) — keyboard / mouse / gamepad
  event surface check.

## License

LGPL 3.0. See per-file headers and `LICENSE`.