Crafter.Graphics
Vulkan + WebGPU graphics library built around C++20 modules and
bindless heaps. Provides window management, ray tracing, and a
compute-shader-driven UI on a single, opinionated stack. Native
builds use Vulkan with VK_EXT_descriptor_heap; wasm32-* builds
target the browser via WebGPU and a DOM window backend.
Backends
Backends are chosen at build time by the target triple:
| Target | Window | Renderer | Shaders |
|---|---|---|---|
| native Linux | Wayland | Vulkan (heap-bound) | GLSL → SPIR-V |
| native Windows | Win32 | Vulkan (heap-bound) | GLSL → SPIR-V |
| wasm32-* (any) | DOM (canvas + JS env) | WebGPU | WGSL (loaded at runtime) |
The two backends share the same C++ surface for the high-level pieces
(UIRenderer, Mesh, RenderingElement3D, RTPass, item structs,
FontAtlas, Image2D, ComputeShader). Backend-typed pieces
(*Vulkan vs *WebGPU) live behind #ifdef CRAFTER_GRAPHICS_WINDOW_DOM.
Vulkan ray tracing is hardware (VK_KHR_ray_tracing_pipeline); WebGPU
ray tracing is a library-built software path (BVH + traceRay in a
compute pipeline composed from user-supplied WGSL stages). The WebGPU
path supports triangle and AABB (procedural, VK_GEOMETRY_TYPE_AABBS_KHR)
geometry and closest-hit / miss / any-hit / intersection shaders; see
examples/RTVolume for procedural spheres
shaded through an intersection shader with an any-hit cut-out.
Native RT status: reading an acceleration structure through `VK_EXT_descriptor_heap` aborts with `VK_ERROR_DEVICE_LOST` on NVIDIA driver `610.43.02`. This is a driver-side fault in the brand-new descriptor-heap acceleration-structure path, not an engine bug (the setup is correct and validation-clean; images/buffers through the same heap work). The engine works around it transparently (issue #15): on the NVIDIA driver only, `VulkanShader` rewrites the compiled SPIR-V so heap AS reads become a TLAS-device-address + `OpConvertUToAccelerationStructureKHR` path (which reads no descriptor), and `RTPass` supplies the address as push data. Shaders and example code are unchanged, and it's a single fenced block gated on `Device::workaroundDescriptorHeapAS`, removable once a fixed driver ships. See examples/VulkanTriangle/README.md for the full investigation. WebGPU RT is unaffected.
Native RT limitation (dynamic `descriptor_heap` indexing in hit shaders): on the same NVIDIA driver, indexing a `descriptor_heap` array with a runtime (non-constant) index inside a ray-tracing hit shader also device-losts (`VK_ERROR_DEVICE_LOST`), for plain SSBO and sampled-image descriptors. A constant / spec-constant index is fine (that's why Sponza's closest-hit reads `albedo[albedoSlot]` through a spec constant), and the identical dynamic pattern works in fragment shaders (the UI renderer indexes `uiTextures[]` by per-item runtime slots), so this is RT-stage-specific, not a general heap problem. Unlike the AS-read fault above, this cannot be worked around transparently: sampled images have no device-address escape hatch the way an acceleration structure does (`OpConvertUToAccelerationStructureKHR`). The recommended pattern for bindless per-mesh geometry/material is to bind one resource and index within it dynamically rather than selecting a descriptor dynamically: pack geometry into a single SSBO (or reach it via `buffer_reference`) at a spec-constant slot and index by element offset, and put materials in one `texture2DArray` indexed by layer. Dynamic addressing inside a bound resource is ordinary memory/layer addressing and is unaffected; only dynamic selection of a descriptor faults. This is exactly what the WebGPU path already does (bucketed texture arrays + a single buffer). Full investigation and GLSL in examples/Sponza/README.md (issue #23). WebGPU RT is unaffected.
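The CPU side of the "bind one resource, index within it" pattern could look like the sketch below. It is illustrative only: `Vertex`, `MeshRange`, `PackedGeometry`, `PackMeshes`, and `FetchVertex` are hypothetical names, not part of this library's API, and the vertex layout is an assumption. The sketch concatenates per-mesh vertex data into one array meant for a single SSBO and records each mesh's element offset, so a hit shader would read a constant-slot buffer and add a per-mesh offset instead of selecting a descriptor at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the engine's real layout may differ.
struct Vertex {
    float position[3];
    float uv[2];
};

// Where one mesh lives inside the shared buffer, in elements (not bytes).
struct MeshRange {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
    std::uint32_t textureLayer;  // layer in the single 2D-array texture
};

struct PackedGeometry {
    std::vector<Vertex> vertices;   // would be uploaded once as a single SSBO
    std::vector<MeshRange> ranges;  // one entry per mesh
};

// Concatenate all meshes into one vertex array and record per-mesh offsets.
// Mesh i is assigned texture layer i, which assumes one material per mesh.
inline PackedGeometry PackMeshes(const std::vector<std::vector<Vertex>>& meshes) {
    PackedGeometry packed;
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        MeshRange range{};
        range.firstVertex = static_cast<std::uint32_t>(packed.vertices.size());
        range.vertexCount = static_cast<std::uint32_t>(meshes[i].size());
        range.textureLayer = static_cast<std::uint32_t>(i);
        packed.ranges.push_back(range);
        packed.vertices.insert(packed.vertices.end(), meshes[i].begin(), meshes[i].end());
    }
    return packed;
}

// Mirrors what a hit shader would do: address *within* the one bound buffer.
inline const Vertex& FetchVertex(const PackedGeometry& packed,
                                 std::uint32_t meshIndex,
                                 std::uint32_t localVertex) {
    assert(meshIndex < packed.ranges.size());
    const MeshRange& range = packed.ranges[meshIndex];
    assert(localVertex < range.vertexCount);
    return packed.vertices[range.firstVertex + localVertex];
}
```

In this arrangement the only per-mesh value that varies at runtime is an offset (and a texture layer), which is ordinary addressing inside an already-bound resource.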
What's in here
- Window: Wayland, Win32, and DOM backends, swapchain ring / canvas framing, input events. Pick a backend at build time via the target triple. The DOM backend routes every dynamic symbol through additional/dom-env.js and additional/dom-webgpu.js.
- Device (Vulkan only): single-instance bring-up targeting `VK_EXT_descriptor_heap`; pipelines are created with `VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT` so there are no descriptor-set layouts and push constants travel via `vkCmdPushDataEXT`.
- DescriptorHeapVulkan / DescriptorHeapWebGPU: bindless slot allocators. Vulkan side allocates image/buffer/sampler slots in a `VK_EXT_descriptor_heap`; WebGPU side resolves slots to JS-side handle-table cookies that the dispatch bridge binds per pass.
- `VulkanBuffer<T, Mapped>` / `WebGPUBuffer<T>`: typed buffer. Vulkan variant has optional host mapping and a `FlushDevice` that issues the right host-write barrier; WebGPU variant goes through `queue.writeBuffer` over the JS bridge.
- `ImageVulkan<Pixel>` / `Image2D<Pixel>` / `Image2DArray<Pixel>`: image + staging buffer with mip-chain support on Vulkan; on WebGPU, `rgba8unorm` 2D / 2D-array textures created and written via the bridge. Atlas (r8unorm, sub-region writes) is a separate path.
- PipelineRTVulkan / PipelineRTWebGPU / ShaderBindingTableVulkan / ShaderBindingTableWebGPU / RTPass: ray-tracing pipelines. Vulkan uses native RT pipelines + SBTs; WebGPU compiles a wavefront / streaming software tracer: five `@compute` kernels (GENERATE → PREP → TRACE → SHADE → RESOLVE) sharing one module, connected by GPU ray/hit/payload buffers and a GPU-driven indirect bounce loop (`dispatchWorkgroupsIndirect`). TRACE carries zero user code (traversal + intersection only); user raygen calls `rtEmitPrimaryRay`, and closesthit / miss run in SHADE where they `rtEmitRay` continuation/shadow rays and `rtAccumulate` radiance. An optional Resolve shader tonemaps the linear accumulator. See WAVEFRONT-DESIGN.md.
- ComputeShader / WebGPUComputeShader: Tier 1 wrapper used by the UI system. Vulkan loads a `.spv` and dispatches with `vkCmdPushDataEXT`; WebGPU loads a user-supplied `.wgsl` blob at runtime via `wgpuLoadCustomShader`. Use it directly for any custom compute.
- UI: three-tier UI system; see below. The standard shaders ship as four `.spv` blobs on native and four WGSL strings baked into the WebGPU dispatcher.
- FontAtlas: single-channel SDF atlas (1024×1024, 32pt base, shelf-packed, lazy `Ensure` per codepoint, dirty-flush via `Update`). Backend-agnostic.
- Mesh / RenderingElement3D / Animation: BLAS/TLAS construction and 3D scene plumbing. Vulkan calls `vkCmdBuildAccelerationStructures`; WebGPU registers BLAS data (verts, idx, BVH nodes, primRemap, optional per-vertex attribs) into global mesh heaps and builds the TLAS in a library compute pass.
- Clipboard / Input / Gamepad / Router / Dom: input plumbing. Gamepad uses libudev+libevdev on Linux and WGI on Windows; the DOM backend exposes the host page DOM (`Dom::HtmlElement`) and a router for hash-routed wasm apps.
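To make the "bindless slot allocator" idea in the list above concrete, here is a minimal free-list sketch. `SlotAllocator` and its methods are hypothetical and do not reflect the actual `DescriptorHeapVulkan` / `DescriptorHeapWebGPU` interfaces; the sketch only shows the general technique of handing out integer slots that shaders later use as indices, and recycling them on release.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical free-list slot allocator; not the library's actual class.
class SlotAllocator {
public:
    explicit SlotAllocator(std::uint32_t capacity) : capacity_(capacity) {}

    // Returns a free slot index, reusing released slots before growing.
    std::uint32_t Allocate() {
        if (!freeList_.empty()) {
            std::uint32_t slot = freeList_.back();
            freeList_.pop_back();
            return slot;
        }
        assert(next_ < capacity_ && "slot heap exhausted");
        return next_++;
    }

    // Makes a slot available again. The caller is responsible for making
    // sure the GPU no longer references it.
    void Free(std::uint32_t slot) {
        assert(slot < next_);
        freeList_.push_back(slot);
    }

    // Number of slots currently handed out.
    std::uint32_t LiveCount() const {
        return next_ - static_cast<std::uint32_t>(freeList_.size());
    }

private:
    std::uint32_t capacity_;
    std::uint32_t next_ = 0;
    std::vector<std::uint32_t> freeList_;
};
```

A real heap would likely keep separate allocators per descriptor kind (image, buffer, sampler) and defer reuse until in-flight frames have retired; both are left out here to keep the sketch short.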
UI system (three tiers)
The UI is deliberately layered to balance no-boilerplate against no-lock-in:
- Tier 1: `ComputeShader`. Load any `.spv`, dispatch with push constants, library inserts inter-dispatch barriers. The escape hatch: if the standard shaders don't fit, write your own compute and dispatch it next to them.
- Tier 2: `UIRenderer` + standard shaders. Four shipped compute shaders (`drawQuads`, `drawCircles`, `drawImages`, `drawText`), POD item structs (`QuadItem`, `CircleItem`, `ImageItem`, `GlyphItem`), a shared GLSL contract in shaders/ui-shared.glsl, and helpers (`RegisterBuffer`, `RegisterImage`, `RegisterSampler`, `FillHeader`, `Dispatch*`, `ShapeText`). You build your own per-shader SSBOs (manual batching) and call one `Dispatch*` per shader type per frame. Item array order = draw order.
- Tier 3: stateless presentation functions. `DrawButton`, `DrawCheckbox`, `DrawSlider`, `DrawProgressBar`. Each is a small function that appends items to your buffers; they don't dispatch. Colors come in as small inline `*Colors` aggregates, no library `Theme` type. The source is the customization API: if a component doesn't fit, copy its body and edit it. No virtual hooks, no extension points.
What's not in the UI: widget tree, layout engine (just a Rect::SubRect
carving helper), theming, hit-testing, focus management. State for
interactive components (hover, drag, focus) lives in user-owned POD
structs, not the library.
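As a rough illustration of the Tier 3 style described above, the sketch below shows a stateless function that only appends items to a caller-owned list. Everything in it is hypothetical: `QuadItemSketch`, `ProgressBarColors`, and `DrawProgressBarSketch` are stand-ins whose fields and signature are assumptions, not the library's real `QuadItem` or `DrawProgressBar`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical item and color layouts for illustration only.
struct Rect { float x, y, w, h; };
struct Color { float r, g, b, a; };
struct QuadItemSketch { Rect rect; Color color; };

// Small inline color aggregate passed by the caller; no global theme.
struct ProgressBarColors { Color track; Color fill; };

// Appends a track quad and a fill quad to the caller's item list.
// It does not dispatch anything. Array order is draw order, so the
// fill is pushed after the track and is drawn on top of it.
inline void DrawProgressBarSketch(std::vector<QuadItemSketch>& quads,
                                  Rect bounds,
                                  float fraction,
                                  const ProgressBarColors& colors) {
    assert(bounds.w >= 0.0f);
    if (fraction < 0.0f) fraction = 0.0f;  // clamp to [0, 1]
    if (fraction > 1.0f) fraction = 1.0f;
    quads.push_back({bounds, colors.track});
    Rect filled = bounds;
    filled.w = bounds.w * fraction;
    quads.push_back({filled, colors.fill});
}
```

Because the function holds no state, customizing it amounts to copying the body and changing which items it appends, which matches the "source is the customization API" approach.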
UI dispatch model
Standard shaders dispatch one workgroup per 8×8 screen tile — each
thread iterates every item in the SSBO in array order, accumulating
into a local dst, and stores once. Total cost is O(W·H·N); works
well up to a few hundred items at 1080p. Splitting one buffer into
multiple dispatches doesn't help — the same total work plus barrier
overhead. If you need to render thousands of UI items, you want a
different shader (tile binning, per-item-list resolve), not more
dispatches.
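The cost claim above can be checked with a little arithmetic. The helpers below are a hypothetical sketch (none of these names exist in the library); they assume the 8×8 tile size stated above and that partial edge tiles round up to a full workgroup.

```cpp
#include <cstdint>

// Assumption: 8x8 tiles as stated above; partial edge tiles round up.
constexpr std::uint64_t kTile = 8;

// Number of tiles needed to cover `pixels` along one axis.
constexpr std::uint64_t TilesAlong(std::uint64_t pixels) {
    return (pixels + kTile - 1) / kTile;
}

// Workgroups launched by one dispatch over a width x height target.
constexpr std::uint64_t WorkgroupCount(std::uint64_t width, std::uint64_t height) {
    return TilesAlong(width) * TilesAlong(height);
}

// Per-pixel item visits for one dispatch over `items` items: W * H * N.
constexpr std::uint64_t ItemVisits(std::uint64_t width,
                                   std::uint64_t height,
                                   std::uint64_t items) {
    return width * height * items;
}
```

At 1920×1080 this gives 240 × 135 = 32,400 workgroups, and 300 items cost 1920 × 1080 × 300 = 622,080,000 item visits per frame. Two dispatches of 150 items each add up to the same 622,080,000 visits, which is why splitting a buffer only adds barrier overhead.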
Build
The repository is built with crafter-build (a project-config based
build system; the project description lives in project.cpp):
crafter-build # native: Wayland on Linux, Win32 on Windows
crafter-build --target=wasm32-wasip1 # browser: DOM window + WebGPU renderer
crafter-build -r # build and run (in an example directory)
The build picks the window + renderer pair automatically from the
target triple: any wasm32-* triple flips to DOM + WebGPU (no Vulkan
loader linked), everything else stays on the native Vulkan path. Each
example with both backends ships GLSL and WGSL copies of its shaders
side-by-side (e.g. raygen.glsl +
raygen.wgsl); project.cpp selects the
right set per target.
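The selection rule described above is simple enough to state as code. The following is only a sketch of that rule, not how crafter-build or project.cpp implement it; `WindowBackend`, `RendererBackend`, `BackendPair`, and `SelectBackends` are hypothetical names, and the native window backend (Wayland vs Win32) is collapsed into one value for brevity.

```cpp
#include <string_view>

enum class WindowBackend { Native, Dom };
enum class RendererBackend { Vulkan, WebGPU };

struct BackendPair {
    WindowBackend window;
    RendererBackend renderer;
};

// Hypothetical mirror of the rule described above: any wasm32-* triple
// selects DOM + WebGPU; every other triple stays on the native Vulkan path.
constexpr BackendPair SelectBackends(std::string_view triple) {
    constexpr std::string_view wasmPrefix = "wasm32-";
    if (triple.substr(0, wasmPrefix.size()) == wasmPrefix) {
        return {WindowBackend::Dom, RendererBackend::WebGPU};
    }
    return {WindowBackend::Native, RendererBackend::Vulkan};
}
```

For example, `SelectBackends("wasm32-wasip1")` yields the DOM + WebGPU pair, while a triple such as `x86_64-pc-linux-gnu` yields the native Vulkan pair.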
Examples
See examples/. Quick map:
- HelloWindow: minimal native window, no rendering.
- HelloDom: wasm-only smoke test of the DOM partition (page-level events, `HtmlElement::CreateInBody`, and `Router::PushState`-driven SPA navigation). No GPU work.
- VulkanTriangle: ray-traced triangle on both Vulkan and WebGPU. The smallest test of the bindless + RT path on each backend.
- RTStress: wavefront RT benchmark, an N×N×N grid of a cube mesh (instance-count knob `kGrid`, 512 → 8000) shaded with primary + shadow rays. Prints a GPU timestamp-query per-pass breakdown each second. WebGPU/DOM only.
- Sponza: ray-traced Sponza atrium on both backends. Exercises `.cmesh` / `.ctex` decompression (GPU `VK_EXT_memory_decompression` on Vulkan, CPU on WebGPU) and a textured closest-hit. See its README for asset provenance.
- HelloUI: UI smoke test using all three tiers (background quad, slider, progress bar, button with text label, cursor-tracking circle).
- CustomShader: Tier 1 demo, a user-authored compute shader inverting RGB under a list of item-circles, dispatched alongside the standard `drawQuads`. Shipped as both `.comp.glsl` and `.comp.wgsl`.
- Decompression: `Crafter::Compression` CPU round-trip smoke test (used by the WebGPU asset path).
- InputSystem: keyboard / mouse / gamepad event surface check.
License
LGPL 3.0. See per-file headers and LICENSE.