Crafter.Graphics/implementations/Crafter.Graphics-ComputeShader.cpp

/*
Crafter®.Graphics
Copyright (C) 2026 Catcrafts®
catcrafts.net
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License version 3.0 as published by the Free Software Foundation;
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
module;
#include "vulkan/vulkan.h"
module Crafter.Graphics:ComputeShader_impl;
import :ComputeShader;
import :ShaderVulkan;
import :Device;
import std;
using namespace Crafter;

// Change history (commit of 2026-06-03 18:35:39 +00:00):
// fix(vulkan-rt): configurable recursion depth + per-shader TLAS push for
// compute (#21)
//
// Two gaps in the Vulkan RT path that fault the device on the NVIDIA
// proprietary driver with a non-trivial pipeline (simple VulkanTriangle never
// hit them):
//
// 1. maxPipelineRayRecursionDepth was hardcoded to 1, so any closest-hit
//    shader that traces a secondary ray (a shadow ray, a very common pattern)
//    recursed past the pipeline limit (UB → device fault).
//    PipelineRTVulkan::Init now takes a maxRecursionDepth parameter (default 1,
//    clamped to the device's maxRayRecursionDepth).
// 2. The NVIDIA descriptor-heap AS-read workaround rewrites every shader that
//    reads an accelerationStructureEXT from the heap, including compute
//    shaders, to read the TLAS device address from a push constant, but only
//    RTPass pushed that address. A compute shader that ray-queries the TLAS
//    (rayQueryEXT) therefore ran against an unwritten push slot → garbage AS
//    handle → VK_ERROR_DEVICE_LOST.
//
// WorkaroundNvidiaAS::Patch now returns a per-shader PatchResult
// {patched, tlasPushOffset} instead of writing the clobber-prone global
// Device::workaroundTlasPushOffset (removed). VulkanShader stores it;
// ShaderBindingTableVulkan/PipelineRTVulkan carry it for RTPass, and
// ComputeShader tracks its own offset and pushes the caller-supplied TLAS
// address in Dispatch (new defaulted tlasAddress parameter), mirroring
// RTPass::Record.
//
// The PushConstantRewrite regression test now asserts Patch's returned
// patched/offset and adds two ray-querying compute-shader cases, proving the
// rewrite is stage-agnostic and the per-shader offset is correct.
//
// Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ComputeShader::ComputeShader(ComputeShader&& other) noexcept
    : pipeline(other.pipeline),
      workaroundNeedsTlas(other.workaroundNeedsTlas),
      workaroundTlasPushOffset(other.workaroundTlasPushOffset) {
    other.pipeline = VK_NULL_HANDLE;
}

ComputeShader& ComputeShader::operator=(ComputeShader&& other) noexcept {
    if (this != &other) {
        if (pipeline != VK_NULL_HANDLE) {
            vkDestroyPipeline(Device::device, pipeline, nullptr);
        }
        pipeline = other.pipeline;
        workaroundNeedsTlas = other.workaroundNeedsTlas;
        workaroundTlasPushOffset = other.workaroundTlasPushOffset;
        other.pipeline = VK_NULL_HANDLE;
    }
    return *this;
}

ComputeShader::~ComputeShader() {
    if (pipeline != VK_NULL_HANDLE) {
        vkDestroyPipeline(Device::device, pipeline, nullptr);
        pipeline = VK_NULL_HANDLE;
    }
}

void ComputeShader::Load(const std::filesystem::path& spvPath) {
    VulkanShader shader(spvPath, "main", VK_SHADER_STAGE_COMPUTE_BIT, nullptr);

    // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7): remember
    // whether VulkanShader rewrote a heap acceleration-structure read in this
    // module, and where it expects the TLAS address pushed, so Dispatch can
    // feed it the per-frame TLAS. Per-shader, not a global; see ComputeShader.
    workaroundNeedsTlas = shader.patchedAS;
    workaroundTlasPushOffset = shader.tlasPushOffset;

    // Spec: with VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT, layout MUST be
    // VK_NULL_HANDLE: bindings come from the bound descriptor heap and push
    // constants are pushed via vkCmdPushDataEXT instead of vkCmdPushConstants.
    VkPipelineCreateFlags2CreateInfo flags2 {
        .sType = VK_STRUCTURE_TYPE_PIPELINE_CREATE_FLAGS_2_CREATE_INFO,
        .flags = VK_PIPELINE_CREATE_2_DESCRIPTOR_HEAP_BIT_EXT,
    };
    VkComputePipelineCreateInfo info {
        .sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO,
        .pNext = &flags2,
        .stage = {
            .sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO,
            .stage = VK_SHADER_STAGE_COMPUTE_BIT,
            .module = shader.shader,
            .pName = "main",
        },
        .layout = VK_NULL_HANDLE,
    };
    Device::CheckVkResult(vkCreateComputePipelines(
        Device::device, VK_NULL_HANDLE, 1, &info, nullptr, &pipeline));
}

void ComputeShader::Dispatch(VkCommandBuffer cmd,
                             const void* push, std::uint32_t pushBytes,
                             std::uint32_t gx,
                             std::uint32_t gy,
                             std::uint32_t gz,
                             VkDeviceAddress tlasAddress) const {
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);

    if (push != nullptr && pushBytes > 0) {
        VkPushDataInfoEXT pushInfo {
            .sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT,
            .offset = 0,
            .data = { .address = const_cast<void*>(push), .size = pushBytes },
        };
        Device::vkCmdPushDataEXT(cmd, &pushInfo);
    }

    // NVIDIA descriptor-heap AS-read workaround (issue #15 / #7): if this shader
    // ray-queries the TLAS through the heap it was rewritten to read the TLAS
    // device address from a push constant; push the caller-supplied address
    // where the rewrite reads it (after any user payload, or offset 0 if none).
    // Mirrors RTPass::Record for the RT pipeline. Inert on every other driver.
    if (Device::workaroundDescriptorHeapAS && workaroundNeedsTlas) {
        VkPushDataInfoEXT tlasPush {
            .sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT,
            .offset = workaroundTlasPushOffset,
            .data = { .address = &tlasAddress, .size = sizeof(tlasAddress) },
        };
        Device::vkCmdPushDataEXT(cmd, &tlasPush);
    }

    vkCmdDispatch(cmd, gx, gy, gz);
}
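
// Illustrative sketch only, not part of this module. It models the two pushes
// that Dispatch records (the user payload at offset 0, then the TLAS address at
// the per-shader offset) against a plain byte buffer, so the resulting layout
// can be checked without a Vulkan device. MockPushData, UserPush, RecordPushes,
// ReadU64 and SketchRoundTrips are hypothetical names. The sketch assumes that
// a push copies the caller's bytes at record time, and it picks an offset equal
// to sizeof(UserPush); the real offset is whatever VulkanShader::tlasPushOffset
// reports for the patched shader.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for push-data storage; not a Vulkan type.
struct MockPushData {
    std::array<std::byte, 128> bytes{};  // zero-initialised

    // Copies `size` bytes from `data` to `offset` (assumed snapshot semantics).
    void Push(std::uint32_t offset, const void* data, std::uint32_t size) {
        assert(static_cast<std::size_t>(offset) + size <= bytes.size());
        std::memcpy(bytes.data() + offset, data, size);
    }
};

// Hypothetical user payload for a compute shader.
struct UserPush {
    std::uint32_t width;
    std::uint32_t height;
};

// Same order of operations as Dispatch: optional user payload at offset 0,
// then the TLAS address at the per-shader offset if the shader was patched.
void RecordPushes(MockPushData& dst,
                  const void* push, std::uint32_t pushBytes,
                  bool needsTlas, std::uint32_t tlasPushOffset,
                  std::uint64_t tlasAddress) {
    if (push != nullptr && pushBytes > 0) {
        dst.Push(0, push, pushBytes);
    }
    if (needsTlas) {
        dst.Push(tlasPushOffset, &tlasAddress, sizeof(tlasAddress));
    }
}

// Reads eight bytes back from the mock buffer at `offset`.
std::uint64_t ReadU64(const MockPushData& src, std::uint32_t offset) {
    std::uint64_t value = 0;
    std::memcpy(&value, src.bytes.data() + offset, sizeof(value));
    return value;
}

// Pushes a payload plus a TLAS address and checks that neither overwrote the other.
bool SketchRoundTrips() {
    MockPushData data;
    const UserPush user{1920u, 1080u};
    const std::uint64_t tlas = 0x0000000123456000ull;  // made-up address
    // Assumed offset: directly after the user payload.
    const auto offset = static_cast<std::uint32_t>(sizeof(UserPush));

    RecordPushes(data, &user, static_cast<std::uint32_t>(sizeof(user)),
                 true, offset, tlas);

    UserPush readBack{};
    std::memcpy(&readBack, data.bytes.data(), sizeof(readBack));
    return readBack.width == 1920u && readBack.height == 1080u &&
           ReadU64(data, offset) == tlas;
}
```

// When needsTlas is false the second push is skipped and the slot stays
// untouched, matching the "inert on every other driver" behaviour described in
// Dispatch.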