fix(vulkan-rt): merge TLAS push constant into existing block (#18)

The NVIDIA descriptor-heap AS-read workaround (#15) rewrote heap
acceleration-structure reads into a load of the TLAS device address from
a push-constant block. It always *synthesized a new* push-constant block,
so any ray-tracing shader that already declared one ended up with two.
SPIR-V forbids that ("at most one push constant block statically used per
entry point"), and the spirv-val check at vkCreateShaderModule rejected the
module:

    Entry point id '4' uses more than one PushConstant interface.

WorkaroundNvidiaAS::Patch now detects an existing PushConstant variable and,
when present, appends a single ulong member (the TLAS address) to that
block instead of adding a second one, reading the address through the
shader's own push-constant variable. The append offset is the end of the
user's block, computed from the members' explicit Offset/ArrayStride/
MatrixStride decorations (correct under both scalar and std140 layout) and
rounded up to 8. Shaders with no push constant of their own keep getting a
freshly synthesized single-member block at offset 0, exactly as before.
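
The append-offset rule above can be sketched in isolation. This is a minimal illustration, not the code in Patch: `Member` and `AppendOffset` are hypothetical names, and each member's byte size is assumed to be already known (the real pass derives it from the type and its stride decorations). `AlignUp` mirrors the helper this commit adds.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one member of the user's push-constant block:
// its explicit Offset decoration and its byte footprint.
struct Member {
    std::uint32_t offset;
    std::uint32_t size;
};

// Round v up to the next multiple of a (a must be a power of two).
constexpr std::uint32_t AlignUp(std::uint32_t v, std::uint32_t a) {
    return (v + a - 1u) & ~(a - 1u);
}

// The block ends at the furthest byte any member reaches; the appended
// 64-bit TLAS address goes at that end rounded up to 8.
std::uint32_t AppendOffset(const std::vector<Member>& members) {
    std::uint32_t end = 0;
    for (const Member& m : members) {
        end = std::max(end, m.offset + m.size);
    }
    return AlignUp(end, 8);
}
```

For a block laid out as a mat4 at 0, a vec3 at 64 and a float at 76, the block ends at byte 80 and the address is appended at 80. For a vec4 at 0 followed by a uint at 16, the block ends at 20 and the address lands at 24. An empty member list gives 0, which matches the synthesized case.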

That offset is published via Device::workaroundTlasPushOffset and RTPass
feeds it to vkCmdPushDataEXT so the address lands where the rewritten load
reads it (0 for the synthesized case, preserving prior behaviour).
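
Why the host-side push offset has to match the member's Offset decoration can be modelled with a plain byte buffer. This is only a host-side analogy under stated assumptions: `PushData`, `PushAt` and `ReadMember` are hypothetical names, the 128-byte size is arbitrary, and no Vulkan call is involved.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the push-data region as a flat byte array.
using PushData = std::array<std::byte, 128>;

// Host side: write the 8-byte TLAS address at the published offset.
// Assumes offset + 8 does not exceed the buffer size.
void PushAt(PushData& data, std::uint32_t offset, std::uint64_t tlasAddress) {
    std::memcpy(data.data() + offset, &tlasAddress, sizeof(tlasAddress));
}

// Shader side (modelled): load the ulong member at its decorated Offset.
std::uint64_t ReadMember(const PushData& data, std::uint32_t memberOffset) {
    std::uint64_t value = 0;
    std::memcpy(&value, data.data() + memberOffset, sizeof(value));
    return value;
}
```

If the address is pushed at offset 24 and the rewritten load reads the member at 24, the shader sees the address. If the host kept pushing at 0 while the member sits at 24, the load would read whatever bytes happen to be at 24 instead.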

Verified on the affected driver (NVIDIA 610.43.02, RTX 4090): VulkanTriangle
ray-traces correctly and is validation-clean both with and without a
user-declared raygen push constant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
catbot 2026-06-03 02:28:02 +00:00
commit 45ecc91424
4 changed files with 204 additions and 55 deletions

@@ -44,12 +44,16 @@ bug (full investigation in #7, summarised below).
 proprietary driver only, `VulkanShader` rewrites the compiled SPIR-V at
 module-load time so that every `OpLoad` of an `accelerationStructureEXT`
 out of the heap becomes a load of the TLAS *device address* (from a
-synthesized push-constant block) followed by
+push-constant block) followed by
 `OpConvertUToAccelerationStructureKHR` — which reads no descriptor and so
 never touches the faulting path. `RTPass` feeds the active frame's TLAS
-address in as push data. `raygen.glsl` and the example code are unchanged;
-acceleration structures still bind into the heap normally. On every other
-driver the workaround is inert. It's gated on
+address in as push data. SPIR-V allows only one push-constant block per
+entry point, so when a shader already declares one the TLAS address is
+appended to *that* block (rather than adding a second, which would fail
+validation — issue #18); shaders without a push constant get a freshly
+synthesized single-member block. `raygen.glsl` and the example code are
+unchanged; acceleration structures still bind into the heap normally. On
+every other driver the workaround is inert. It's gated on
 `Device::workaroundDescriptorHeapAS` and confined to one fenced block in
 `interfaces/Crafter.Graphics-ShaderVulkan.cppm` so it can be deleted wholesale
 once a fixed NVIDIA driver ships.

@@ -178,6 +178,12 @@ export namespace Crafter {
 // path and RTPass pushes the active TLAS address as push data. Delete
 // this flag and everything keyed on it once a fixed driver ships.
 inline static bool workaroundDescriptorHeapAS = false;
+// Byte offset of the TLAS-address member inside the patched raygen's
+// push-constant block — 0 for a freshly synthesized block, or the end
+// of the user's own block when the address is appended to it (the
+// shader can't have two push-constant blocks). VulkanShader sets this
+// at module load; RTPass feeds it to vkCmdPushDataEXT.
+inline static std::uint32_t workaroundTlasPushOffset = 0;
 static void CheckVkResult(VkResult result);
 static std::uint32_t GetMemoryType(std::uint32_t typeBits, VkMemoryPropertyFlags properties);

@@ -46,7 +46,10 @@ export namespace Crafter {
 VkDeviceAddress tlasAddr = RenderingElement3D::tlases[frameIdx].address;
 VkPushDataInfoEXT pushInfo {
     .sType = VK_STRUCTURE_TYPE_PUSH_DATA_INFO_EXT,
-    .offset = 0,
+    // Where the rewritten raygen reads the TLAS address: 0 when
+    // VulkanShader synthesized a fresh block, or the offset of
+    // the member it appended to the shader's existing block.
+    .offset = Device::workaroundTlasPushOffset,
     .data = { .address = &tlasAddr, .size = sizeof(tlasAddr) },
 };
 Device::vkCmdPushDataEXT(cmd, &pushInfo);

@@ -42,22 +42,36 @@ import :Types;
 //
 // glslang has no GLSL spelling for that conversion, so we rewrite the compiled
 // SPIR-V at module-load time: every `OpLoad %accelStruct <heap-ptr>` becomes a
-// load of the TLAS device address from a synthesized push-constant block
-// followed by OpConvertUToAccelerationStructureKHR. RTPass pushes the active
-// frame's TLAS address into that push constant. Shaders that never touch an
-// acceleration structure (no OpTypeAccelerationStructureKHR) are left untouched.
-namespace WorkaroundNvidiaAS {
+// load of the TLAS device address from a push-constant block followed by
+// OpConvertUToAccelerationStructureKHR. RTPass pushes the active frame's TLAS
+// address into that push constant. Shaders that never touch an acceleration
+// structure (no OpTypeAccelerationStructureKHR) are left untouched.
+//
+// SPIR-V allows at most one push-constant variable per entry point, so we never
+// add a second one: if the shader already declares a push-constant block we
+// append a ulong member (the TLAS address) to the *existing* block and read
+// from there; only shaders with no push constant of their own get a freshly
+// synthesized single-member block. Its byte offset is the offset of that
+// member (published via Crafter::Device::workaroundTlasPushOffset) which RTPass feeds to
+// vkCmdPushDataEXT so the address lands where the rewritten load reads it.
+//
+// Exported so tests/PushConstantRewrite can drive Patch() over real compiled
+// SPIR-V and check the result with spirv-val; nothing in the engine calls it
+// from outside this file. Goes away with the rest of the workaround.
+export namespace WorkaroundNvidiaAS {
     // SPIR-V numeric opcodes / enums used below.
     enum : std::uint32_t {
         OpEntryPoint = 15, OpCapability = 17,
-        OpTypeInt = 21, OpTypeStruct = 30, OpTypePointer = 32,
+        OpTypeInt = 21, OpTypeFloat = 22, OpTypeVector = 23, OpTypeMatrix = 24,
+        OpTypeArray = 28, OpTypeStruct = 30, OpTypePointer = 32,
         OpConstant = 43, OpVariable = 59, OpLoad = 61, OpAccessChain = 65,
         OpDecorate = 71, OpMemberDecorate = 72,
         OpConvertUToAccelerationStructureKHR = 4447,
         OpTypeAccelerationStructureKHR = 5341,
         CapabilityInt64 = 11,
         StorageClassPushConstant = 9,
-        DecorationBlock = 2, DecorationOffset = 35,
+        DecorationBlock = 2, DecorationMatrixStride = 7,
+        DecorationArrayStride = 6, DecorationOffset = 35,
     };
     inline bool IsAnnotation(std::uint32_t op) {
@@ -69,6 +83,10 @@ namespace WorkaroundNvidiaAS {
     using Instr = std::vector<std::uint32_t>;
+    inline std::uint32_t AlignUp(std::uint32_t v, std::uint32_t a) {
+        return (v + a - 1u) & ~(a - 1u);
+    }
     inline void Patch(std::vector<std::uint32_t>& words) {
         if (words.size() < 5) return; // not a SPIR-V module we understand.
@@ -82,23 +100,61 @@ namespace WorkaroundNvidiaAS {
             i += len;
         }
-        // ── Scan for the AS type, reusable int/long types+constants, and the
-        // section boundaries we need to insert into.
+        // ── Scan for the AS type, reusable int/long types+constants, any
+        // existing push-constant block, the type/decoration/constant tables
+        // needed to size that block, and the section boundaries to insert into.
         std::uint32_t asTypeId = 0, ulongTypeId = 0, uintTypeId = 0, uintZeroId = 0;
+        std::uint32_t existingPcVarId = 0, existingPcStructId = 0, existingPtrUlongId = 0;
         std::size_t lastCapIdx = 0, lastAnnotIdx = 0, firstFuncIdx = instrs.size();
         std::size_t entryIdx = instrs.size();
+        std::map<std::uint32_t, const Instr*> typeInstr; // type-result-id → defining instr
+        std::map<std::uint32_t, std::uint32_t> constU32; // OpConstant id → 32-bit value
+        std::map<std::uint32_t, std::uint32_t> uintConstByValue; // uint value → OpConstant id
+        std::map<std::uint32_t, std::uint32_t> arrayStride; // array type id → ArrayStride
+        std::map<std::uint64_t, std::uint32_t> memberOffset; // (struct<<32|idx) → Offset
+        std::map<std::uint64_t, std::uint32_t> memberMatStride; // (struct<<32|idx) → MatrixStride
+        std::map<std::uint32_t, std::uint32_t> ptrPointee; // pointer type id → pointee type id
         for (std::size_t k = 0; k < instrs.size(); ++k) {
-            std::uint32_t op = instrs[k][0] & 0xFFFFu;
+            const Instr& in = instrs[k];
+            std::uint32_t op = in[0] & 0xFFFFu;
             switch (op) {
-                case OpTypeAccelerationStructureKHR: asTypeId = instrs[k][1]; break;
+                case OpTypeAccelerationStructureKHR: asTypeId = in[1]; typeInstr[in[1]] = &in; break;
                 case OpTypeInt:
-                    if (instrs[k][2] == 64 && instrs[k][3] == 0) ulongTypeId = instrs[k][1];
-                    else if (instrs[k][2] == 32 && instrs[k][3] == 0) uintTypeId = instrs[k][1];
+                    if (in[2] == 64 && in[3] == 0) ulongTypeId = in[1];
+                    else if (in[2] == 32 && in[3] == 0) uintTypeId = in[1];
+                    typeInstr[in[1]] = &in;
+                    break;
+                case OpTypeFloat: case OpTypeVector: case OpTypeMatrix:
+                case OpTypeArray: case OpTypeStruct:
+                    typeInstr[in[1]] = &in;
+                    break;
+                case OpTypePointer:
+                    typeInstr[in[1]] = &in; ptrPointee[in[1]] = in[3];
+                    if (in[2] == StorageClassPushConstant && in[3] == ulongTypeId)
+                        existingPtrUlongId = in[1];
                     break;
                 case OpConstant:
-                    if (uintTypeId && instrs[k][1] == uintTypeId && instrs[k][3] == 0)
-                        uintZeroId = instrs[k][2];
+                    if (in.size() >= 4) constU32[in[2]] = in[3];
+                    if (uintTypeId && in[1] == uintTypeId && in.size() >= 4) {
+                        uintConstByValue.emplace(in[3], in[2]);
+                        if (in[3] == 0) uintZeroId = in[2];
+                    }
                     break;
+                case OpVariable:
+                    if (in[3] == StorageClassPushConstant) {
+                        existingPcVarId = in[2];
+                        existingPcStructId = ptrPointee.count(in[1]) ? ptrPointee[in[1]] : 0;
+                    }
+                    break;
+                case OpDecorate:
+                    if (in.size() >= 4 && in[2] == DecorationArrayStride) arrayStride[in[1]] = in[3];
+                    break;
+                case OpMemberDecorate: {
+                    std::uint64_t key = (static_cast<std::uint64_t>(in[1]) << 32) | in[2];
+                    if (in.size() >= 5 && in[3] == DecorationOffset) memberOffset[key] = in[4];
+                    if (in.size() >= 5 && in[3] == DecorationMatrixStride) memberMatStride[key] = in[4];
+                    break;
+                }
                 case OpCapability: lastCapIdx = k; break;
                 case OpEntryPoint: if (entryIdx == instrs.size()) entryIdx = k; break;
                 default: break;
@@ -116,73 +172,153 @@ namespace WorkaroundNvidiaAS {
             return in;
         };
-        // ── Synthesize the types/constants/push-constant we need, reusing any
-        // the module already defines (SPIR-V forbids duplicate type defs).
-        std::vector<Instr> typeDefs;
-        if (uintTypeId == 0) {
-            uintTypeId = newId();
-            typeDefs.push_back(mk({OpTypeInt, uintTypeId, 32, 0}));
-        }
-        if (uintZeroId == 0) {
-            uintZeroId = newId();
-            typeDefs.push_back(mk({OpConstant, uintTypeId, uintZeroId, 0}));
-        }
-        if (ulongTypeId == 0) {
-            ulongTypeId = newId();
-            typeDefs.push_back(mk({OpTypeInt, ulongTypeId, 64, 0}));
-        }
-        std::uint32_t pcStructId = newId();
-        std::uint32_t ptrPushStructId = newId();
-        std::uint32_t ptrPushUlongId = newId();
-        std::uint32_t pcVarId = newId();
-        typeDefs.push_back(mk({OpTypeStruct, pcStructId, ulongTypeId}));
-        typeDefs.push_back(mk({OpTypePointer, ptrPushStructId, StorageClassPushConstant, pcStructId}));
-        typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId}));
-        typeDefs.push_back(mk({OpVariable, ptrPushStructId, pcVarId, StorageClassPushConstant}));
-        std::vector<Instr> decorations = {
-            mk({OpMemberDecorate, pcStructId, 0, DecorationOffset, 0}),
-            mk({OpDecorate, pcStructId, DecorationBlock}),
+        // Byte footprint of a type, honouring the explicit Array/Matrix strides
+        // glslang emits so the result is correct under both scalar and std140
+        // block layout. Used only to find where an existing push block ends.
+        std::function<std::uint32_t(std::uint32_t)> footprint =
+            [&](std::uint32_t tid) -> std::uint32_t {
+            auto it = typeInstr.find(tid);
+            if (it == typeInstr.end()) return 0;
+            const Instr& t = *it->second;
+            switch (t[0] & 0xFFFFu) {
+                case OpTypeInt: case OpTypeFloat: return t[2] / 8u;
+                case OpTypeVector: return t[3] * footprint(t[2]);
+                case OpTypeMatrix: return t[3] * footprint(t[2]); // cols × column-vec
+                case OpTypeArray: {
+                    std::uint32_t len = constU32.count(t[3]) ? constU32[t[3]] : 0;
+                    std::uint32_t stride = arrayStride.count(tid) ? arrayStride[tid]
+                                                                  : footprint(t[2]);
+                    return len * stride;
+                }
+                case OpTypeStruct: {
+                    std::uint32_t end = 0;
+                    for (std::size_t m = 2; m < t.size(); ++m) {
+                        std::uint32_t idx = static_cast<std::uint32_t>(m - 2);
+                        std::uint64_t key = (static_cast<std::uint64_t>(t[1]) << 32) | idx;
+                        std::uint32_t off = memberOffset.count(key) ? memberOffset[key] : 0;
+                        std::uint32_t sz;
+                        auto mt = typeInstr.find(t[m]);
+                        if (mt != typeInstr.end() && (mt->second->at(0) & 0xFFFFu) == OpTypeMatrix
+                            && memberMatStride.count(key))
+                            sz = memberMatStride[key] * (*mt->second)[3];
+                        else
+                            sz = footprint(t[m]);
+                        end = std::max(end, off + sz);
+                    }
+                    return end;
+                }
+                case OpTypePointer: return 8;
+                default: return 0;
+            }
         };
-        // ── Rewrite each `OpLoad %asType <ptr>` into address-load + convert.
+        bool merge = existingPcVarId != 0 && existingPcStructId != 0
+            && typeInstr.count(existingPcStructId)
+            && (typeInstr[existingPcStructId]->at(0) & 0xFFFFu) == OpTypeStruct;
+        // ── Synthesize/ensure the int/long types and constants we need, reusing
+        // any the module already defines (SPIR-V forbids duplicate type defs).
+        std::vector<Instr> typeDefs;
+        if (uintTypeId == 0) { uintTypeId = newId(); typeDefs.push_back(mk({OpTypeInt, uintTypeId, 32, 0})); }
+        if (ulongTypeId == 0) { ulongTypeId = newId(); typeDefs.push_back(mk({OpTypeInt, ulongTypeId, 64, 0})); }
+        std::uint32_t pcVarId, ptrPushUlongId, memberIdxConstId, memberIdx;
+        std::vector<Instr> decorations;
+        if (merge) {
+            // Append a ulong member to the user's existing block; read from it.
+            pcVarId = existingPcVarId;
+            const Instr* structInstr = typeInstr[existingPcStructId];
+            memberIdx = static_cast<std::uint32_t>(structInstr->size() - 2);
+            Crafter::Device::workaroundTlasPushOffset = AlignUp(footprint(existingPcStructId), 8);
+            ptrPushUlongId = existingPtrUlongId;
+            if (ptrPushUlongId == 0) {
+                ptrPushUlongId = newId();
+                typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId}));
+            }
+            // Member index constant for the access chain — reuse an existing
+            // uint constant of the right value, else mint one (must be an
+            // integer constant, so only uint-typed ones qualify for reuse).
+            auto found = uintConstByValue.find(memberIdx);
+            if (found != uintConstByValue.end()) {
+                memberIdxConstId = found->second;
+            } else {
+                memberIdxConstId = newId();
+                typeDefs.push_back(mk({OpConstant, uintTypeId, memberIdxConstId, memberIdx}));
+            }
+            decorations.push_back(mk({OpMemberDecorate, existingPcStructId, memberIdx, DecorationOffset, Crafter::Device::workaroundTlasPushOffset}));
+        } else {
+            // No user push constant — synthesize a fresh single-member block.
+            if (uintZeroId == 0) { uintZeroId = newId(); typeDefs.push_back(mk({OpConstant, uintTypeId, uintZeroId, 0})); }
+            std::uint32_t pcStructId = newId();
+            std::uint32_t ptrPushStructId = newId();
+            ptrPushUlongId = newId();
+            pcVarId = newId();
+            typeDefs.push_back(mk({OpTypeStruct, pcStructId, ulongTypeId}));
+            typeDefs.push_back(mk({OpTypePointer, ptrPushStructId, StorageClassPushConstant, pcStructId}));
+            typeDefs.push_back(mk({OpTypePointer, ptrPushUlongId, StorageClassPushConstant, ulongTypeId}));
+            typeDefs.push_back(mk({OpVariable, ptrPushStructId, pcVarId, StorageClassPushConstant}));
+            decorations.push_back(mk({OpMemberDecorate, pcStructId, 0, DecorationOffset, 0}));
+            decorations.push_back(mk({OpDecorate, pcStructId, DecorationBlock}));
+            memberIdxConstId = uintZeroId;
+            Crafter::Device::workaroundTlasPushOffset = 0;
+        }
+        // ── Rewrite each `OpLoad %asType <ptr>` into address-load + convert, and
+        // (when merging) append the ulong member to the existing struct type.
         std::vector<Instr> rebuilt;
         rebuilt.reserve(instrs.size() + 8);
-        for (const Instr& in : instrs) {
+        for (Instr in : instrs) {
             std::uint32_t op = in[0] & 0xFFFFu;
             if (op == OpLoad && in[1] == asTypeId) {
                 std::uint32_t resultId = in[2];
                 std::uint32_t chainId = newId();
                 std::uint32_t addrId = newId();
-                rebuilt.push_back(mk({OpAccessChain, ptrPushUlongId, chainId, pcVarId, uintZeroId}));
+                rebuilt.push_back(mk({OpAccessChain, ptrPushUlongId, chainId, pcVarId, memberIdxConstId}));
                 rebuilt.push_back(mk({OpLoad, ulongTypeId, addrId, chainId}));
                 rebuilt.push_back(mk({OpConvertUToAccelerationStructureKHR, asTypeId, resultId, addrId}));
             } else {
-                rebuilt.push_back(in);
+                if (merge && op == OpTypeStruct && in[1] == existingPcStructId) {
+                    in.push_back(ulongTypeId);
+                    in[0] = static_cast<std::uint32_t>(in.size() << 16) | OpTypeStruct;
+                }
+                rebuilt.push_back(std::move(in));
             }
         }
         instrs.swap(rebuilt);
         // Recompute structural anchors (the rewrite above shifted indices).
         lastCapIdx = 0; lastAnnotIdx = 0; firstFuncIdx = instrs.size(); entryIdx = instrs.size();
+        std::size_t structIdx = instrs.size();
         for (std::size_t k = 0; k < instrs.size(); ++k) {
             std::uint32_t op = instrs[k][0] & 0xFFFFu;
             if (op == OpCapability) lastCapIdx = k;
             if (op == OpEntryPoint && entryIdx == instrs.size()) entryIdx = k;
             if (IsAnnotation(op)) lastAnnotIdx = k;
             if (op == 54 && firstFuncIdx == instrs.size()) firstFuncIdx = k;
+            if (merge && op == OpTypeStruct && instrs[k][1] == existingPcStructId) structIdx = k;
         }
-        // Append the push-constant variable to the entry point's interface
-        // list (required for SPIR-V ≥ 1.4 — both raygen modules are 1.4).
-        if (entryIdx != instrs.size() && words[1] >= 0x00010400u) {
+        // The newly-defined types (notably ulong) must precede every use. When
+        // merging, the user's struct — now carrying the appended ulong member —
+        // already sits in the type section, so the defs go in just before it;
+        // for a fresh block the whole bundle can go at the end of the type
+        // section (right before the first function).
+        std::size_t typeDefsIdx = (merge && structIdx != instrs.size()) ? structIdx : firstFuncIdx;
+        // A freshly synthesized push-constant variable must join the entry
+        // point's interface list (required for SPIR-V ≥ 1.4 — raygen is 1.4).
+        // A merged-into variable is already used, so it is already listed.
+        if (!merge && entryIdx != instrs.size() && words[1] >= 0x00010400u) {
             instrs[entryIdx].push_back(pcVarId);
             instrs[entryIdx][0] = static_cast<std::uint32_t>(instrs[entryIdx].size() << 16)
                 | OpEntryPoint;
         }
-        // Insert highest-index-first so earlier anchors stay valid.
-        instrs.insert(instrs.begin() + firstFuncIdx, typeDefs.begin(), typeDefs.end());
+        // Insert highest-index-first so earlier anchors stay valid (typeDefsIdx
+        // ≥ lastAnnotIdx+1 ≥ lastCapIdx+1 in both the merge and synthesize cases).
+        instrs.insert(instrs.begin() + typeDefsIdx, typeDefs.begin(), typeDefs.end());
         instrs.insert(instrs.begin() + lastAnnotIdx + 1, decorations.begin(), decorations.end());
         instrs.insert(instrs.begin() + lastCapIdx + 1, mk({OpCapability, CapabilityInt64}));