Concurrent crafter-build invocations corrupt the shared module cache (malformed or corrupted precompiled file) #14
Catcrafts/Crafter.Build#14
Summary
Running two `crafter-build` invocations concurrently corrupts the shared per-host module cache. The second process reads a half-written `.pcm` that the first is still emitting, and the build dies with a `malformed or corrupted precompiled file` error. The bit offsets in the message vary from run to run (it's a torn read), and the same class of error can surface against `std.pcm` or any of the `Crafter.Build-*.pcm` artifacts.
Reproduction
From a clean cache (`rm -rf "$XDG_CACHE_HOME/crafter.build"` / `%LOCALAPPDATA%\crafter.build`), launch two builds that don't share a project but do share the host cache, e.g. a wasm app build and a sibling native tool build. One of them intermittently fails with the `malformed or corrupted precompiled file` error. It reproduces most reliably on a cold cache, where both processes decide the host PCMs are missing or stale and race to (re)precompile them. Serializing the two builds always succeeds.
Root cause
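The staleness test used by `BuildStdPcm` and `EnsureCrafterBuildPcms`, `fs::exists(pcm) && last_write_time(cppm) < last_write_time(pcm)`, looks only at existence and timestamps, never at whether the `.pcm` is complete. As a minimal sketch (C++17 `std::filesystem`; `PcmLooksFresh` and `TornPcmPassesCheck` are hypothetical names, not Crafter.Build functions), here is that test as a standalone predicate, plus a demo that leaves a few bytes in a `.pcm` as if another process were midway through writing it:

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Same condition as the check in BuildStdPcm / EnsureCrafterBuildPcms:
//   fs::exists(pcm) && last_write_time(cppm) < last_write_time(pcm)
// It inspects existence and mtimes only; a truncated file passes it.
bool PcmLooksFresh(const fs::path& cppm, const fs::path& pcm) {
    std::error_code ec;
    if (!fs::exists(pcm, ec)) return false;
    const auto srcTime = fs::last_write_time(cppm, ec);
    if (ec) return false;
    const auto pcmTime = fs::last_write_time(pcm, ec);
    if (ec) return false;
    return srcTime < pcmTime;
}

// Hypothetical demo: simulate process A partway through writing `pcm`
// (only a few bytes on disk), then evaluate the check as process B would.
bool TornPcmPassesCheck(const fs::path& dir) {
    fs::create_directories(dir);
    const fs::path cppm = dir / "Demo.cppm";
    const fs::path pcm = dir / "Demo.pcm";
    {
        std::ofstream out(cppm);
        out << "export module Demo;\n";
    }
    {
        std::ofstream out(pcm);
        out << "partial";  // stand-in for a half-written precompiled module
    }
    // Backdate the source so coarse mtime granularity can't make the times equal;
    // a file being actively written is always newer than its .cppm anyway.
    fs::last_write_time(cppm, fs::last_write_time(pcm) - std::chrono::seconds(10));
    return PcmLooksFresh(cppm, pcm);
}
```

Both failure modes follow from this: two processes that find no `.pcm` both rebuild it, and a process that arrives while the file is still being written treats it as fresh and imports the torn file.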
`LoadProject` bootstraps the host-side PCMs into a shared, per-host cache directory keyed only by `<target>-<march>`:
- `GetCacheDir()`: `implementations/Crafter.Build-Platform.cpp:221`
- `cacheDir = GetCacheDir() / "{target}-{march}"`: e.g. `:310`, `:508` (one per platform variant)
- `BuildStdPcm(hostConfig, cacheDir / "std.pcm")`: `:313`, `:511`
- `EnsureCrafterBuildPcms(sourceDir, cacheDir)`: `:263`, `:459`
Both `BuildStdPcm` and `EnsureCrafterBuildPcms` have the same unsafe shape:
- `if (fs::exists(pcm) && last_write_time(cppm) < last_write_time(pcm)) continue;` (`:270`, `:469`, `:252`). Two processes evaluate this independently and both decide to rebuild.
- `clang++ ... --precompile {cppm} -o {pcmPath}` writes directly to the shared destination (`:279`, `:478`, `:253`). While process A is partway through writing `std.pcm` / `Crafter.Build-Progress.pcm`, process B opens that same path (via `-fprebuilt-module-path={cacheDir}`) and reads a truncated, torn file.
- `projectCacheMutex` in `Crafter.Build-External.cpp` only serializes within a single `crafter-build` process; nothing guards the cache between separate processes.
Because the cache dir is keyed only by `target-march` (not by PID or project), independent invocations on the same host collide on exactly the same files.
Suggested fix
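A minimal POSIX-only sketch of the two fixes combined (Linux, matching the reported host): an exclusive `flock` on `cacheDir/.lock` serializes the freshness check and rebuild across processes, and the compiler output goes to a PID-suffixed temporary that `fs::rename` then moves over the final `.pcm`. `CacheLock`, `EnsurePcm`, and the `isFresh`/`precompile` callbacks are hypothetical; `precompile` stands in for the existing `clang++ ... --precompile {cppm} -o ...` call with only the output path changed. The Windows-MSVC and mingw variants would need a different lock primitive (e.g. `LockFileEx`), and replacing an existing file with `fs::rename` there does not carry the POSIX `rename(2)` atomicity guarantee.

```cpp
#include <fcntl.h>     // ::open, O_* flags
#include <sys/file.h>  // ::flock, LOCK_EX
#include <unistd.h>    // ::close, ::getpid

#include <cerrno>
#include <filesystem>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical RAII guard: holds an exclusive advisory flock on cacheDir/.lock.
// Another process constructing a CacheLock on the same dir blocks until release.
class CacheLock {
public:
    explicit CacheLock(const fs::path& cacheDir) {
        fs::create_directories(cacheDir);
        const fs::path lockPath = cacheDir / ".lock";
        fd_ = ::open(lockPath.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0644);
        if (fd_ < 0) {
            throw std::system_error(errno, std::generic_category(), "open .lock");
        }
        if (::flock(fd_, LOCK_EX) != 0) {
            const int err = errno;
            ::close(fd_);
            throw std::system_error(err, std::generic_category(), "flock .lock");
        }
    }
    ~CacheLock() { ::close(fd_); }  // closing the descriptor releases the flock

    CacheLock(const CacheLock&) = delete;
    CacheLock& operator=(const CacheLock&) = delete;

private:
    int fd_ = -1;
};

// Hypothetical replacement for one "stale? then precompile" step.
//   isFresh:    the existing exists()/last_write_time() comparison for this PCM
//   precompile: runs `clang++ ... --precompile {cppm} -o <out>`, true on success
bool EnsurePcm(const fs::path& cacheDir, const std::string& name,
               const std::function<bool()>& isFresh,
               const std::function<bool(const fs::path&)>& precompile) {
    CacheLock lock(cacheDir);  // serialize check + rebuild across processes

    // Check only while holding the lock, so a process that waited sees the
    // PCM the previous lock holder just published and skips the rebuild.
    if (isFresh()) return true;

    const fs::path finalPath = cacheDir / (name + ".pcm");
    const fs::path tmpPath = cacheDir / (name + ".pcm." + std::to_string(::getpid()));

    if (!precompile(tmpPath)) {
        std::error_code ec;
        fs::remove(tmpPath, ec);  // don't leave a partial temp file behind
        return false;
    }
    // rename(2) replaces finalPath atomically: readers see old or new, never a mix.
    fs::rename(tmpPath, finalPath);
    return true;
}
```

Readers do not need to take the lock: because the final path is only ever replaced by a rename, a `clang++` importing via `-fprebuilt-module-path` opens either the previous complete file or the new one. Holding the lock across the whole precompile makes concurrent builds wait instead of duplicating work; a per-PCM lockfile would let unrelated PCMs build in parallel.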
Any one of these closes the race; ideally both:
- Write to a temporary file, then `fs::rename` into place: `--precompile` to `cacheDir / (name + ".pcm." + <pid/uuid>)`, then atomic-rename onto `name.pcm`. A reader then always sees either the old complete file or the new complete file, never a torn one.
- `flock` a `cacheDir/.lock` (or a per-PCM lockfile) around the staleness check and precompile, so only one process rebuilds while the others wait; each process re-checks freshness once it holds the lock.
Impact / workaround
Anything that fans out `crafter-build` (CI matrices, build-the-app-plus-a-sibling-tool scripts, parallel agent harnesses) hits this nondeterministically. The workaround is to serialize all `crafter-build` invocations that share a host, or to pre-warm the cache with a single throwaway build before fanning out.
Environment
- `master@a930a4a` (`latest-26-ga930a4a`)
- `x86_64-pc-linux-gnu-native`; the same pattern exists in the Windows-MSVC and mingw `LoadProject` / `EnsureCrafterBuildPcms` variants
- `import std;`
Surfaced while driving the 3DForts wasm build: parallelizing the wasm app build and the `host-companion` native build tripped it immediately.
claude:claim:cebdd08e-e6c7-4faa-b2b0-5853f6effb4d
PR opened: #15