Concurrent crafter-build invocations corrupt the shared module cache (malformed or corrupted precompiled file) #15

Merged
jorijnvdgraaf merged 1 commit from claude/issue-14 into master 2026-05-30 18:44:54 +02:00

Done. Summary:

Root cause: BuildStdPcm and EnsureCrafterBuildPcms wrote PCMs directly to their final, shared paths under XDG_CACHE_HOME/<target>-<march>/. Two crafter-build invocations sharing the cache clobbered each other's writes, and readers loading modules via -fprebuilt-module-path could see a half-written file and fail with: malformed or corrupted precompiled file: 'can't skip to bit X from Y'.
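
The collision follows from every invocation deriving the same directory from the same environment. The sketch below shows that derivation. PcmCacheDir is a hypothetical name. The fallback to $HOME/.cache when XDG_CACHE_HOME is unset, empty or relative follows the XDG Base Directory specification, and it is an assumption that crafter-build applies the same rule.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve <cache>/<target>-<march>/.
// Per the XDG Base Directory spec, an unset, empty or relative
// XDG_CACHE_HOME is ignored in favour of $HOME/.cache.
fs::path PcmCacheDir(const std::string& target, const std::string& march) {
    fs::path base;
    const char* xdg = std::getenv("XDG_CACHE_HOME");
    if (xdg != nullptr && *xdg != '\0' && fs::path(xdg).is_absolute()) {
        base = xdg;
    } else {
        const char* home = std::getenv("HOME");
        base = fs::path(home != nullptr ? home : ".") / ".cache";
    }
    // Same environment + same target/march => same directory, so two
    // concurrent builds end up writing the very same std.pcm path.
    return base / (target + "-" + march);
}
```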

Fix: added MakeTempPcmPath and CommitPcm helpers in Crafter.Build-Platform.cpp. All 6 PCM write sites (3 platforms × {BuildStdPcm, EnsureCrafterBuildPcms}) now precompile to <final>.tmp.<pid>.<seq> and atomically rename the result into place. The mingw-on-Linux std.cppm copy is now made per-PID for the same reason. CommitPcm tolerates a concurrent writer's rename winning the race, as long as the destination ends up valid.

Verification

  • Reproduced the exact error from the bug report against the pre-fix system binary (3/3 cold-cache parallel builds failed).
  • The same scenario against the fixed binary: 3/3 succeeded.
  • Added ConcurrentCacheRace regression test that races four LoadProject() calls against a scratch cache — triggers the race 5/5 without the fix, passes 5/5 with it.
  • All 11 self-tests pass.
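
The property the regression test relies on can be sketched in isolation, assuming POSIX rename semantics. This is not the project's ConcurrentCacheRace test: instead of racing four LoadProject() calls, it races four threads that each publish the same file through a temp path and a rename, while a reader thread checks that every file it opens is complete. PublishOnce, RaceWriters and kSize are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>
#include <unistd.h>  // getpid (POSIX)

namespace fs = std::filesystem;

// Each payload is kSize copies of one letter, so a torn or mixed file
// is easy to detect.
constexpr std::size_t kSize = 1 << 18;

// One writer: write the whole payload to a unique temp path, close it,
// then atomically rename it over final_path.
void PublishOnce(const fs::path& final_path, char fill, int id) {
    fs::path tmp = final_path;
    tmp += ".tmp." + std::to_string(static_cast<long>(::getpid())) + "." + std::to_string(id);
    {
        std::ofstream out(tmp, std::ios::binary);
        const std::string payload(kSize, fill);
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    }  // closed (and flushed) before the rename
    fs::rename(tmp, final_path);
}

// Returns how many torn reads a reader thread observed while `writers`
// threads each published final_path `rounds` times; 0 is expected.
int RaceWriters(const fs::path& final_path, int writers, int rounds) {
    std::atomic<bool> done{false};
    std::atomic<int> torn{0};
    std::thread reader([&] {
        while (!done.load()) {
            std::ifstream in(final_path, std::ios::binary);
            if (!in) continue;  // nothing published yet
            const std::string data((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
            const bool complete = data.size() == kSize &&
                                  data.find_first_not_of(data[0]) == std::string::npos;
            if (!complete) ++torn;
        }
    });
    std::vector<std::thread> pool;
    for (int w = 0; w < writers; ++w) {
        pool.emplace_back([&, w] {
            for (int r = 0; r < rounds; ++r)
                PublishOnce(final_path, static_cast<char>('A' + w), w * rounds + r);
        });
    }
    for (auto& t : pool) t.join();
    done = true;
    reader.join();
    return torn.load();
}
```

Swapping PublishOnce for a direct write to final_path would let the reader catch short or mixed files, which is the same failure mode as the torn PCMs.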

Committed as 96d1df9 on claude/issue-14.


Resolves #14 — autonomous claude-podman run.

fix: atomic-rename host-cache PCMs to close concurrent-build race
All checks were successful
CI / build-test-release (pull_request) Successful in 11m50s
96d1df9233
Two crafter-build invocations sharing XDG_CACHE_HOME used to clobber each
other's writes to <cache>/<target>-<march>/std.pcm and the
Crafter.Build-*.pcm modules: each LoadProject path wrote directly to the
final path, so a reader could see a half-written file and die with
"malformed or corrupted precompiled file: 'can't skip to bit X from Y'"
(issue #14). Every BuildStdPcm / EnsureCrafterBuildPcms write now goes via
<final>.tmp.<pid>.<seq>, which is atomically renamed into place; concurrent
readers always see either the old or the new file, never torn bytes. The
mingw-on-Linux std.cppm copy is per-PID for the same reason. Adds a regression test
(ConcurrentCacheRace) that races four LoadProject() calls against a cold
scratch cache — reproduces the race 5/5 without the fix and passes 5/5
with it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jorijnvdgraaf deleted branch claude/issue-14 2026-05-30 18:44:54 +02:00