fix: scope per-build module-state reset to the config being built (#16) #17

Merged
jorijnvdgraaf merged 1 commit from claude/issue-16 into master 2026-06-01 13:46:46 +02:00

Fixes the deterministic `crafter-build -r` hang in which a multi-target build freezes mid-compile: the driver sits idle, no compiler process is running, and progress never reaches `N/N` (issue #16).

## Root cause

`Build()` resets each `Module`/`ModulePartition`'s per-build `compiled`/`checked` flags so a reused `Configuration` re-evaluates mtimes. That reset **recursed into `cfg.dependencies`**, but dependency `Configuration` objects are shared across the build DAG (diamond deps point at the same `Configuration*`), and each is compiled concurrently by its own `Build()` call.

A parent or sibling thread's recursive reset could clear a shared dependency's module `compiled` atomic **after** that dependency's module-compile thread had set it `true` and exited, but **before** an intra-config waiter (its implementation, or a dependent partition) ran `compiled.wait(false)`. The waiter then blocked forever on a flag nothing would re-signal, so the build froze mid-compile with no compiler alive. The `-r` flag only perturbed thread scheduling enough to make the race reliable; the dev server itself spawns *after* `Build()` returns.
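
The interleaving described above can be replayed one step at a time. This is a minimal single-threaded model, not the project's code: `Module` here is a hypothetical stand-in with a single flag, and `WaiterWouldBlock` only models the blocking condition of `compiled.wait(false)` rather than calling it, so the sketch cannot hang.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a module's per-build state.
struct Module {
    std::atomic<bool> compiled{false};
};

// Models compiled.wait(false): that call blocks for as long as the flag is false.
bool WaiterWouldBlock(const Module& m) {
    return !m.compiled.load();
}

// Replays the three steps in order and reports whether the waiter would end up
// blocked with no thread left to wake it.
bool ReplayRace(bool siblingResetsSharedDependency) {
    Module dep;
    // 1. The dependency's compile thread finishes, signals completion, and exits.
    dep.compiled.store(true);
    // 2. A parent/sibling Build() recursively resets the shared dependency.
    if (siblingResetsSharedDependency) {
        dep.compiled.store(false);
    }
    // 3. Only now does the intra-config waiter reach compiled.wait(false).
    return WaiterWouldBlock(dep);
}

// Self-check: the reset in step 2 is what strands the waiter.
void CheckReplay() {
    assert(ReplayRace(true));    // flag cleared after the only signal: waiter stuck
    assert(!ReplayRace(false));  // no foreign reset: waiter proceeds
}
```

In the real build the three steps run on different threads, so whether step 2 lands between steps 1 and 3 depends on scheduling; the model just fixes that order to show why nothing can wake the waiter afterwards.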

## Fix

Reset only the **current** configuration's own modules; never recurse into dependencies. Every config in the tree already gets its own `Build()` call (the per-`PcmDir` builder registered in `depResults`), which resets its own state at the top of that call, sequenced before its compile threads spawn. Cross-config module state is consulted only via PCM file mtimes and the `depResults` futures, never via these flags, so the narrower reset is correct and removes the data race entirely.
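
The difference between the two reset scopes can be sketched on a diamond-shaped graph. The types and the function names `ResetOwn`, `ResetRecursive`, `MakeConfig`, and `DependencySurvives` are illustrative assumptions; the actual `Configuration` and `Module` definitions in the codebase are not shown in this PR and will differ.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical per-build module state.
struct Module {
    std::atomic<bool> compiled{false};
    std::atomic<bool> checked{false};
};

// Hypothetical configuration node. Dependencies are raw pointers because the
// same Configuration can be reached from several parents (diamond deps).
struct Configuration {
    std::vector<std::unique_ptr<Module>> modules;  // atomics are not movable
    std::vector<Configuration*> dependencies;
};

// Narrow reset: touches only the configuration being built.
void ResetOwn(Configuration& cfg) {
    for (auto& m : cfg.modules) {
        m->compiled.store(false);
        m->checked.store(false);
    }
}

// Old behaviour as described above: also walks into shared dependencies.
void ResetRecursive(Configuration& cfg) {
    ResetOwn(cfg);
    for (Configuration* dep : cfg.dependencies) {
        ResetRecursive(*dep);
    }
}

Configuration MakeConfig(int moduleCount) {
    Configuration cfg;
    for (int i = 0; i < moduleCount; ++i) {
        cfg.modules.push_back(std::make_unique<Module>());
    }
    return cfg;
}

// Builds a diamond (left and right both depend on shared), marks shared as
// compiled, lets a sibling reset run, and reports whether the flag survived.
bool DependencySurvives(void (*reset)(Configuration&)) {
    Configuration shared = MakeConfig(1);
    Configuration left = MakeConfig(1);
    Configuration right = MakeConfig(1);
    left.dependencies.push_back(&shared);
    right.dependencies.push_back(&shared);

    shared.modules[0]->compiled.store(true);  // shared finished compiling
    reset(right);                             // a sibling Build() starts
    return shared.modules[0]->compiled.load();
}
```

Under these assumptions, `DependencySurvives(ResetOwn)` holds while `DependencySurvives(ResetRecursive)` does not, which is the property the narrower reset relies on.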

## Test

Adds `ConcurrentDependencyReset`: builds a static-lib dependency fully, then builds a consumer that depends on it while the dependency is already cached in `depResults` (so it is never rebuilt), and asserts the consumer build leaves the dependency's module `compiled` flag intact. **Fails deterministically on the old recursive reset** and passes with the fix, with no reliance on thread timing.
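
The shape of that test can be approximated with a toy model. This is not the code of `ConcurrentDependencyReset`: the `Build` function below is a simplified stand-in, `DepResults` is modelled as a set of already-built `PcmDir` strings rather than futures, and the directory names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Module {
    std::atomic<bool> compiled{false};
};

// Hypothetical configuration with a single module, for brevity.
struct Configuration {
    std::string pcmDir;
    Module module;
    std::vector<Configuration*> dependencies;
};

// Toy stand-in for depResults: the set of PcmDirs that already have a build.
using DepResults = std::set<std::string>;

void ResetFlags(Configuration& cfg, bool recurse) {
    cfg.module.compiled.store(false);
    if (recurse) {
        for (Configuration* dep : cfg.dependencies) {
            ResetFlags(*dep, true);
        }
    }
}

// Simplified build: reset, build uncached dependencies, then "compile".
void Build(Configuration& cfg, DepResults& depResults, bool recursiveReset) {
    ResetFlags(cfg, recursiveReset);
    for (Configuration* dep : cfg.dependencies) {
        // insert().second is true only the first time a PcmDir is seen.
        if (depResults.insert(dep->pcmDir).second) {
            Build(*dep, depResults, recursiveReset);
        }
    }
    cfg.module.compiled.store(true);
}

// Builds the dependency fully, then the consumer while the dependency is
// cached, and reports whether the dependency's flag is still set afterwards.
bool DependencyFlagIntact(bool recursiveReset) {
    Configuration lib;
    lib.pcmDir = "out/lib";  // made-up path
    Configuration app;
    app.pcmDir = "out/app";  // made-up path
    app.dependencies.push_back(&lib);

    DepResults depResults;
    depResults.insert(lib.pcmDir);
    Build(lib, depResults, recursiveReset);  // dependency built and cached
    Build(app, depResults, recursiveReset);  // consumer build; lib not rebuilt
    return lib.module.compiled.load();
}
```

Because the dependency is never rebuilt during the consumer build, nothing sets its flag again once a recursive reset has cleared it, so the outcome in this model does not depend on thread timing.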

`crafter-build test`: **12 passed**.

Resolves #16

fix: scope per-build module-state reset to the config being built
All checks were successful
CI / build-test-release (pull_request) Successful in 6m41s
e76f92ae0a
Build() resets each Module/ModulePartition's per-build `compiled`/`checked`
flags so a reused Configuration re-evaluates mtimes. That reset recursed into
cfg.dependencies — but dependency Configurations are shared across the build
DAG and each is compiled concurrently by its own Build() call.

A parent/sibling's recursive reset could therefore clear a shared dependency's
module `compiled` atomic *after* that dependency's module-compile thread had
set it true and exited, but before an intra-config waiter (its impl, or a
dependent partition) ran compiled.wait(false). The waiter then blocked forever
on a flag nothing would re-signal: the build froze mid-compile, idle, with no
compiler process alive — exactly the hang in issue #16.

Reset only the current configuration's own modules. Every config in the tree
already gets its own Build() call (the per-PcmDir builder registered in
depResults), which resets its own state at the top of that call, sequenced
before its compile threads spawn. Cross-config module state is consulted only
via PCM file mtimes and the depResults futures, never via these flags, so the
narrower reset is correct and removes the data race entirely.

Adds ConcurrentDependencyReset: builds a static-lib dependency fully, then
builds a consumer that depends on it while the dependency is already cached in
depResults (so it is never rebuilt), and asserts the consumer build leaves the
dependency's module `compiled` flag intact. Fails deterministically on the old
recursive reset; passes with the fix.

Resolves #16

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jorijnvdgraaf deleted branch claude/issue-16 2026-06-01 13:46:46 +02:00