Build log freezes when stdout is redirected: non-TTY output is block-buffered and never flushed (supersedes #16) #18

Closed
opened 2026-06-01 15:08:11 +02:00 by jorijnvdgraaf · 1 comment

Summary

When crafter-build's stdout is redirected to a file (not a TTY), progress and completion output is block-buffered and never flushed, so the log file freezes mid-build and the final lines (Built N steps in …ms, listening on port :…) never appear. Under -r (serve mode) the process then sits in its serve loop forever, so the trailing buffer is never flushed — the log looks permanently stuck even though the build finished seconds ago and the server is already answering.

Important: this supersedes #16#16 was filed on wrong information

Issue #16 ("crafter-build -r hangs at step 101/139, no compiler running") was misdiagnosed by me. There is no build deadlock. What looked like a "hung driver with no compiler children frozen at [100/139]" was simply a finished build sitting in the -r serve loop — idle driver + only the dev-server child + no compilers is exactly what success looks like. The "frozen log" was this stdout-buffering bug, not a stalled build.

Proof: while the log read 0 bytes (or was frozen at [100/139]), curl http://localhost:8080/ returned HTTP 200 and bin/<target>/3DForts.wasm (6.7 MB) was already on disk and being served (the app even rendered in-browser with WebGPU).

Please review what PR #17 (fix: scope per-build module-state reset to the config being built, merged for #16) actually changed. It was written against the phantom-deadlock description, so either it's a no-op for a real problem, or it changed behaviour for reasons that don't match a genuine bug. Worth confirming it didn't introduce a regression or mask something — and whether it should stay, be reverted, or be re-justified on its own merits.

Root cause (with code references)

implementations/Crafter.Build-Progress.cpp is correctly TTY-aware, but only the TTY path flushes:

  • TTY path — in-place redraw with explicit flush:
    • line 86 std::fflush(stdout);
    • line 94 std::fflush(stdout);
  • Non-TTY path — append-only, NO flush:
    • line 134 std::println("[{}/{}] {}", done, total, label_);
  • Finalize() — NO flush:
    • line 161 std::println("Built {} step{} in {}ms", …);
  • implementations/Crafter.Build-Clang.cpp:
    • line 1577 std::println("listening on port :{}", port); — NO flush

std::println writes through C stdout, which is fully (block) buffered when stdout is a regular file. With no flush on the non-TTY path, the [N/M] lines accumulate and only spill at ~4 KB block boundaries (≈ why it froze around step 100), and the final Built… / listening… lines stay stuck in the trailing buffer that -r never flushes.

Fix (one line)

Line-buffer stdout when it isn't a terminal. Put it as the first statement in main() (must run before any stdout write):

#include <cstdio>
#include <unistd.h>

if (!isatty(STDOUT_FILENO))
    std::setvbuf(stdout, nullptr, _IOLBF, 0);   // line-buffered when piped/redirected

Now every \n flushes, so [N/M], Built N steps, and listening on port all reach a redirected log immediately. No behaviour change on a TTY (that path already flushes). g_isTty is already computed in SetVerbosity (line 105), but setvbuf must precede any output, so main() is the correct home — not SetVerbosity.

A narrower alternative is std::fflush(stdout) at the three print sites above, but that spans two files and is easy to miss for future println calls; the setvbuf approach is more robust.

Impact

Any tooling that redirects the build log and waits for a completion/readiness marker (Built … steps, listening on port, [N/N]) hangs forever, even though the build is done and the server is up. This has repeatedly stalled an automated runner that polled the log instead of the server.

Risk of the fix: negligible — one flush per line is nothing next to spawning compilers, and a line-buffered redirected log is strictly an improvement.

## Summary When `crafter-build`'s stdout is **redirected to a file** (not a TTY), progress and completion output is **block-buffered and never flushed**, so the log file freezes mid-build and the final lines (`Built N steps in …ms`, `listening on port :…`) never appear. Under `-r` (serve mode) the process then sits in its serve loop forever, so the trailing buffer is never flushed — the log looks permanently stuck even though the build finished seconds ago and the server is already answering. ## Important: this supersedes #16 — #16 was filed on wrong information Issue #16 ("crafter-build -r hangs at step 101/139, no compiler running") was **misdiagnosed by me**. There is **no build deadlock**. What looked like a "hung driver with no compiler children frozen at `[100/139]`" was simply a *finished* build sitting in the `-r` serve loop — idle driver + only the dev-server child + no compilers is exactly what success looks like. The "frozen log" was this stdout-buffering bug, not a stalled build. Proof: while the log read **0 bytes** (or was frozen at `[100/139]`), `curl http://localhost:8080/` returned **HTTP 200** and `bin/<target>/3DForts.wasm` (6.7 MB) was already on disk and being served (the app even rendered in-browser with WebGPU). **Please review what PR #17 (`fix: scope per-build module-state reset to the config being built`, merged for #16) actually changed.** It was written against the phantom-deadlock description, so either it's a no-op for a real problem, or it changed behaviour for reasons that don't match a genuine bug. Worth confirming it didn't introduce a regression or mask something — and whether it should stay, be reverted, or be re-justified on its own merits. ## Root cause (with code references) `implementations/Crafter.Build-Progress.cpp` is correctly TTY-aware, but only the **TTY** path flushes: - TTY path — in-place redraw with explicit flush: - line 86 `std::fflush(stdout);` - line 94 `std::fflush(stdout);` - **Non-TTY path — append-only, NO flush:** - line 134 `std::println("[{}/{}] {}", done, total, label_);` - `Finalize()` — NO flush: - line 161 `std::println("Built {} step{} in {}ms", …);` - `implementations/Crafter.Build-Clang.cpp`: - line 1577 `std::println("listening on port :{}", port);` — NO flush `std::println` writes through C `stdout`, which is **fully (block) buffered** when stdout is a regular file. With no flush on the non-TTY path, the `[N/M]` lines accumulate and only spill at ~4 KB block boundaries (≈ why it froze around step 100), and the final `Built…` / `listening…` lines stay stuck in the trailing buffer that `-r` never flushes. ## Fix (one line) Line-buffer stdout when it isn't a terminal. Put it as the **first statement in `main()`** (must run before any stdout write): ```cpp #include <cstdio> #include <unistd.h> if (!isatty(STDOUT_FILENO)) std::setvbuf(stdout, nullptr, _IOLBF, 0); // line-buffered when piped/redirected ``` Now every `\n` flushes, so `[N/M]`, `Built N steps`, and `listening on port` all reach a redirected log immediately. No behaviour change on a TTY (that path already flushes). `g_isTty` is already computed in `SetVerbosity` (line 105), but `setvbuf` must precede any output, so `main()` is the correct home — not `SetVerbosity`. A narrower alternative is `std::fflush(stdout)` at the three print sites above, but that spans two files and is easy to miss for future `println` calls; the `setvbuf` approach is more robust. ## Impact Any tooling that redirects the build log and waits for a completion/readiness marker (`Built … steps`, `listening on port`, `[N/N]`) hangs forever, even though the build is done and the server is up. This has repeatedly stalled an automated runner that polled the log instead of the server. Risk of the fix: negligible — one flush per line is nothing next to spawning compilers, and a line-buffered redirected log is strictly an improvement.
Member

Fix is up as PR #19 and is green + mergeable.

  • Root cause confirmed exactly as described: non-TTY stdout is block-buffered with no flush on the append-only progress path, Finalize(), and the -r listening on port line. Hidden on normal exit (runtime flushes at exit), fatal under -r (process never exits → trailing buffer never flushes).
  • Fix: setvbuf(stdout, _IOLBF) at the top of main() when stdout isn't a TTY. Kept self-contained in main.cpp (system headers only) because the self-host exe build resolves Crafter.Build against the installed/cached module BMIs, which shadow the freshly built local ones — a new exported interface symbol isn't visible until reinstall.
  • Verified before/after with the examples/wasi project under -r, stdout redirected, server left running: old binary → log 0 bytes while curl returns HTTP 200; fixed binary → all lines (Built 4 steps…, listening on port :8080) present. crafter-build test: 12 passed.
  • Re #17: recommend keep. It fixes a genuine, independent data race on the shared per-config compiled atomic (not a no-op), with a deterministic regression test. Its only flaw is attribution — it credited itself with the observed #16 hang, which was actually this buffering bug. Details in the PR.

Merge is owner-gated (the bot has write, not merge rights on protected master — same as #17, which was owner-merged). PR #19 is ready to merge.

Fix is up as PR #19 and is **green + mergeable**. - Root cause confirmed exactly as described: non-TTY `stdout` is block-buffered with no flush on the append-only progress path, `Finalize()`, and the `-r` `listening on port` line. Hidden on normal exit (runtime flushes at exit), fatal under `-r` (process never exits → trailing buffer never flushes). - Fix: `setvbuf(stdout, _IOLBF)` at the top of `main()` when stdout isn't a TTY. Kept self-contained in main.cpp (system headers only) because the self-host exe build resolves `Crafter.Build` against the installed/cached module BMIs, which shadow the freshly built local ones — a new exported interface symbol isn't visible until reinstall. - Verified before/after with the `examples/wasi` project under `-r`, stdout redirected, server left running: old binary → log 0 bytes while `curl` returns HTTP 200; fixed binary → all lines (`Built 4 steps…`, `listening on port :8080`) present. `crafter-build test`: 12 passed. - Re #17: recommend **keep**. It fixes a genuine, independent data race on the shared per-config `compiled` atomic (not a no-op), with a deterministic regression test. Its only flaw is attribution — it credited itself with the observed #16 hang, which was actually this buffering bug. Details in the PR. Merge is owner-gated (the bot has `write`, not merge rights on protected `master` — same as #17, which was owner-merged). PR #19 is ready to merge.
Sign in to join this conversation.
No description provided.