External cmake deps build single-threaded: cmake --build lacks --parallel #20

Closed
opened 2026-06-01 22:34:08 +02:00 by jorijnvdgraaf · 0 comments

Summary

External cmake dependencies (DPP, msquic, glslang, …) are built single-threaded: BuildCMake invokes cmake --build <dir> with no --parallel/-j, and ConfigureCMake uses the default generator (Unix Makefiles on Linux), which is serial unless told otherwise. On a multi-core host this is a large, avoidable slowdown — every external dep compiles one translation unit at a time.

Evidence (live build)

Building the 3DForts lobby server, which pulls in DPP (Discord++) via an external cmake dependency:

  • One clang++ at a time, each pegging a single core at ~99%, the rest idle. Sampled 6 s apart → different processes on different files (dpp/guild.cpp → dpp/http_server_request.cpp), i.e. strictly sequential.
  • DPP is ~178 .cpp files; at -O3, single-threaded, that's many minutes.
  • Host had 12 cores, 0–1 concurrent clang++.

This isn't a hang; it's just serial. But it's slow enough that the build looks stuck to automated runners polling it, and they can burn their time budget waiting on a dep build that should take ~1/N as long.

Root cause (code references)

implementations/Crafter.Build-External.cpp:

// line 212
std::string BuildCMake(const fs::path& cmakeBuildDir) {
    std::string cmd = std::format("cmake --build {}", ShellQuote(fs::absolute(cmakeBuildDir).string()));
    //                            ^^^^^^^^^^^^^^^^^^^ no --parallel → serial for the Makefiles generator
    CommandResult r = RunCommandChecked(cmd);
    ...
}

Contributing: ConfigureCMake (line 194) configures with the default generator —

std::string cmd = std::format("cmake -S {} -B {}{}", ...);   // no -G Ninja

Unix Makefiles is serial without -j; Ninja would parallelize by default. With the Makefiles generator that is selected today, cmake --build without --parallel leaves all but one core idle.

Fix

Pass an explicit parallel job count to cmake --build, reusing the same concurrency crafter-build already uses elsewhere. The test runner already does exactly this in implementations/Crafter.Build-Test.cpp:603:

int jobs = opts.jobs > 0 ? opts.jobs : std::max(1u, std::thread::hardware_concurrency());

So in BuildCMake:

std::string BuildCMake(const fs::path& cmakeBuildDir) {
    unsigned jobs = std::max(1u, std::thread::hardware_concurrency());
    std::string cmd = std::format("cmake --build {} --parallel {}",
        ShellQuote(fs::absolute(cmakeBuildDir).string()), jobs);
    ...
}

Notes:

  • Pass an explicit count (--parallel N), not a bare --parallel — on the Makefiles generator a bare --parallel maps to make -j with no limit (unbounded fork), which is worse.
  • Ideally share one job-count source with the main compile scheduler / the --jobs= CLI flag (Crafter.Build-Clang.cpp:1378) so crafter-build --jobs=N governs dep builds too, instead of duplicating hardware_concurrency().
  • Alternative/orthogonal: configure external deps with -G Ninja (parallel by default, and faster incremental). --parallel N is the smaller, lower-risk change.
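The second note (one shared job-count source) could be sketched roughly as below. This is only an illustration: `ResolveJobCount` and `CMakeBuildCommand` are hypothetical names, not existing crafter-build functions, and the build directory is assumed to have been shell-quoted already (by `ShellQuote` in the real code).

```cpp
#include <algorithm>
#include <string>
#include <thread>

// Hypothetical helper: resolve the job count in one place so the compile
// scheduler, the test runner, and external cmake builds all agree.
// requestedJobs is assumed to come from a --jobs=N style option; <= 0 means "not set".
unsigned ResolveJobCount(int requestedJobs) {
    if (requestedJobs > 0) {
        return static_cast<unsigned>(requestedJobs);
    }
    // hardware_concurrency() may return 0 when the value is unknown; clamp to 1.
    return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical command builder. quotedBuildDir is assumed to be already
// shell-quoted. std::to_string keeps the sketch buildable without C++20;
// the code quoted in this issue uses std::format instead.
std::string CMakeBuildCommand(const std::string& quotedBuildDir, unsigned jobs) {
    // Always emit an explicit count, never a bare --parallel.
    return "cmake --build " + quotedBuildDir + " --parallel " + std::to_string(jobs);
}
```

With something like this, `BuildCMake` would take the resolved count as a parameter (or read it from shared options) rather than calling `hardware_concurrency()` itself, so `crafter-build --jobs=N` would bound dep builds as well.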

Impact

Every external-dependency build (DPP, msquic, glslang, …) is ~N× slower than it should be on an N-core machine. Fixing it cuts cold-build time for any project with cmake deps substantially, and stops long dep builds from looking like hangs to anything watching the build.
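As a rough sanity check on the ~N× figure, take the numbers from the evidence section (about 178 translation units, 12 cores) and assume, purely for illustration, that every unit costs the same time t and that linking and inter-target ordering are negligible:

```latex
T_{\text{serial}} \approx 178\,t, \qquad
T_{\text{parallel}} \approx \left\lceil \tfrac{178}{12} \right\rceil t = 15\,t, \qquad
\frac{T_{\text{serial}}}{T_{\text{parallel}}} \approx \frac{178}{15} \approx 11.9
```

Real builds will fall short of this ideal, since translation units differ in cost and link steps are largely serial, but the order of magnitude matches the ~N× estimate.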
