Testing system needs an overhaul

jorijnvdgraaf commented

2026-05-27 04:45:11 +02:00

Owner

The current testing system works, but it has accumulated complexity that makes it bloated, fragile, and hard to reason about. Want to step back and redesign it before adding more tests.

Concerns

Runner abstraction is unclear. TestRunner (Local / Ssh / SshWin / Wsl / Cmd / FromSpec / FromEnv) is broad, env-var driven, and the flow from CRAFTER_BUILD_RUNNER_<target> → FromEnv → per-test execution is hard to follow.
Cross-arch test ergonomics are poor. The way Crafter.Math wires up cross-architecture tests is awkward — no clean way to say "this test is for target X" and have the harness route/skip accordingly.
Windows-via-SSH is flaky. tests/WindowsViaSsh depends on a reachable winvm and an ssh + scp + cmd.exe chain. Skips on missing env/host are silent, debugging failures is painful.
In-repo tests are inconsistent. The existing fixtures grew organically and don't model what a "good" test for this build system should look like.
Each test having a project.cpp is bloat Writing project.cpps for each test that are the same is alot of bloat, there should be a better solution for this.

Proposals

Drop SSH-based (Windows) testing, replace with Wine. Removes the network dependency and the second shell-quoting model (Shell::Cmd). some tests require an msvc toolchain that is not in wine, drop them.
First-class target/OS declaration per test — make it explicit in project.cpp what arch/OS a test targets, and let the harness skip/route without env-var gymnastics. Also support genuinely arch-agnostic tests.
Rewrite the in-repo test suite around the new model so the fixtures are reference examples, not legacy clutter.

Backwards compatibility does not matter, if there is a breaking change that is better then make it.
We need to reason from "What does the user want?" first and then reason how to implement that.

The current testing system works, but it has accumulated complexity that makes it bloated, fragile, and hard to reason about. Want to step back and redesign it before adding more tests. ### Concerns - **Runner abstraction is unclear.** `TestRunner` (Local / Ssh / SshWin / Wsl / Cmd / FromSpec / FromEnv) is broad, env-var driven, and the flow from `CRAFTER_BUILD_RUNNER_<target>` → `FromEnv` → per-test execution is hard to follow. - **Cross-arch test ergonomics are poor.** The way [Crafter.Math](https://forgejo.catcrafts.net/Catcrafts/Crafter.Math) wires up cross-architecture tests is awkward — no clean way to say "this test is for target X" and have the harness route/skip accordingly. - **Windows-via-SSH is flaky.** `tests/WindowsViaSsh` depends on a reachable `winvm` and an `ssh + scp + cmd.exe` chain. Skips on missing env/host are silent, debugging failures is painful. - **In-repo tests are inconsistent.** The existing fixtures grew organically and don't model what a "good" test for this build system should look like. - **Each test having a project.cpp is bloat** Writing project.cpps for each test that are the same is alot of bloat, there should be a better solution for this. ### Proposals - **Drop SSH-based (Windows) testing**, replace with **Wine**. Removes the network dependency and the second shell-quoting model (`Shell::Cmd`). some tests require an msvc toolchain that is not in wine, drop them. - **First-class target/OS declaration per test** — make it explicit in `project.cpp` what arch/OS a test targets, and let the harness skip/route without env-var gymnastics. Also support genuinely arch-agnostic tests. - **Rewrite the in-repo test suite** around the new model so the fixtures are reference examples, not legacy clutter. Backwards compatibility does not matter, if there is a breaking change that is better then make it. We need to reason from "What does the user want?" first and then reason how to implement that.

jorijnvdgraaf changed title from ~~Current testing system is bad~~ to Testing system needs an overhaul

2026-05-27 04:47:02 +02:00

jorijnvdgraaf added the

claude:ready

label

2026-05-27 04:51:31 +02:00

catbot added

claude:in-progress

and removed

claude:ready

labels

2026-05-27 04:53:07 +02:00

jorijnvdgraaf added

claude:ready

and removed

claude:in-progress

labels

2026-05-27 16:42:16 +02:00

catbot added

claude:failed

and removed

claude:ready

labels

2026-05-27 16:42:21 +02:00

jorijnvdgraaf added

claude:ready

and removed

claude:failed

labels

2026-05-27 16:50:51 +02:00

catbot added

claude:in-progress

and removed

claude:ready

labels

2026-05-27 16:52:52 +02:00

jorijnvdgraaf added

claude:ready

and removed

claude:in-progress

labels

2026-05-27 16:54:28 +02:00

catbot commented

2026-05-27 16:54:34 +02:00

Member

claude:claim:dc5801a5-b0ff-4f13-b772-4f656d99cdb2

catbot added

claude:in-progress

and removed

claude:ready

labels

2026-05-27 16:54:34 +02:00

catbot commented

2026-05-27 16:58:48 +02:00

Member

Claude exited with code 3.



























Read through the test runner and every fixture. Here's how I'd reason about this.

## Who the "user" actually is

There are two distinct users of the test system, and the current code conflates them:

1. **Project authors writing tests** (Crafter.Math, Crafter.Asset, downstream apps). They write tests for *their* code. They want to say "this is a test, it runs on these targets" and move on. Today this user gets the same surface as #2 — full project.cpp, manual runner selection — which is way too much rope.

2. **Crafter.Build authors testing the build engine itself**. They need full access (load inner fixtures, call `Build()` in-process, inspect intermediate state). This is the outer-driver pattern in `Incremental.cpp`, `Libraries.cpp`, `Cuda.cpp`, etc.

The current system was designed for #2 and reused by #1. Inverting that is most of the win.

## What user #1 wants

> "Here's `tests/MatMul/main.cpp`. Run it. If I pass `--target=aarch64`, build it for aarch64 and run it however you can — qemu, remote, whatever. If you can't, skip cleanly. I don't want to know about runners."

That implies three things:

- **Runner = function of target, not a user choice.** `qemu-<arch>` for non-host Linux, `wine` for Windows-on-Linux, `wasmtime` for WASI, direct otherwise. `CRAFTER_BUILD_RUNNER_<target>` stays as a power-user override but is no longer the main interface.
- **Target declaration is declarative, not env-var driven.** A test says `target = "aarch64-linux-gnu"` (specific), `targets = ["x86_64-*"]` (set), or nothing (arch-agnostic — runs at whatever the run targets). The harness's `targetFilter` then deterministically includes/excludes.
- **Preconditions are declarative.** Today every test has a hand-rolled `Skip()` ladder at the top of `main()` (`ToolPresent("nvcc")`, `fs::exists(sysroot)`, etc.). Move those into the test manifest as `requires = ["tool:nvcc", "file:/opt/..."]`. The harness evaluates before build, no Skip() in source.

## What user #2 wants

Outer-driver tests need code, not config — they call `LoadProject`, `Build`, inspect `BinDir()`. They should keep `project.cpp` and `ParentLib("crafter.build-lib")` because that *is* what they're testing. The mistake today is that user #1's tests **also** look like this. Separate the two and the outer-driver tests stop being a template everyone copies.

## Concrete proposed shape

tests/MatMul/ # user #1, default case
main.cpp # synthesized, runs at host or --target=...

tests/SimdNeon/ # user #1, target-specific
test.toml # target = "aarch64-linux-gnu", requires = [...]
main.cpp

tests/Incremental/ # user #2, outer-driver
project.cpp # full power, ParentLib(...), defines, deps
Incremental.cpp
inner/ # the inner fixture this test drives


**`test.toml` schema** (small, declarative — TOML or whatever lightweight format):
```toml
target  = "aarch64-linux-gnu"           # or targets = ["x86_64-*", "aarch64-*"]
march   = "armv8-a"
sysroot = "/opt/aarch64-rootfs"
requires = ["tool:qemu-aarch64", "file:/opt/aarch64-rootfs/usr/share/libc++/v1/std.cppm"]
timeout = "30s"
args = ["--seed=42"]

Runner table (replaces Local/Ssh/SshWin/Wsl/Cmd/FromSpec/FromEnv machinery):

target	runner
host triple	direct
`*-linux-gnu` non-host	`qemu-<arch>` (skip if missing)
`wasm32-wasip1`	`wasmtime` (skip if missing)
`-w64-mingw32`, `-windows-*` on Linux	`wine` (skip if missing)

One concept: prefix runner. Local is the empty prefix. SSH/SshWin/Wsl/transport machinery (copy/exec/cleanup, Shell enum, WSL path translation) deleted.

In-repo test triage

After the redesign:

Delete outright: WindowsViaSsh, SshRunner, QemuUser (these tested the env-var → runner plumbing that no longer exists; the new runner derivation is exercised implicitly every time Cuda/Wasi/CrossArchAarch64/new WindowsViaWine run).
Convert to synth + test.toml: HelloWorld, WithModule, Defines, CrossArchAarch64, Wasi, new WindowsViaWine, CrossArchAarch64.
Keep as full project.cpp (outer-driver references): Incremental, Libraries, Cuda, Diamond, CrossProjectModule, Shader, ShaderDep, BuildError, RunnerClassification, UnitLib. These genuinely test the build engine.

Things I want to confirm before implementing

Awaiting your input on those four design choices before I draft the implementation plan. Or if you want to talk through any of them before locking in — happy to.

Claude exited with code 3. ``` Read through the test runner and every fixture. Here's how I'd reason about this. ## Who the "user" actually is There are two distinct users of the test system, and the current code conflates them: 1. **Project authors writing tests** (Crafter.Math, Crafter.Asset, downstream apps). They write tests for *their* code. They want to say "this is a test, it runs on these targets" and move on. Today this user gets the same surface as #2 — full project.cpp, manual runner selection — which is way too much rope. 2. **Crafter.Build authors testing the build engine itself**. They need full access (load inner fixtures, call `Build()` in-process, inspect intermediate state). This is the outer-driver pattern in `Incremental.cpp`, `Libraries.cpp`, `Cuda.cpp`, etc. The current system was designed for #2 and reused by #1. Inverting that is most of the win. ## What user #1 wants > "Here's `tests/MatMul/main.cpp`. Run it. If I pass `--target=aarch64`, build it for aarch64 and run it however you can — qemu, remote, whatever. If you can't, skip cleanly. I don't want to know about runners." That implies three things: - **Runner = function of target, not a user choice.** `qemu-<arch>` for non-host Linux, `wine` for Windows-on-Linux, `wasmtime` for WASI, direct otherwise. `CRAFTER_BUILD_RUNNER_<target>` stays as a power-user override but is no longer the main interface. - **Target declaration is declarative, not env-var driven.** A test says `target = "aarch64-linux-gnu"` (specific), `targets = ["x86_64-*"]` (set), or nothing (arch-agnostic — runs at whatever the run targets). The harness's `targetFilter` then deterministically includes/excludes. - **Preconditions are declarative.** Today every test has a hand-rolled `Skip()` ladder at the top of `main()` (`ToolPresent("nvcc")`, `fs::exists(sysroot)`, etc.). Move those into the test manifest as `requires = ["tool:nvcc", "file:/opt/..."]`. The harness evaluates before build, no Skip() in source. ## What user #2 wants Outer-driver tests need code, not config — they call `LoadProject`, `Build`, inspect `BinDir()`. They should keep `project.cpp` and `ParentLib("crafter.build-lib")` because that *is* what they're testing. The mistake today is that user #1's tests **also** look like this. Separate the two and the outer-driver tests stop being a template everyone copies. ## Concrete proposed shape ``` tests/MatMul/ # user #1, default case main.cpp # synthesized, runs at host or --target=... tests/SimdNeon/ # user #1, target-specific test.toml # target = "aarch64-linux-gnu", requires = [...] main.cpp tests/Incremental/ # user #2, outer-driver project.cpp # full power, ParentLib(...), defines, deps Incremental.cpp inner/ # the inner fixture this test drives ``` **`test.toml` schema** (small, declarative — TOML or whatever lightweight format): ```toml target = "aarch64-linux-gnu" # or targets = ["x86_64-*", "aarch64-*"] march = "armv8-a" sysroot = "/opt/aarch64-rootfs" requires = ["tool:qemu-aarch64", "file:/opt/aarch64-rootfs/usr/share/libc++/v1/std.cppm"] timeout = "30s" args = ["--seed=42"] ``` **Runner table** (replaces `Local`/`Ssh`/`SshWin`/`Wsl`/`Cmd`/`FromSpec`/`FromEnv` machinery): | target | runner | |---|---| | host triple | direct | | `*-linux-gnu` non-host | `qemu-<arch>` (skip if missing) | | `wasm32-wasip1` | `wasmtime` (skip if missing) | | `*-w64-mingw32`, `*-windows-*` on Linux | `wine` (skip if missing) | One concept: **prefix runner**. `Local` is the empty prefix. SSH/SshWin/Wsl/transport machinery (`copy`/`exec`/`cleanup`, `Shell` enum, WSL path translation) deleted. ## In-repo test triage After the redesign: - **Delete outright**: `WindowsViaSsh`, `SshRunner`, `QemuUser` (these tested the env-var → runner plumbing that no longer exists; the new runner derivation is exercised implicitly every time `Cuda`/`Wasi`/`CrossArchAarch64`/new `WindowsViaWine` run). - **Convert to synth + test.toml**: `HelloWorld`, `WithModule`, `Defines`, `CrossArchAarch64`, `Wasi`, new `WindowsViaWine`, `CrossArchAarch64`. - **Keep as full project.cpp (outer-driver references)**: `Incremental`, `Libraries`, `Cuda`, `Diamond`, `CrossProjectModule`, `Shader`, `ShaderDep`, `BuildError`, `RunnerClassification`, `UnitLib`. These genuinely test the build engine. ## Things I want to confirm before implementing Awaiting your input on those four design choices before I draft the implementation plan. Or if you want to talk through any of them before locking in — happy to. ```

catbot added

claude:failed

and removed

claude:in-progress

labels

2026-05-27 16:58:48 +02:00

jorijnvdgraaf commented

2026-05-27 17:14:00 +02:00

Author

Owner

Yes that is a good mindset, we should prioritize downstream users over the tests in this repo.
I agree with this plan.

Yes that is a good mindset, we should prioritize downstream users over the tests in this repo. I agree with this plan.