Twelve hours of "the score keeps moving" turned out to be the input file moving, not the score. With a clean zero-state benchmark and one defense-in-depth via fix, the kehVM board now routes deterministically from TOML to a fully-connected, DRC-clean PCB.
Across one session we recorded shorts/unconnected (S/U) counts of 0/5, 90/11, 132/150, 199/108, and 137/130 on what we believed to be the same code and the same design. Different "baselines" appeared at different times. The 0/5 we kept calling "the baseline" turned out to be one particular intermediate state of hardware/kvm-carrier.kicad_pcb that an earlier half-completed run had left behind.
fast_converge.py runs layout_board.py in-process and overwrites hardware/kvm-carrier.kicad_pcb each round. When a run crashed mid-iteration (the recurring placer cuda↔cpu tensor-mismatch was crashing 100% of fast-converge runs in this session), the next run started from whatever partial state the crash left. The drift between runs was not a code regression. It was the input changing under us.
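One cheap way to catch this kind of drift is to fingerprint the input file before each run and to compare scores only between runs with the same fingerprint. The sketch below is illustrative: `file_fingerprint` and `same_input` are hypothetical helpers, not functions that exist in fast_converge.py.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_input(fingerprint_a: str, fingerprint_b: str) -> bool:
    """Two runs are comparable only if they started from identical input bytes."""
    return fingerprint_a == fingerprint_b
```

Recording the fingerprint next to each score would have shown that the five "same design" runs started from different boards.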
The Rust pcb-pipeline binary already runs the full TOML→place→route→DRC flow deterministically. We wrapped it in a mise task that writes the output PCB to /tmp/kvm-bench/<run-id>/ rather than hardware/, and added a --json-out flag so each run drops a structured result into bench-results/<git_sha>-s<seed>-r<rounds>-i<iters>-<unix_ts>.json with config, total_ms, per-round drc, score, and elapsed time. The result is reproducible: same TOML + same code + same seed = same numbers.
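The naming scheme can be sketched as follows. This is a minimal Python illustration of the run-id pattern described above, not the actual Rust code; `run_id` and `write_bench_result` are hypothetical names, and the JSON keys are whatever the caller passes in, since the exact schema is not shown here.

```python
import json
from pathlib import Path


def run_id(git_sha: str, seed: int, rounds: int, iters: int, unix_ts: int) -> str:
    """Build the <git_sha>-s<seed>-r<rounds>-i<iters>-<unix_ts> identifier."""
    return f"{git_sha}-s{seed}-r{rounds}-i{iters}-{unix_ts}"


def write_bench_result(results_dir: Path, rid: str, result: dict) -> Path:
    """Write one structured result per run; never touches the input tree."""
    results_dir.mkdir(parents=True, exist_ok=True)
    out = results_dir / f"{rid}.json"
    # sort_keys keeps the file byte-stable for identical results
    out.write_text(json.dumps(result, indent=2, sort_keys=True))
    return out
```

Because the identifier includes the git SHA, the seed, and a timestamp, two runs never overwrite each other's artifacts, and any result can be traced back to the code and parameters that produced it.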
Running pipeline:zero from a clean slate produced 0 shorts / 3 unconnected on seed 42, 1 round, 2000 iterations. All three DRC items had the same shape: two trace segments of the same net (e.g. +1V8) starting at the exact same XY but on different signal layers, with no via at that XY connecting them. The router considered the net successfully routed and never reported a failure.
In crates/router/src/router.rs, extend_route_results_from_route3d writes Route3D.segments into RouteResult and a separate Vec<Via> into ViaResult. There is no cross-check that every layer transition in the segments has a matching via. Some rescue or shove edge case emits the segments but loses the connecting via, and the segments-only output passes the router's own success criteria.
After canonicalize_vias, the validator walks every net's segment endpoints and groups them by XY quantized to 1/10000 mm. For any (net, xy) that touches two or more distinct signal layers but has no via at that xy, it emits one using the smallest padstack that spans the required layer range, then re-canonicalizes so duplicates collapse. On the zero-state bench it inserted 5 vias (the 3 DRC items were a subset; KiCad merged some adjacent reports). The real fix is to find which router path drops the via in the first place; the validator stays in place either way.
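The check itself is small. The following is a Python sketch of the idea, not the Rust code in crates/router. The `Segment` and `Via` types, the integer layer indices, and the `padstacks` list of (top, bottom) spans are all assumptions made for illustration, and the re-canonicalization step is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

QUANT = 10_000  # grid units per mm, i.e. a 1/10000 mm grid


@dataclass(frozen=True)
class Segment:
    net: str
    layer: int  # signal layer index, 0 = top (assumed encoding)
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class Via:
    net: str
    x: float
    y: float
    top: int
    bottom: int


def quantize(x: float, y: float) -> tuple[int, int]:
    """Snap a coordinate in mm to the integer 1/10000 mm grid."""
    return (round(x * QUANT), round(y * QUANT))


def find_missing_vias(segments, vias, padstacks):
    """Return vias for every (net, xy) that changes layer without one."""
    layers_at = defaultdict(set)
    for s in segments:
        layers_at[(s.net, quantize(s.x1, s.y1))].add(s.layer)
        layers_at[(s.net, quantize(s.x2, s.y2))].add(s.layer)
    have_via = {(v.net, quantize(v.x, v.y)) for v in vias}

    missing = []
    for key in sorted(layers_at):
        layers = layers_at[key]
        if len(layers) < 2 or key in have_via:
            continue
        lo, hi = min(layers), max(layers)
        spans = [p for p in padstacks if p[0] <= lo and p[1] >= hi]
        if not spans:
            continue  # no padstack covers the range; a real tool should report this
        top, bottom = min(spans, key=lambda p: p[1] - p[0])
        net, (qx, qy) = key
        missing.append(Via(net, qx / QUANT, qy / QUANT, top, bottom))
    return missing
```

Quantizing before grouping matters: two endpoints that differ only by floating-point noise must land in the same bucket, otherwise the layer transition goes unnoticed just as it did in the router.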
pipeline:zero seed=42 rounds=1 max_iters=2000 → 0/0/0 score, 17 vias, 43 traces, output at /tmp/kvm-bench/65ab241-s42-r1-i2000-1777396127/board.kicad_pcb. Bench artifact at bench-results/65ab241-s42-r1-i2000-1777396127.json. Reproducible from the same git_sha and seed.
We spent multiple hours chasing what looked like a sexpr-migration regression, an analytical-placer regression, a routing regression, and a synced-netlist regression. None of them existed. The deltas between runs were entirely caused by the in-tree PCB drifting across half-completed pipeline invocations. The lesson: any benchmark that depends on a mutable input file is meaningless. If you cannot start from nothing, you cannot measure anything.
mise run pipeline:zero 42 1 2000
# → /tmp/kvm-bench/<run-id>/board.kicad_pcb (output)
# → bench-results/<run-id>.json (structured result)