PL-04 — Analytical Placement

Interactive Math

The math behind the canonical Rust analytical PCB placer, explained through interactive visualizations you can drag, slide, and play with. For the product-level story around the placer and router, start with the engine overview.

THE PROBLEM

Given dozens of electronic components — module headers, a BGA video bridge, an HDMI receptacle, USB and Ethernet jacks, regulators, capacitors — and a constrained multi-layer board, find the position and rotation for each part that minimizes total wire length, leaves the courtyard around every footprint clear, and leaves enough room in every routing channel for a downstream router to actually finish the job.

THE APPROACH

Treat placement as a continuous optimization problem. Encode all component positions as a vector of (x, y) coordinates. Define a differentiable loss function — wirelength, density, near-pair springs, module-pair springs. Use Nesterov-accelerated gradient descent to minimize it, adjusting all positions simultaneously. Then hand the result to a 3D product-graph router that knows how to escape BGAs and negotiate congestion.

THE CATCH

The real objectives (bounding-box wirelength, binary overlap detection, routing feasibility) are not differentiable. They have sharp corners and discontinuities. The art of analytical placement is finding smooth approximations — close enough for optimization, smooth enough for gradient descent — and pairing that with a router whose obstacle model and search space are sharp enough to actually be DRC-clean.

Each section below introduces one piece of the puzzle, building from simple primitives to the full algorithm. The visualizations are interactive — drag, slide, and play until the math clicks.

CONTENTS

PART I — WIRELENGTH

01 Log-Sum-Exp 02 HPWL 03 Softmax Gradients

PART II — DENSITY

04 Electric Field 05 Bell-Shaped Basis

PART III — OPTIMIZATION

06 Combined Objective 07 γ Scheduling 08 RUDY Congestion

PART IV — THE FULL ENGINE

09 Nesterov Loop 10 ePlace Feedback 11 L-Shape Inflation 12 Constraints & Legalize

PART V — ROUTER HANDOFF

13 Capsule Obstacles 14 Octilinear VG 15 BGA Fanout 16 Pathfinder Loop 17 Steiner & Rip-up

SMOOTHING PARAMETER γ = 1.0

PART I — WIRELENGTH

The first challenge: measure how much wire we'll need to connect everything, in a way that gradient descent can optimize. Three ideas build on each other: a smooth approximation of max/min, a wirelength metric built from it, and understanding how gradients distribute across pins.

01 — LOG-SUM-EXP

Smooth Max

Our placer needs to minimize wirelength. Wirelength depends on the bounding box of each net's pins, which means we need max(x₁, x₂, ..., xₖ) and min(x₁, x₂, ..., xₖ). The problem: max has sharp corners. At the exact point where two values tie for the maximum, the function isn't differentiable — gradient descent doesn't know which direction to go.

The fix is the log-sum-exp (LSE) trick. Instead of max(a, b, c), we compute:

LSE(a, b, c) = (1/γ) · ln( e^γa + e^γb + e^γc )

This is smooth everywhere — infinitely differentiable — and it always overestimates the true max. The parameter γ (gamma) controls how tight the approximation is. Think of γ as a sharpness knob: turn it down and LSE becomes a blurry average; turn it up and it snaps toward the true max. The same trick works for smooth min by negating the inputs.

TRY IT

Drag the three points along the number line. The LSE marker (colored) tracks the true max — always slightly above. Slide the γ bar and watch the gap shrink. The bottom plot shows convergence over the full γ range.

KEY INSIGHT

At low γ, LSE ≈ average + constant — it spreads attention across all values. At high γ, LSE ≈ true max — it concentrates on the winner. The transition is perfectly smooth. This is the foundational trick: we've turned a non-differentiable operation into a differentiable one, at the cost of a tunable approximation error.

* * *

02 — HALF-PERIMETER WIRELENGTH

Bounding Box Wirelength

Now we use our smooth max/min to solve a real problem: estimating how much wire we'll need to connect a set of pins.

On a PCB, a net is a set of pins that need electrical connection. The simplest cost estimate is the Half-Perimeter Wirelength (HPWL): draw the tightest axis-aligned rectangle around all pins, measure half its perimeter.

HPWL = (x_max − x_min) + (y_max − y_min)

HPWL uses max and min, so we replace them with LSE to get the smooth HPWL. The smooth bounding box is always slightly larger — we're overestimating wire, which is a safe direction for optimization.

TRY IT

Drag the five pins. The inner dashed rectangle is the true bounding box. The outer one is the smooth (LSE) bounding box. Slide γ and watch the smooth box converge toward the true one.

HPWL = 0.0 SMOOTH HPWL = 0.0 ERROR = 0.0

KEY INSIGHT

HPWL is the wirelength metric used by every major placement engine (DREAMPlace, ePlace, RePlAce). It's not perfect — it underestimates multi-pin nets — but it's fast, differentiable with LSE, and correlates well with actual routed wirelength. Our placer sums smooth HPWL across all nets to get total wirelength cost.

* * *

03 — SOFTMAX GRADIENTS

Who Gets Pulled?

We've made HPWL differentiable. Now the optimizer can ask: "if I move this pin slightly, how much does wirelength change?" But not all pins matter equally. The gradient of LSE has a beautiful structure — it's a softmax.

The softmax function takes a vector of numbers and turns them into probabilities summing to 1, emphasizing the largest values. The gradient of our smooth max with respect to each input is the softmax:

∂LSE/∂xᵢ = exp(γ · xᵢ) / Σⱼ exp(γ · xⱼ) = softmax(γ · x)ᵢ

In plain English: the pin closest to being the maximum gets the strongest gradient. Interior pins — far from any bounding box edge — get almost zero. Moving an interior pin doesn't change the bounding box, so the optimizer ignores it.

TRY IT

Arrows show each pin's gradient — direction and magnitude. Hot colors = strong, cold = weak. Drag a pin to the edge of the group and watch its arrow grow. Tuck it inside and it vanishes. Slide γ: at low γ all pins share equally, at high γ only extremes carry the load.

KEY INSIGHT

Interior pins are effectively invisible to the optimizer — they move freely without affecting the objective. This sparsity is what makes analytical placement efficient: most pins have near-zero gradients on most iterations.

PART II — DENSITY

Wirelength alone would collapse everything into a single point. We need a repulsive force. The trick: borrow from electrostatics. Components are charges, the board is the conductor, and Poisson's equation gives us smooth repulsion. But first we need to represent density on a grid without introducing discontinuities.

04 — DENSITY AS ELECTRIC FIELD

Electrostatic Spreading

Wirelength alone would pull everything into a single point. We need a force that pushes components apart.

The naive approach — check every pair for overlap, penalize it — has a fatal flaw: no gradient until components touch. One millimeter away? Zero force. One micron of overlap? Sudden spike. This discontinuity makes optimization unstable.

The clever approach: treat components as electric charges. Divide the board into a grid, deposit "charge" proportional to area, and solve Poisson's equation:

∇²φ = −(ρ − ρ_target)

The gradient ∇φ gives a force field that pushes components from crowded areas toward empty ones — smoothly, everywhere. This is the "ePlace" formulation from UC San Diego, used in DREAMPlace, RePlAce, and now our placer.

TRY IT

Drag the six components. The heatmap shows density (red = crowded). Yellow arrows show electric repulsion force. Stack two components and watch forces spike. Lower ρ_target for stronger repulsion.

TARGET DENSITY ρ 0.80

KEY INSIGHT

A component "feels" nearby neighbors even before they touch. This is fundamentally different from binary overlap detection, and it's why analytical placers converge reliably instead of oscillating at boundaries.

* * *

05 — BELL-SHAPED BASIS

Smooth Density Kernels

Section 4 glossed over a critical detail: how do we assign a component's area to the grid? Hard assignment (component in cell 3,4 → all area goes there) creates discontinuities at cell boundaries. No gradient when a component crosses from one cell to the next.

The solution: smear each component's area across nearby cells using a smooth bell-shaped function. The bell used in ePlace/DREAMPlace:

b(d, w) = 3/4 − (d/w)² if |d| ≤ w

b(d, w) = (1/2)(3/2 − |d|/w)² if w < |d| ≤ 3w/2

b(d, w) = 0 otherwise

The 2D density contribution is the product: b(dx, wx) · b(dy, wy). This extends slightly beyond the component's edges — no sharp cutoffs.

TRY IT

Top: 1D cross-section — blue rectangle is the physical footprint, the curve is the bell replacement. Adjust width to see wider components produce flatter bells. Bottom: 2D density field of a single component. Drag it and watch the smooth blob follow.

COMPONENT WIDTH 80 px

KEY INSIGHT

The bell function is the bridge between hard rectangles (physical world) and smooth functions (mathematical world). Density changes continuously as a component slides — no cell-boundary artifacts. Same trick used in finite elements and fluid dynamics.

PART III — OPTIMIZATION

Differentiable wirelength + differentiable density = a loss function we can minimize. The remaining challenges: the landscape is non-convex (many local minima), so we need a clever γ schedule. And we need congestion awareness so the router can actually connect what we've placed.

06 — COMBINED OBJECTIVE

Wirelength vs Density

Now we put it together. The total loss is a weighted sum of two competing forces:

L = w_wl × wirelength + w_dn × density_penalty

Wirelength pulls connected components together — short wires are good. Density pushes them apart — overlap is bad. The final placement lives at equilibrium.

The heatmap shows the combined loss landscape for component D (green dot). Components A, B, C are fixed. D is connected to A and B via one net, and to A alone via another. Blue = low loss (good placement). Red = high loss (bad).

TRY IT

Adjust w_wl and w_dn to shift the landscape. Hit Play to watch gradient descent. Toggle Nesterov to compare — it overshoots then corrects, converging faster. Drag D to a new start position and watch it find the minimum from there.

W_WL 1.00

W_DN 0.50

NESTEROV

KEY INSIGHT

The real placer does exactly this with ~40 components and ~150 nets simultaneously, written in Rust (crates/placer) with hand-derived analytical gradients — no autograd needed. The weights are not manually scheduled: density weight is closed-loop on overflow (Section 10), and γ doubles every 400 iterations whenever overflow is improving (Section 7).

* * *

07 — γ SCHEDULING

Coarse to Fine

You might think: just set γ high from the start for the most accurate objective. That's a trap.

When γ is high, the loss landscape has many local minima — sharp valleys separated by ridges. Gradient descent falls into the nearest valley, which might not be the best one. Like hiking in fog through a field of pits.

When γ is low, the landscape is smooth — one big, gentle basin. Gradient descent easily finds the global neighborhood. Like hiking on a smooth hillside where you can see the valley.

The γ schedule exploits both: start low (find the right region), gradually increase (refine within it). Same idea as simulated annealing and curriculum learning.

TRY IT

Hit Animate to watch the landscape morph from smooth to sharp as γ increases. The green dot follows the minimum as terrain sharpens beneath it. The red bumps (ρ) are density obstacles that emerge as γ grows.

SCHEDULE γ

γ = 0.3

KEY INSIGHT

In the real placer, γ doubles every 200 iterations over ~3000 total — a 32,768× increase. Early iterations explore, late iterations refine. Starting at high γ is like parking a car while looking through a microscope.

* * *

08 — RUDY CONGESTION

Routing Demand

Legal, spread-out placement isn't enough. The router needs to actually connect everything. If too many nets pass through the same region, there aren't enough physical tracks — and a placement that's gorgeous on the WL + density landscape can leave the router with no feasible route through a critical corridor. The placer needs a routability proxy in the loss.

RUDY (Rectangular Uniform wire DensitY) estimates congestion cheaply: for each net, spread uniform "routing demand" across its bounding box, then sum all nets.

demand(cell) = Σ_nets 1 / √(bbox_area) for each cell inside the net's bbox

Where demand exceeds supply (tracks per cell), routing fails. The placer uses RUDY to modulate density targets — crowded routing channels get lower allowed component density, pushing things away from bottlenecks.

TRY IT

Drag the five components. Colored dashed rectangles = net bounding boxes. Heatmap = total routing demand. Slide supply to change track capacity. Notice how moving one component redistributes congestion across the entire board via shifting bounding boxes.

TRACK SUPPLY 2.0

KEY INSIGHT

RUDY is intentionally crude — bounding-box demand, not actual routes. But it's good enough steering to keep components out of bottleneck corridors. The placer pairs it with an L-shape global router (Section 11) for the cell-inflation outer loop and the router's history seed. The goal isn't perfect routing prediction — it's avoiding the worst bottlenecks before the real router has to deal with them.

PART IV — THE FULL ENGINE

Sections 1–8 introduced the building blocks in isolation. This part shows how they fit together inside crates/placer/: a Nesterov-accelerated outer loop, ePlace-style adaptive density weighting, an L-shape global router interleaved as a routability proxy, and a constraint layer that handles edges, near-pairs, BGA exclusion zones, and overlap legalization.

09 — NESTEROV LOOP

Acceleration with Lookahead

Section 6 toggled a Nesterov checkbox. Here's what it actually does inside the placer. Plain gradient descent sets x ← x − η ∇L(x). Nesterov adds momentum and — crucially — evaluates the gradient at a lookahead point, not at the current position:

y ← x + β · v // lookahead

v ← β · v − η · ∇L(y) // velocity update

x ← x + v // step

With β = 0.9 and base learning rate η = 50 µm/step, the placer takes ~2000 steps to converge from a uniform-grid initialization. Two more tricks tame the gradient before it reaches the velocity update:

g̃_i = g_i · √n / ‖g‖₂ // global L2 normalization

g̃_i ← g̃_i / √(pin_count_i) // preconditioning

g̃_i ← clamp(g̃_i, ±5000 µm) // clip

Preconditioning by √pin_count is the analog of diagonal Hessian scaling — high-fanout components (the SoC, the BGA bridge) accumulate gradient from many nets, so their step sizes need to be damped relative to two-pin caps. Without it, the BGA shoots across the board on early iterations and the LSE landscape loses any sense of locality.

KEY INSIGHT

Lookahead lets momentum slow itself down as it approaches a minimum — the gradient at where you'll land already points back, so the next velocity update damps. This is what gives Nesterov its O(1/k²) convergence on smooth-convex objectives, and what makes it the default in modern analytical placers (RePlAce, DREAMPlace, Ripple).

* * *

10 — EPLACE FEEDBACK

Adaptive Density Weight

Section 6 hand-tuned w_dn. The real placer doesn't — it uses ePlace-style feedback control, watching the overflow metric:

overflow = Σ_bins max(0, density_bin − target) / (n_bins · target)

Each iteration:

if overflow > 0.20: w_dn ← w_dn · 1.02 // crowded — push harder

if overflow < 0.05: w_dn ← w_dn · 0.98 // spread enough — relax

This is a slow proportional controller — 2% per iteration. It interacts with the γ schedule (Section 7) in a useful way: early iterations have low γ (smooth wirelength) and low w_dn (let things cluster); as γ doubles every 400 iterations, the wirelength gradient sharpens, components want to collapse onto each other, overflow spikes, and w_dn ramps up to compensate. The two schedules are implicitly coupled through the loss landscape.

The target density is 0.5 — half the board area is component, half is breathing room for routes, vias, and silkscreen. Density is accumulated into a 1 mm × 1 mm bin grid using the tent basis from Section 5.

KEY INSIGHT

The hyperparameter you'd most want to tune is the one most sensitive to the board. ePlace dodges this by making it a closed-loop variable: the placer measures actual overflow and adjusts w_dn to track a target. You set the density target, not the density weight.

* * *

11 — L-SHAPE INFLATION

Cheap Global Routing in the Loop

RUDY (Section 8) spreads each net's demand uniformly across its bbox. That's a fine signal for placement steering, but it overestimates demand in corners and underestimates it on the routing corridor itself. The placer also runs a coarse L-shape global router that decomposes each net into 2-pin segments and routes each segment as one of two L-bends:

L_HV: (a_x, a_y) → (b_x, a_y) → (b_x, b_y)

L_VH: (a_x, a_y) → (a_x, b_y) → (b_x, b_y)

pick whichever crosses fewer already-congested cells

This produces a per-cell demand grid that concentrates on the actual routing path, not the bbox interior. After each placement run, an outer loop checks RUDY congestion; components in cells where demand exceeds 0.2 get their footprint inflated by

f = 1 + 0.15 · clamp(demand − 0.2, 0, 2)

and the placer is restarted with the inflated cells. This is the Ripple 2.0 / DREAMPlace cell-inflation trick: congested cells take more space, so on the re-place they push neighbors away and free up the corridor. Up to 5 inflation rounds, or until max(demand) ≤ 0.3.

The L-shape demand also gets shipped downstream as the initial Pathfinder history for the router (Section 16) — bridging placer-side routability estimates into the router's cost surface so its first pass already biases away from predicted hotspots.

KEY INSIGHT

RUDY for inflation, L-shape for the router seed. RUDY is symmetric and well-suited to feedback into a smooth density grid; L-shape is sharp and well-suited to a discrete Pathfinder bin. Use the right routability proxy in the right place — neither one wears both hats well.

* * *

12 — CONSTRAINTS & LEGALIZE

Springs, Snaps, and Push-Apart

Pure WL + density gives you a layout that's mathematically optimal but practically wrong. Decoupling caps want to sit next to the regulator pin, not somewhere on the wirelength gradient. Edge connectors must be on the board edge. The BGA's footprint must stay clear of other components even when its courtyard is much larger than its body. The placer adds four constraint terms:

Near-pair springs

For each (i, j, d) constraint: penalty = w_nr · (‖x_i − x_j‖ − d)². Pulls component pairs toward a target separation.

Module-pair springs

Stronger version (weight 10×) for tightly-bound pairs — used for things like crystal + load caps that must be co-located.

Edge snap

Post-process: connectors snap exactly to (top|bottom|left|right, offset). Their positions are then fixed for any subsequent placer pass.

Exclusion zones

Rectangles seeded at density = 1.0 in the tent grid. Generated automatically around BGAs (so the bridge has fanout breathing room) and manually for keep-outs around HDMI / USB receptacles.

After the gradient loop converges, a push-apart legalizer sweeps for any AABB overlaps left over and shoves overlapping pairs apart along the minimum-overlap axis with a 1.5 mm nudge:

o_x = (w_i + w_j)/2 − |b_x − a_x|

o_y = (h_i + h_j)/2 − |b_y − a_y|

push along the smaller of o_x, o_y by (overlap + 1.5 mm) / 2

The 1.5 mm nudge isn't arbitrary — it has to exceed the worst-case pad overhang past the body edge (SOT-223 thermal tabs and 0805 passives can overhang ~0.5–0.8 mm), so neighboring footprints don't end up with overlapping pads even after their bodies are separated. Component dimensions come from extract_extents.py / extract_placement.py, which read each footprint's courtyard layer rather than body bounding boxes — so the placer is reasoning about the official keep-out, not just the silk outline.

KEY INSIGHT

The gradient loop produces a good placement; constraints + legalization produce a legal one. The two stages are distinct because gradient descent on a hard-overlap penalty would be miserable — better to let the smooth density field do the bulk of the work, then close the last 5% of overlap with a discrete sweep.

PART V — ROUTER HANDOFF

A placement is only as good as the router that can finish it. The downstream router (crates/router/) is a 3D product-graph A* over an octilinear visibility graph, with a BGA fanout pre-pass, capsule obstacles, Pathfinder negotiation seeded from the placer's L-shape demand, Steiner-MST topology, and iterative rip-up + targeted rescue. These sections describe how the math above turns into actual copper.

13 — CAPSULE OBSTACLES

Clearance That Tracks Trace Width

A trace isn't a line — it's a capsule of half-width w/2 + clearance. A naive AABB intersection test will let a centerline pass within 50 µm of a pad edge and call it routed; DRC will then reject the result. The router uses a real capsule-vs-rectangle check: the segment a→b blocks a rectangle iff the minimum distance from the rectangle to the segment is less than the capsule half-width.

dist_seg→rect(a, b, R) < (trace_width / 2) + clearance ⇒ blocked

For vias and round pads, capsule-vs-circle has a closed form:

dist(point_to_segment(c, a, b)) < r_circle + (trace_width / 2) + clearance

The same primitive checks four things during search: (1) trace centerlines against foreign pads, (2) trace centerlines against foreign vias, (3) trace centerlines against foreign trace segments (capsule-vs-capsule reduces to two capsule-vs-circle endpoint checks plus a bbox prune), and (4) candidate via barrels against existing inner-layer tracks during multi-layer search. The spatial oracle wraps a per-layer R-tree (rstar) so each capsule query is logarithmic in obstacle count.

KEY INSIGHT

Earlier router revisions used inflated AABBs as obstacles. At 0.5 mm BGA pitch, even a 50 µm inflation error makes adjacent pad obstacles overlap by a few microns and blocks every diagonal escape from interior balls. Capsules let clearance scale with trace width without ever overestimating in the diagonal direction.

* * *

14 — OCTILINEAR VG

Visibility Graph in 3D

Instead of routing on a uniform grid (millions of nodes, expensive A*), the router builds a visibility graph on demand for each net. Nodes are sampled from four sources:

▸ Source and target pads.
▸ 3 radial × 8 angular escape nodes around src and tgt (24 each at 0.5/1/2 mm).
▸ Obstacle corners, perturbed outward by ~50 µm so they don't sit exactly on the obstacle edge.
▸ 80 Halton-sequence stepping stones inside the local bbox, dropped if they land inside any obstacle.
▸ L-bend intermediates between every (endpoint, corner) pair, plus 8 octilinear directional samples at multiple radii.

Edges only connect node pairs whose connecting segment is (a) octilinear within 0.5° tolerance — i.e. parallel to one of the set [0°, 45°, 90°, 135°, ...] — and (b) unblocked by the capsule check on the relevant layer. A* runs on the product graph (node × layer), with via edges between same-(x,y) nodes on different layers carrying a per-pair cost from the stackup-derived cost model:

cost(edge) = octilinear_length · layer_trace_cost[L] + bend_penalty(prev_dir, new_dir)

The heuristic is the octilinear distance to target — admissible, tight, and gives deterministic byte-identical output across runs. Halton stepping stones matter most in BGA-dense regions where there are no naturally occurring corner anchors between closely-packed pads.

KEY INSIGHT

Visibility-graph routers are routes-as-hard-obstacles: once a net is committed, its segments become real polygons in the obstacle set for every following net. That's why pure visibility graphs need a negotiation mechanism (Section 16) on top — without it, net order completely determines the outcome.

* * *

15 — BGA FANOUT

Inside-Out Escape

The TC358743 video bridge is a 0.5 mm pitch BGA with interior balls that can't reach the package edge laterally — there isn't enough room between adjacent balls for a 100 µm trace. These balls have to escape downward, into an inner layer, via a via-in-pad (POFV) microvia.

A pre-pass runs before main routing. For each BGA-resident net, it tries — in this order:

▸ POFV on F.Cu, dropping a microvia to In1.Cu using a tighter clearance (0.09 mm) than the global routing clearance.
▸ Lateral escape on F.Cu in one of 8 octilinear directions, with the chosen direction determined by a 2-coloring of the BGA grid cell so neighbors escape in opposite directions.
▸ Multi-layer drops: F.Cu→In4.Cu (blind via) or F.Cu→B.Cu (through via) for balls that can't escape to In1.

Crucially, escapes happen inside-out — interior balls (whose escape options are most constrained) try first, then progressively outer rings. This is the SOTA technique: claiming the scarce interior real estate before outer balls have a chance to block it.

For diff-pair nets where both halves originate at adjacent BGA balls, the fanout pass emits them as a coupled pair: same escape direction, perpendicular gap matching the diff-pair spec, so the downstream segment already starts as a length-matched pair.

The crucial constant: BGA pad obstacle inflation is set to bga_escape_clearance alone (0.09 mm), not trace_width/2 + clearance. At 0.5 mm pitch, the extra trace half-width would make adjacent pad obstacles overlap by 5 µm and block every POFV attempt. (This single fix moved BGA escape success from 13/30 to 25/30.)

KEY INSIGHT

BGA fanout is a separate problem from main routing — different clearance, different cost model, different ordering, different padstacks. Trying to do it as part of A* on the same graph means the router never gets a chance to plan escape-first. The pre-pass is what makes the "padstack discount" trick from the original placer V4 spec collapse into something workable in 3D.

* * *

16 — PATHFINDER LOOP

Negotiated Congestion

Pathfinder (McMurchie & Ebeling, 1995) is the canonical FPGA congestion-negotiation algorithm and the same idea works for PCBs. Each cell in a fine grid (0.2 mm bins, per layer) carries two scalars: present cost p (proportional to current overuse) and history cost h (accumulated overuse across all previous passes).

edge_cost = base_cost · (1 + p_fac · present_overuse) · (1 + h)

Each pass: rip up all routes, re-route every net with the current cost surface, update history on over-used cells, multiply p_fac by a small factor to push convergence. Routes that consistently overlap competing demand accumulate history and become progressively expensive — the next pass naturally finds a different corridor.

Two important details for VG routers (which differ from FPGA channel routers):

▸ History is seeded from the placer's L-shape demand grid (Section 11). The first A* call already biases away from cells the placer flagged as routability hotspots — the router doesn't have to learn them from scratch.
▸ The router tracks the single best pass (lowest failed + overuse) and restores its state at the end. Late passes with very high p_fac can break previously-routing nets, so the algorithm is monotonic in best-known-result rather than in last-iteration.

KEY INSIGHT

Pathfinder works because cost surface is a pure function of the history grid — the actual committed routes are torn up between passes. This is what makes it converge without explicit conflict resolution between nets. Soft costs in, hard routes out, history as the persistent state.

* * *

17 — STEINER & RIP-UP

Multi-Pin Topology and Last-Mile Recovery

A net with n pins is routed as a Rectilinear Minimum Spanning Tree (RMST) — a Prim's MST over Manhattan distances, emitting edges in BFS order from pin 0. Each tree branch is routed by a separate A* call and committed atomically: if any branch fails, the entire net is rolled back and re-attempted later. This is much better than naive pins.windows(2) chaining, which leaves a tail of long-distance segments instead of sharing trunk geometry.

After Pathfinder converges, some nets are still unrouted — typically because two or three committed routes cooperatively block the only escape from a constrained pad. The router runs a targeted rip-up phase:

for each failed net F:

for each committed net C whose route passes through F's corridor:

tentatively rip up C

if F now routes AND C re-routes around F → keep the swap

else → restore C, try next blocker

Each swap is a NetRoute transaction with full commit/rollback. The spatial oracle and Pathfinder history are restored as part of the rollback, so a failed swap is exactly equivalent to never having tried it.

Beyond single-blocker rip-up, the same machinery powers rescue layers driven by structured DRC feedback from the previous round — guided_rescue_hints anchor a short bridge to a specific failed pad and a known same-net via, escape_site_avoid_hints seed synthetic obstacles at sites that failed last round, and short_repair_hints re-route the via-owning net at known DRC short locations. Together with placer rotation hints fed back from L-shape congestion, the system closes the loop: DRC failures inform the next placement and routing pass, not just the human reviewing the result.

KEY INSIGHT

Atomic NetRoute transactions are the load-bearing primitive. They turn "rip up and reroute" from a hand-coded special case into a composable operation: any future rescue phase (multi-blocker swap, simulated annealing on net order, randomized retry) just becomes another caller of the same commit/rollback API.

TAKEAWAYS

Gradient intuition. You can look at a placement and know which components the optimizer wants to move — the outermost pins on each net carry the wirelength gradient.

Coarse-to-fine. Starting smooth (low γ) and sharpening beats starting sharp — find the right valley first, then refine within it.

Electric field > overlap penalty. Smooth gradients everywhere vs. a step function at the boundary. This is why analytical placers converge reliably.

Closed-loop hyperparameters. The placer measures overflow and adjusts the density weight; the router measures overuse and adjusts Pathfinder history. Anything that matters is a feedback variable, not a hand-tuned constant.

Right proxy in the right place. RUDY for cell inflation (smooth, symmetric); L-shape for the router seed (sharp, corridor-aligned); capsule clearance for actual obstacle checks (tight, geometry-aware). Different problems, different abstractions.

Atomic transactions everywhere. Net routing, rip-up swaps, rescue patches — every mutation goes through commit/rollback so the system is always in a known state. This is what lets DRC feedback turn into the next round's input instead of a manual fix.

Plan the escape first. BGA fanout is its own pre-pass with its own clearance and ordering. Trying to escape a 0.5 mm-pitch interior ball as a side effect of a global A* search means it never happens; making it a separate inside-out phase makes it routine.

KVM Board — Home — Dev — Pipeline: crates/placer + crates/router