NVIDIA / academic open-source EDA

DREAMPlace

DREAMPlace is one of the clearest SOTA reference points for our placer, because our placer already uses PyTorch with Nesterov-style optimization. The lesson is not just GPU use; it is the clean separation between differentiable global-placement kernels, detailed placement, routability, and timing extensions.

Class

GPU-accelerated analytical placement

Core Stance

Recasts analytical placement as neural-network-like optimization in PyTorch, accelerating wirelength and density kernels on GPUs.

Architecture

How It Works

DREAMPlace maps analytical placement to deep-learning toolkit operations, so objective evaluation and gradient computation run efficiently on GPU.

The formulation follows the ePlace/RePlAce family of electrostatics-based analytical placers: a smooth wirelength objective plus a density penalty, each implemented as a custom high-performance kernel.

The NVIDIA research summary reports a speedup of around 40x in global placement over multi-threaded RePlAce, without quality loss.

The architecture is framework-oriented: core kernels can be reused and extended for detailed placement, macro placement, routability, and timing-aware variants.

Later research builds on DREAMPlace for GPU macro placement, timing-driven placement, and routability-aware learning methods.
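
As a rough illustration of the wirelength side of that objective, the sketch below smooths a net's span on one axis with a log-sum-exp approximation, a common smooth wirelength model in analytical placement. It is pure Python to stay dependency-free. The names `hpwl_1d` and `lse_wirelength` and the smoothing parameter `gamma` are illustrative assumptions, not DREAMPlace's API. In a PyTorch placer the same expression would be written with tensor operations and autograd would supply the gradient.

```python
import math
from typing import List, Sequence, Tuple


def hpwl_1d(coords: Sequence[float]) -> float:
    """Exact (non-smooth) span of one net on one axis: max - min."""
    return max(coords) - min(coords)


def lse_wirelength(coords: Sequence[float], gamma: float = 1.0) -> Tuple[float, List[float]]:
    """Log-sum-exp approximation of max(coords) - min(coords).

    Returns the smooth value and its analytic gradient with respect to
    each pin coordinate. Smaller gamma tracks the exact span more closely
    but makes the gradient sharper.
    """
    hi, lo = max(coords), min(coords)
    # Shift by the max/min before exponentiating for numerical stability.
    pos = [math.exp((c - hi) / gamma) for c in coords]
    neg = [math.exp((lo - c) / gamma) for c in coords]
    sum_pos, sum_neg = sum(pos), sum(neg)
    smooth_max = hi + gamma * math.log(sum_pos)
    smooth_min_negated = -lo + gamma * math.log(sum_neg)
    value = smooth_max + smooth_min_negated
    # d(smooth_max)/dx_i is a softmax weight; the min term contributes the
    # mirrored weight with the opposite sign.
    grad = [p / sum_pos - n / sum_neg for p, n in zip(pos, neg)]
    return value, grad


if __name__ == "__main__":
    pins = [0.0, 3.0, 10.0]
    value, grad = lse_wirelength(pins, gamma=0.5)
    print("exact span:", hpwl_1d(pins))
    print("smooth span:", value)
    print("gradient:", grad)
```

The smooth value is always an upper bound on the exact span, and the gradient pulls the outermost pins toward each other while leaving interior pins almost untouched, which is what makes the term usable inside a gradient-based optimizer.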

Comparison

Compared With Our Flow

Our fast placer uses PyTorch and Nesterov-style optimization, but lacks DREAMPlace-level kernel maturity and benchmark discipline.

Our density/routability model is board-specific and still evolving; DREAMPlace-style flows treat wirelength, density, legality, and routability as separable, individually tested kernels.

Our legalization is greedy and PCB-aware. DREAMPlace separates global and detailed placement more strictly, although it targets IC cells rather than PCB components.

Our placement objective still under-models connector mechanics, BGA escape, route layers, and post-route DRC cost.

Gaps

Gaps It Exposes

No benchmark suite with known placements, seeds, and repeatable quality metrics.

No optimized custom kernels for the expensive placement terms.

No formal detailed-placement stage after global placement beyond greedy legalization.

No smooth differentiable proxy for BGA escape or route-guide feasibility.
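
To make the first gap concrete, here is a minimal sketch of what repeatable quality metrics could look like: a seeded placement generator plus deterministic HPWL and overlap scores. The `Part` record, the function names, and the use of part centers as pin positions are all assumptions made for illustration; a real suite would load known board placements and add DRC, route success, area, and runtime.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Part:
    name: str
    x: float  # lower-left corner
    y: float
    w: float
    h: float


def hpwl(parts: Sequence[Part], nets: Sequence[Sequence[str]]) -> float:
    """Sum of bounding-box half-perimeters, using part centers as pins."""
    centers = {p.name: (p.x + p.w / 2, p.y + p.h / 2) for p in parts}
    total = 0.0
    for net in nets:
        xs = [centers[name][0] for name in net]
        ys = [centers[name][1] for name in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def overlap_area(parts: Sequence[Part]) -> float:
    """Total pairwise overlap area between axis-aligned part outlines."""
    total = 0.0
    for i, a in enumerate(parts):
        for b in parts[i + 1:]:
            dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
            dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
            if dx > 0 and dy > 0:
                total += dx * dy
    return total


def random_placement(names: Sequence[str], size: float, seed: int,
                     board: float = 100.0) -> List[Part]:
    """Seeded starting placement so every run sees the same input."""
    rng = random.Random(seed)
    return [Part(n, rng.uniform(0, board - size), rng.uniform(0, board - size),
                 size, size) for n in names]


def score(parts: Sequence[Part], nets: Sequence[Sequence[str]],
          seed: int) -> Dict[str, float]:
    """One benchmark row; rounding keeps the record stable across runs."""
    return {"seed": seed,
            "hpwl": round(hpwl(parts, nets), 6),
            "overlap": round(overlap_area(parts), 6)}


if __name__ == "__main__":
    nets = [["u1", "c1"], ["u1", "j1", "c1"]]
    parts = random_placement(["u1", "c1", "j1"], size=5.0, seed=7)
    print(score(parts, nets, seed=7))
```

Because the generator owns its own `random.Random(seed)` instance, two runs with the same seed produce identical rows, which is the property a per-commit comparison depends on.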

Actions

What We Should Steal

Split placement into global placement, detailed placement, and legalization with independent metrics.

Build a small board benchmark suite and track HPWL, overlap, DRC, route success, area, and runtime per commit.

Add differentiable routability terms: pin density, escape-channel pressure, connector corridor cost, and layer-demand heatmaps.

Keep PyTorch for iteration speed, but isolate hot kernels so Rust/CUDA replacements are possible later.
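
One way the last two actions could fit together is sketched below: a differentiable pin-density penalty that sits behind a small kernel interface, so the optimizer only sees `value_and_grad` and the implementation can be swapped later. Everything here is a hypothetical illustration under stated assumptions: the `Kernel` protocol, the `PinDensityKernel` class, the 1-D Gaussian smoothing, and the quadratic over-capacity penalty are not taken from DREAMPlace or from our placer. The driver uses plain gradient descent rather than Nesterov, to keep the example short.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class Kernel(Protocol):
    """Hypothetical interface: any backend returning a value and gradient fits."""

    def value_and_grad(self, xs: Sequence[float]) -> Tuple[float, List[float]]:
        ...


class PinDensityKernel:
    """Gaussian-smoothed 1-D pin density with a quadratic over-capacity penalty."""

    def __init__(self, bin_centers: Sequence[float], capacity: float, sigma: float):
        self.bin_centers = list(bin_centers)
        self.capacity = capacity
        self.sigma = sigma

    def value_and_grad(self, xs: Sequence[float]) -> Tuple[float, List[float]]:
        value = 0.0
        grad = [0.0] * len(xs)
        s2 = self.sigma ** 2
        for c in self.bin_centers:
            # Each pin contributes a smooth, distance-weighted demand to the bin.
            weights = [math.exp(-((x - c) ** 2) / (2 * s2)) for x in xs]
            over = sum(weights) - self.capacity
            if over <= 0:
                continue  # bin is within capacity, no penalty
            value += over ** 2
            for i, (x, w) in enumerate(zip(xs, weights)):
                # Chain rule: d(over^2)/dx_i = 2 * over * dw_i/dx_i
                grad[i] += 2 * over * w * (-(x - c) / s2)
        return value, grad


def descend(kernel: Kernel, xs: Sequence[float], step: float = 0.01,
            iters: int = 50) -> List[float]:
    """Plain gradient descent; depends only on the Kernel interface."""
    out = list(xs)
    for _ in range(iters):
        _, grad = kernel.value_and_grad(out)
        out = [x - step * g for x, g in zip(out, grad)]
    return out


if __name__ == "__main__":
    kernel = PinDensityKernel([float(i) for i in range(11)], capacity=1.5, sigma=1.0)
    pins = [4.8, 5.0, 5.2, 5.1]
    print("before:", kernel.value_and_grad(pins)[0])
    spread = descend(kernel, pins)
    print("after: ", kernel.value_and_grad(spread)[0])
```

Because `descend` relies only on the protocol, a Rust or CUDA implementation exposing the same `value_and_grad` call could replace the Python class without touching the optimizer loop.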
