NVIDIA / academic open-source EDA

DREAMPlace

DREAMPlace is one of the clearest state-of-the-art reference points for our placer, because our placer is already built on PyTorch with Nesterov-like optimization. The lesson is not just GPU use; it is the clean separation between differentiable global placement kernels, detailed placement, routability, and timing extensions.

Class

GPU-accelerated analytical placement

Core Stance

Recasts analytical placement as neural-network-like optimization in PyTorch, accelerating wirelength and density kernels on GPUs.

Architecture

How It Works

DREAMPlace maps analytical placement to deep-learning toolkit operations, so objective evaluation and gradient computation run efficiently on GPU.

It is based on the ePlace/RePlAce analytical placer family, using wirelength and density objectives with custom high-performance kernels.
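To make the idea of a differentiable wirelength objective concrete, here is a minimal sketch of log-sum-exp smoothing of half-perimeter wirelength along one axis, with its analytic gradient. Log-sum-exp is one common smoothing in analytical placers; this is not a claim about which model DREAMPlace or our placer uses by default. The function names and the gamma value are illustrative assumptions, and the code is plain Python rather than GPU tensors so that it stays self-contained.

```python
import math

def hpwl_1d(coords):
    """Exact half-perimeter span of one net along one axis."""
    return max(coords) - min(coords)

def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def _softmax(values):
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def lse_wirelength_1d(coords, gamma):
    """Smooth upper bound on hpwl_1d; it tightens as gamma shrinks."""
    scaled = [c / gamma for c in coords]
    negated = [-s for s in scaled]
    return gamma * (_logsumexp(scaled) + _logsumexp(negated))

def lse_gradient_1d(coords, gamma):
    """Analytic gradient of lse_wirelength_1d with respect to each coordinate."""
    scaled = [c / gamma for c in coords]
    toward_max = _softmax(scaled)
    toward_min = _softmax([-s for s in scaled])
    return [p - n for p, n in zip(toward_max, toward_min)]

# Hypothetical three-pin net (x coordinates only).
pins = [1.0, 4.0, 9.0]
exact = hpwl_1d(pins)
smooth = lse_wirelength_1d(pins, gamma=0.5)
grad = lse_gradient_1d(pins, gamma=0.5)
```

The gradient pulls the rightmost pin left and the leftmost pin right, and its components sum to zero, so the smoothed term never translates the net as a whole. Smaller gamma tracks the exact span more closely at the cost of sharper gradients.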

The NVIDIA research summary reports around 40x speedup in global placement versus multi-threaded RePlAce without quality loss.

The architecture is framework-oriented: core kernels can be reused and extended for detailed placement, macro placement, routability, and timing-aware variants.

Later research builds on DREAMPlace for GPU macro placement, timing-driven placement, and routability-aware learning methods.

Comparison

Compared With Our Flow

Our fast placer uses PyTorch and Nesterov-style optimization, but lacks DREAMPlace-level kernel maturity and benchmark discipline.

Our density/routability model is board-specific and evolving; DREAMPlace-style flows treat wirelength, density, legality, and routability as separable tested kernels.

Our legalization is greedy and PCB-aware. DREAMPlace has a stronger global/detailed placement separation, though for IC cells rather than PCB components.

Our placement objective still under-models connector mechanics, BGA escape, route layers, and post-route DRC cost.

Gaps

Gaps It Exposes

No benchmark suite with known placements, seeds, and repeatable quality metrics.

No optimized custom kernels for the expensive placement terms.

No formal detailed-placement stage after global placement beyond greedy legalization.

No smooth differentiable proxy for BGA escape or route-guide feasibility.

Actions

What We Should Steal

Split placement into global placement, detailed placement, and legalization with independent metrics.

Build a small board benchmark suite and track HPWL, overlap, DRC, route success, area, and runtime per commit.
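A benchmark record of this kind could look roughly like the sketch below. It covers only the metrics that can be computed from geometry alone (HPWL, overlap area, bounding-box area, runtime); DRC and route success would need the real checker and router and are left out. Every name here (Part, benchmark_record, the random placer standing in for the real one) is a hypothetical placeholder, not existing code.

```python
import random
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    name: str
    x: float  # lower-left corner
    y: float
    w: float
    h: float

def hpwl(nets, parts):
    """Sum over nets of the half-perimeter of the box around part centers."""
    by_name = {p.name: p for p in parts}
    total = 0.0
    for net in nets:
        xs = [by_name[n].x + by_name[n].w / 2 for n in net]
        ys = [by_name[n].y + by_name[n].h / 2 for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def overlap_area(parts):
    """Total pairwise overlap area between part outlines."""
    total = 0.0
    for i, a in enumerate(parts):
        for b in parts[i + 1:]:
            dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
            dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
            if dx > 0 and dy > 0:
                total += dx * dy
    return total

def bbox_area(parts):
    """Area of the bounding box that encloses every part outline."""
    width = max(p.x + p.w for p in parts) - min(p.x for p in parts)
    height = max(p.y + p.h for p in parts) - min(p.y for p in parts)
    return width * height

def random_placement(seed, names, size=2.0, board=20.0):
    """Stand-in for the real placer: seeded, so results are repeatable."""
    rng = random.Random(seed)
    return [Part(n, rng.uniform(0, board - size), rng.uniform(0, board - size),
                 size, size) for n in names]

def benchmark_record(seed, names, nets, commit="unknown"):
    """One row of the per-commit metrics table."""
    start = time.perf_counter()
    parts = random_placement(seed, names)
    runtime = time.perf_counter() - start
    return {
        "commit": commit,
        "seed": seed,
        "hpwl": hpwl(nets, parts),
        "overlap_area": overlap_area(parts),
        "bbox_area": bbox_area(parts),
        "runtime_s": runtime,
    }

# A known placement doubles as a regression check on the metrics themselves.
known = [Part("A", 0.0, 0.0, 2.0, 2.0), Part("B", 1.0, 1.0, 2.0, 2.0)]
record = benchmark_record(seed=7, names=["U1", "U2", "J1"], nets=[("U1", "U2", "J1")])
```

Fixing the seed in the record is what makes a quality change between two commits attributable to the code rather than to the random start.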

Add differentiable routability terms: pin density, escape-channel pressure, connector corridor cost, and layer-demand heatmaps.
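As an illustration of the first of these terms, the sketch below builds a smooth pin-density map and a squared-overflow penalty. It assumes unit-sized bins, a tent (bilinear) kernel, and a uniform per-bin capacity; all three are choices made for the example, not a description of DREAMPlace's density model or of our current code. The penalty is continuous and differentiable almost everywhere (the tent kernel has kinks), which is usually sufficient for gradient-based optimizers.

```python
def tent(distance):
    """Tent kernel: weight 1 at distance 0, falling linearly to 0 at distance 1."""
    return max(0.0, 1.0 - abs(distance))

def pin_density(pins, nx, ny):
    """Splat each pin onto unit bins; bin (i, j) has its center at (i + 0.5, j + 0.5).

    Pins closer than half a bin to the grid edge lose part of their mass.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y in pins:
        for j in range(ny):
            wy = tent(y - (j + 0.5))
            if wy == 0.0:
                continue
            for i in range(nx):
                grid[j][i] += tent(x - (i + 0.5)) * wy
    return grid

def overflow_penalty(grid, capacity):
    """Sum of squared density in excess of the per-bin capacity."""
    return sum(max(0.0, d - capacity) ** 2 for row in grid for d in row)

# Four pins spread one per bin versus four pins stacked in one bin.
spread = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
stacked = [(0.5, 0.5)] * 4
nudged = [(0.5, 0.5)] * 3 + [(1.0, 0.5)]  # one pin moved halfway to the next bin

penalty_spread = overflow_penalty(pin_density(spread, 2, 2), capacity=1.0)
penalty_stacked = overflow_penalty(pin_density(stacked, 2, 2), capacity=1.0)
penalty_nudged = overflow_penalty(pin_density(nudged, 2, 2), capacity=1.0)
```

Because the penalty drops gradually as a pin slides out of a crowded bin, it gives the optimizer a usable slope instead of the step function that a hard pin count per bin would produce. Escape-channel pressure and layer-demand heatmaps could reuse the same splat with different sources and capacities.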

Keep PyTorch for iteration speed, but isolate hot kernels so Rust/CUDA replacements are possible later.

Sources

NVIDIA Research DREAMPlace summary

https://research.nvidia.com/publication/2020-06_dreamplace-deep-learning-toolkit-enabled-gpu-acceleration-modern-vlsi-placement

DREAMPlace paper PDF

https://research.nvidia.com/sites/default/files/pubs/2019-06_DREAMPlace%3A-Deep-Learning/54_1_Lin_DREAMPLACE.pdf

NVIDIA AutoDMP blog using DREAMPlace

https://developer.nvidia.com/blog/autodmp-optimizes-macro-placement-for-chip-design-with-ai-and-gpus/
