DREAMPlace is one of the most relevant state-of-the-art references for our placer, because our placer already uses PyTorch and Nesterov-style optimization. The lesson is not just GPU use; it is the clean separation between differentiable global placement kernels, detailed placement, routability, and timing extensions.
GPU-accelerated analytical placement
Recasts analytical placement as an optimization problem analogous to neural-network training in PyTorch, accelerating the wirelength and density kernels on GPUs.
DREAMPlace maps analytical placement to deep-learning toolkit operations, so objective evaluation and gradient computation run efficiently on GPU.
It is based on the ePlace/RePlAce analytical placer family, using wirelength and density objectives with custom high-performance kernels.
The NVIDIA research summary reports around 40x speedup in global placement versus multi-threaded RePlAce without quality loss.
The architecture is framework-oriented: core kernels can be reused and extended for detailed placement, macro placement, routability, and timing-aware variants.
Later research builds on DREAMPlace for GPU macro placement, timing-driven placement, and routability-aware learning methods.
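To make the wirelength objective concrete: analytical placers in this family typically replace the non-smooth half-perimeter wirelength (HPWL) with a smooth surrogate so that gradients exist everywhere. The sketch below uses log-sum-exp smoothing, one common choice. It is a plain-Python illustration, not DREAMPlace's actual kernel; the names `lse_wirelength_1d`, `lse_net_wirelength`, `hpwl`, and the smoothing parameter `gamma` are assumptions introduced here.

```python
import math


def hpwl(pins):
    """Exact half-perimeter wirelength of one net given (x, y) pin positions."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def lse_wirelength_1d(coords, gamma):
    """Smooth approximation of max(coords) - min(coords), plus its gradient."""
    hi = max(coords)
    lo = min(coords)
    # Shift by the max/min before exponentiating to avoid overflow.
    e_pos = [math.exp((c - hi) / gamma) for c in coords]
    e_neg = [math.exp((lo - c) / gamma) for c in coords]
    s_pos = sum(e_pos)
    s_neg = sum(e_neg)
    smooth_max = hi + gamma * math.log(s_pos)
    smooth_min = lo - gamma * math.log(s_neg)
    # The gradient of smooth max is a softmax; smooth min gives a softmin.
    grad = [p / s_pos - n / s_neg for p, n in zip(e_pos, e_neg)]
    return smooth_max - smooth_min, grad


def lse_net_wirelength(pins, gamma=1.0):
    """Smooth HPWL of one net and the gradient for each pin as (dx, dy)."""
    wx, gx = lse_wirelength_1d([p[0] for p in pins], gamma)
    wy, gy = lse_wirelength_1d([p[1] for p in pins], gamma)
    return wx + wy, list(zip(gx, gy))
```

The smooth value never falls below the exact HPWL and approaches it as `gamma` shrinks, at the cost of sharper gradients. In PyTorch the same quantity could be written with `torch.logsumexp` and differentiated by autograd.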
Our fast placer uses PyTorch and Nesterov-style optimization, but lacks DREAMPlace-level kernel maturity and benchmark discipline.
Our density/routability model is board-specific and evolving; DREAMPlace-style flows treat wirelength, density, legality, and routability as separable tested kernels.
Our legalization is greedy and PCB-aware. DREAMPlace has a stronger global/detailed placement separation, though for IC cells rather than PCB components.
Our placement objective still under-models connector mechanics, BGA escape, route layers, and post-route DRC cost.
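One way to read "separable tested kernels" in the comparison above is that each objective term exposes its own value and gradient and is verified in isolation before being combined. The sketch below illustrates that pattern with two toy 1-D terms and a finite-difference gradient check. Every name here (`chain_wirelength`, `spacing_penalty`, `combine`, `max_gradient_error`) is hypothetical, and the terms are stand-ins rather than our real wirelength or density models.

```python
def chain_wirelength(pos):
    """Toy wirelength term: squared distance between consecutive positions."""
    value = 0.0
    grad = [0.0] * len(pos)
    for i in range(len(pos) - 1):
        d = pos[i + 1] - pos[i]
        value += d * d
        grad[i] -= 2.0 * d
        grad[i + 1] += 2.0 * d
    return value, grad


def spacing_penalty(pos, min_gap=0.5):
    """Toy spreading term: squared shortfall for every pair closer than min_gap."""
    value = 0.0
    grad = [0.0] * len(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = pos[j] - pos[i]
            short = min_gap - abs(d)
            if short <= 0.0:
                continue
            value += short * short
            sign = 1.0 if d > 0 else -1.0 if d < 0 else 0.0
            # Derivative of short**2 with respect to d is -2 * short * sign(d).
            grad[j] += -2.0 * short * sign
            grad[i] -= -2.0 * short * sign
    return value, grad


def combine(weighted_terms):
    """Build one objective from (weight, term) pairs; each term stays testable alone."""
    def total(pos):
        value = 0.0
        grad = [0.0] * len(pos)
        for weight, term in weighted_terms:
            v, g = term(pos)
            value += weight * v
            grad = [a + weight * b for a, b in zip(grad, g)]
        return value, grad
    return total


def max_gradient_error(term, pos, eps=1e-6):
    """Largest gap between a term's analytic gradient and central differences."""
    _, grad = term(pos)
    worst = 0.0
    for i in range(len(pos)):
        up = list(pos)
        dn = list(pos)
        up[i] += eps
        dn[i] -= eps
        numeric = (term(up)[0] - term(dn)[0]) / (2.0 * eps)
        worst = max(worst, abs(numeric - grad[i]))
    return worst
```

A check like `max_gradient_error(spacing_penalty, [0.0, 0.4, 1.5, 1.7])` can run in unit tests for each term separately. For kernels written as PyTorch autograd functions, `torch.autograd.gradcheck` serves a similar purpose.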
No benchmark suite with known placements, seeds, and repeatable quality metrics.
No optimized custom kernels for the expensive placement terms.
No formal detailed-placement stage after global placement beyond greedy legalization.
No smooth differentiable proxy for BGA escape or route-guide feasibility.
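As a rough illustration of what a smooth proxy for the last gap could look like, the sketch below spreads each pin over a coarse grid with a Gaussian kernel and penalizes bins whose smoothed pin count exceeds a capacity. This is an assumed formulation, not an existing part of our placer or of DREAMPlace; `pin_density_penalty` and its parameters (`bin_size`, `sigma`, `capacity`) are hypothetical, and a real BGA escape model would need layer and via information that this ignores.

```python
import math


def pin_density_penalty(pins, bins_x, bins_y, bin_size, sigma, capacity):
    """Return (penalty, grads), where grads[i] is [d/dx_i, d/dy_i] for pin i.

    Each pin adds exp(-dist^2 / (2 * sigma^2)) to every bin center. A bin
    whose total exceeds `capacity` contributes the squared excess, which is
    continuously differentiable in the pin positions.
    """
    inv_two_sigma_sq = 1.0 / (2.0 * sigma * sigma)
    penalty = 0.0
    grads = [[0.0, 0.0] for _ in pins]
    for ix in range(bins_x):
        for iy in range(bins_y):
            cx = (ix + 0.5) * bin_size
            cy = (iy + 0.5) * bin_size
            weights = [
                math.exp(-((px - cx) ** 2 + (py - cy) ** 2) * inv_two_sigma_sq)
                for px, py in pins
            ]
            excess = sum(weights) - capacity
            if excess <= 0.0:
                continue  # Bin is within capacity: no cost, zero gradient.
            penalty += excess * excess
            for i, ((px, py), w) in enumerate(zip(pins, weights)):
                # d(weight)/d(px) = -weight * (px - cx) / sigma^2
                scale = 2.0 * excess * w / (sigma * sigma)
                grads[i][0] -= scale * (px - cx)
                grads[i][1] -= scale * (py - cy)
    return penalty, grads
```

Following the negative gradient moves pins away from the centers of overfull bins, which is the behavior a pin-density or escape-pressure term would need. The double loop is written for clarity; a production version would be vectorized.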
Split placement into separate global placement, detailed placement, and legalization stages, and report metrics for each stage independently.
Build a small board benchmark suite and track HPWL, overlap, DRC, route success, area, and runtime per commit.
Add differentiable routability terms: pin density, escape-channel pressure, connector corridor cost, and layer-demand heatmaps.
Keep PyTorch for iteration speed, but isolate hot kernels so Rust/CUDA replacements are possible later.
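The per-commit tracking recommended above could start from something as small as the sketch below, which computes two of the listed metrics (HPWL and overlap area) and serializes one record per board, seed, and commit. The names `Rect`, `BenchmarkRecord`, `total_hpwl`, `total_overlap_area`, and `to_jsonl` are hypothetical. DRC count and route success are left out because they depend on external tools this note does not specify.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned component footprint: lower-left corner plus size."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class BenchmarkRecord:
    """One row of the benchmark log; all fields are illustrative."""
    commit: str
    board: str
    seed: int
    hpwl: float
    overlap_area: float
    runtime_s: float


def total_hpwl(nets):
    """Sum of half-perimeter bounding boxes; each net is a list of (x, y) pins."""
    total = 0.0
    for pins in nets:
        xs = [p[0] for p in pins]
        ys = [p[1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def total_overlap_area(rects):
    """Sum of pairwise overlap areas (O(n^2), fine for small boards)."""
    total = 0.0
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            a, b = rects[i], rects[j]
            ox = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
            oy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
            if ox > 0.0 and oy > 0.0:
                total += ox * oy
    return total


def to_jsonl(record):
    """Serialize a record as one JSON line with stable key order for diffing."""
    return json.dumps(asdict(record), sort_keys=True)
```

Appending one such line per run to a log file keeps results comparable across commits, provided the board inputs and seeds are fixed.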