More than 20% of the available energy is lost in “the last centimeter” from the PCB board to the microprocessor chip due to inherent inefficiencies of power delivery subsystems (PDSs) in today’s computing systems. By series-stacking multiple voltage domains to eliminate explicit voltage conversion and reduce loss along the power delivery path, voltage stacking (VS) is a novel configuration that can improve power delivery efficiency (PDE). However, VS suffers from aggravated levels of supply noise caused by current imbalance between the stacking layers, preventing its practical adoption in mainstream computing systems. Throughput-centric manycore architectures such as GPUs intrinsically exhibit more balanced workloads, yet suffer from lower PDE, making them ideal platforms to implement voltage stacking. In this paper, we present a cross-layer approach to practical voltage stacking implementation in GPUs. It combines circuit-level voltage regulation using distributed charge-recycling integrated voltage regulators (CR-IVRs) with architecture-level voltage smoothing guided by control theory. Our proposed voltage-stacked GPUs can eliminate 61.5% of total PDS energy loss and achieve 92.3% system-level power delivery efficiency, a 12.3% improvement over the conventional single-layer based PDS. Compared to the circuit-only solution, the cross-layer approach significantly reduces the implementation cost of voltage stacking (88% reduction in area overhead) without compromising supply reliability under worst-case scenarios and across a wide range of real-world benchmarks. In addition, we demonstrate that the cross-layer solution not only complements on-chip CR-IVRs to transparently manage current imbalance and restore stable layer voltages, but also serves as a seamless interface to accommodate higher-level power optimization techniques, traditionally thought to be incompatible with a VS configuration.