Simulation and Modeling

Processor design is a complicated art. One cannot wait for the system to be built before understanding how a potential design might perform. So early stage analysis is vital in computer architecture. To enable early stage microprocessor power and performance studies, one must rely on hardware simulators and analytics models that are sufficiently detailed enough to provide crucial insights that can provide first-order insights to a hardware architect.

Sim modeling

As the statistician George Box one said, "All models are wrong, but some are useful." We strive to develop simulation tools and models that are useful. Our group develops performance, power and reliability models for various processor types. These models are validated against hardware measurements to ensure their robustness. They are then used to guide processor architecture.


A. Zou, J. Leng, X. He, Y. Zu, V. J. Reddi, and X. Zhang, “Efficient and Reliable Power Delivery in Voltage-Stacked Manycore System With Hybrid Charge-Recycling Regulators,” in 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 2018, pp. 1–6. Publisher's VersionAbstract

Voltage stacking (VS) fundamentally improves power delivery efficiency (PDE) by series-stacking multiple voltage domains to eliminate explicit step-down voltage conversion and reduce energy loss along the power delivery path. However, it suffers from aggravated supply noise, preventing its adoption in mainstream computing systems. In this paper, we investigate a practical approach to enabling efficient and reliable power delivery in voltage-stacked manycore systems that can ensure worst-case supply noise reliability without excessive costly over-design. We start by developing an analytical model to capture the essential noise behaviors in VS. It allows us to identify dominant noise contributor and derive the worst-case conditions. With this in-depth understanding, we propose a hybrid voltage regulation solution to effectively mitigate noise with worst-case guarantees. When evaluated with real-world benchmarks, our solution can achieve 93.8% power delivery efficiency, an improvement of 13.9% over the conventional baseline.

A. Zou, et al., “Ivory: Early-Stage Design Space Exploration Tool for Integrated Voltage Regulators,” in Proceedings of the 54th Annual Design Automation Conference (DAC), 2017, pp. 1. Publisher's VersionAbstract

Despite being employed in burgeoning eforts to improve power delivery eiciency, integrated voltage regulators (IVRs) have yet to be evaluated in a rigorous, systematic, or quantitative manner. To fulill this need, we present Ivory, a high-level design space exploration tool capable of providing accurate conversion eiciency, static performance characteristics, and dynamic transient responses of an IVR-enabled power delivery subsystem (PDS), enabling rapid trade-of exploration at early design stage, approximately 1000x faster than SPICE simulation. We demonstrate and validate Ivory with a wide spectrum of IVR topologies. In addition, we present a case study using Ivory to reveal the optimal PDS conigurations, with underlying power break-downs and area overheads for the GPU manycore architecture, which has yet to embrace IVRs.


J. Leng, Y. Zu, M. Rhu, M. Gupta, and V. J. Reddi, “GPUVolt: Modeling and Characterizing Voltage Noise in Gpu Architectures,” in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2014, pp. 141–146.Abstract

Voltage noise is a major obstacle in improving processor energy eciency because it necessitates large operating voltage guardbands that increase overall power consumption and limit peak performance. Identifying the leading root causes of voltage noise is essential to minimize the unnecessary guardband and maximize the overall energy eciency. We provide the first-ever modeling and characterization of voltage noise in GPUs based on a new simulation infrastructure called GPUVolt. Using it, we identify the key intracore microarchitectural components (e.g., the register file and special functional units) that significantly impact the GPU’s voltage noise. We also demonstrate that intercore-aligned microarchitectural activity detrimentally impacts the chipwide worst-case voltage droops. On the basis of these findings, we propose a combined register-file and execution-unit throttling mechanism that smooths GPU voltage noise and reduces the guardband requirement by as much as 29%.

Categories and Subject Descriptors

C.4 [Performance of Systems]: Modeling techniques, Reliability, availability, and serviceability


di/dt, inductive noise, GPU architecture, GPU reliability

C. Zhou, X. Wang, W. Xu, Y. Zhu, V. J. Reddi, and C. H. Kim, “Estimation of Instantaneous Frequency Fluctuation in a Fast DVFS Environment Using an Empirical BTI Stress-Relaxation Model,” in Proceedings of the International Reliability Physics Symposium (IRPS), 2014, pp. 2D–2. Publisher's VersionAbstract

This work proposes an empirical Bias Temperature Instability (BTI) stress-relaxation model based on the superposition property. The model was used to study the instantaneous frequency fluctuation in a fast Dynamic Voltage and Frequency Scaling (DVFS) environment. VDD and operating frequency information for this study were collected from an ARM Cortex A15 processor based development board running an Android operating system. Simulation results show that the frequency peaks and dips are functions of mainly two parameters: (1) the amount of stress or recovery experienced by the circuit prior to the VDD switching and (2) the frequency sensitivity to device aging after the VDD switching.

J. Leng, et al., “GPUWattch: Enabling Energy Optimizations in GPGPUs,” in ACM SIGARCH Computer Architecture News, 2013, vol. 41, no. 3, pp. 487–498. Publisher's VersionAbstract

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model’s inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings by utilizing dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. More finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.

Categories and Subject Descriptors

C.1.4 [Processor Architectures]: Parallel Architectures; C.4 [Performance of Systems]: Modeling techniques

General Terms

Experimentation, Measurement, Power, Performance


Energy, CUDA, GPU architecture, Power estimation