Active Timing Margin

2019
Y. Zu, D. Richins, C. Leufergy, and V. J. Reddi, “Fine-Tuning the Active Timing Margin (ATM) Control Loop for Maximizing Multi-Core Efficiency on an IBM POWER Server,” in Proceedings of the 25th International Symposium on High Performance Computer Architecture (HPCA), 2019.Abstract

Active Timing Margin (ATM) is a technology that improves processor efficiency by reducing the pipeline timing margin with a control loop that adjusts voltage and frequency based on real-time chip environment monitoring. Although ATM has already been shown to yield substantial performance benefits, its full potential has yet to be unlocked. In this paper, we investigate how to maximize ATM’s efficiency gain with a new means of exposing the inter-core speed variation: finetuning the ATM control loop. We conduct our analysis and evaluation on a production-grade POWER7+ system. On the POWER7+ server platform, we fine-tune the ATM control loop by programming its Critical Path Monitors, a key component of its ATM design that measures the cores’ timing margins. With a robust stress-test procedure, we expose over 200 MHz of inherent inter-core speed differential by fine-tuning the percore ATM control loop. Exploiting this differential, we manage to double the ATM frequency gain over the static timing margin; this is not possible using conventional means, i.e. by setting fixed points for each core, because the corelevel must account for chip-wide worst-case voltage variation. To manage the significant performance heterogeneity of fine-tuned systems, we propose application scheduling and throttling to manage the chip’s process and voltage variation. Our proposal improves application performance by more than 10% over the static margin, almost doubling the 6% improvement of the default, unmanaged ATM system. Our technique is general enough that it can be adopted by any system that employs an active timing margin control loop.

Keywords-Active timing margin, Performance, Power efficiency, Reliability, Critical path monitors

Paper
2017
Y. Zu, W. Huang, I. Paul, and V. J. Reddi, “Ti-States: Power Management in Active Timing Margin Processors,” IEEE Micro, vol. 37, no. 3, pp. 106–114, 2017. Publisher's VersionAbstract

TEMPERATURE INVERSION IS A TRANSISTOR-LEVEL EFFECT THAT IMPROVES PERFORMANCE WHEN TEMPERATURE INCREASES. THIS ARTICLE PRESENTS A COMPREHENSIVE MEASUREMENT-BASED ANALYSIS OF ITS IMPLICATIONS FOR ARCHITECTURE DESIGN AND POWER MANAGEMENT USING THE AMD A10-8700P PROCESSOR. THE AUTHORS PROPOSE TEMPERATURE-INVERSION STATES (TI -STATES) TO HARNESS THE OPPORTUNITIES PROMISED BY TEMPERATURE INVERSION. THEY EXPECT TI -STATES TO BE ABLE TO IMPROVE THE POWER EFFICIENCY OF MANY PROCESSORS MANUFACTURED IN FUTURE CMOS TECHNOLOGIES.

Paper
2016
Y. Zu, W. Huang, I. Paul, and V. J. Reddi, “Ti-States: Processor Power Management in the Temperature Inversion Region,” in Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, pp. 1–13. Publisher's VersionAbstract

Temperature inversion is a transistor-level effect that can improve performance when temperature increases. It has largely been ignored in the past because it does not occur in the typical operating region of a processor, but temperature inversion is becoming increasing important in current and future technologies. In this paper, we study temperature inversion’s implications on architecture design, and power and performance management. We present the first public comprehensive measurement-based analysis on the effects of temperature inversion on a real processor, using the AMD A10- 8700P processor as our system under test. We show that the extra timing margin introduced by temperature inversion can provide more than 5% Vdd reduction benefit, and this improvement increases to more than 8% when operating in the near-threshold, low-voltage region. To harness this opportunity, we present Tistates, a power management technique that sets the processor’s voltage based on real-time silicon temperature to improve power efficiency. Ti-states lead to 6% to 12% measured power saving across a range of different temperatures compared to a fixed margin. As technology scales to FD-SOI and FinFET, we show there is an ideal operating temperature for various workloads to maximize the benefits of temperature inversion. The key is to counterbalance leakage power increase at higher temperatures with dynamic power reduction by the Ti-states. The projected optimal temperature is typically around 60°C and yields 8% to 9% chip power saving. The optimal high-temperature can be exploited to reduce design cost and runtime operating power for overall cooling. Our findings are important for power and thermal management in future chips and process technologies.

Keywords-timing margin; temperature inversion; power management; reliability; technology scaling

Paper
2015
Y. Zu, C. R. Lefurgy, J. Leng, M. Halpern, M. S. Floyd, and V. J. Reddi, “Adaptive Guardband Scheduling to Improve System-Level Efficiency of the Power7+,” in MICRO-48: The 48th Annual IEEE/ACM International Symposium of Microarchitecture, 2015, pp. 308–321. Publisher's VersionAbstract

The traditional guardbanding approach to ensure processor reliability is becoming obsolete because it always over-provisions voltage and wastes a lot of energy. As a next-generation alternative, adaptive guardbanding dynamically adjusts chip clock frequency and voltage based on timing margin measured at runtime. With adaptive guardbanding, voltage guardband is only provided when needed, thereby promising significant energy eciency improvement. In this paper, we provide the first full-system analysis of adaptive guardbanding’s implications using a POWER7+ multicore. On the basis of a broad collection of hardware measurements, we show the benefits of adaptive guardbanding in a practical setting are strongly dependent upon workload characteristics and chip-wide multicore activity. A key finding is that adaptive guardbanding’s benefits diminish as the number of active cores increases, and they are highly dependent upon the workload running. Through a series of analysis, we show these high-level system e↵ects are the result of interactions between the application characteristics, architecture and the underlying voltage regulator module’s loadline e↵ect and IR drop e↵ects. To that end, we introduce adaptive guardband scheduling to reclaim adaptive guardbanding’s e- ciency under di↵erent enterprise scenarios. Our solution reduces processor power consumption by 6.2% over a highly optimized system, e↵ectively doubling adaptive guardbanding’s original improvement. Our solution also avoids malicious workload mappings to guarantee application QoS in the face of adaptive guardbanding hardware’s variable performance.

PDF