Publications by Year: 2015

2015
V. J. Reddi, M. S. Gupta, G. Holloway, G.-Y. Wei, M. D. Smith, and D. Brooks, “Adaptive Event-Guided System and Method for Avoiding Voltage Emergencies,” US Patent 8,949,666, 2015.
Y. Zu, C. R. Lefurgy, J. Leng, M. Halpern, M. S. Floyd, and V. J. Reddi, “Adaptive Guardband Scheduling to Improve System-Level Efficiency of the Power7+,” in MICRO-48: The 48th Annual IEEE/ACM International Symposium on Microarchitecture, 2015, pp. 308–321.

The traditional guardbanding approach to ensure processor reliability is becoming obsolete because it always over-provisions voltage and wastes a lot of energy. As a next-generation alternative, adaptive guardbanding dynamically adjusts chip clock frequency and voltage based on timing margin measured at runtime. With adaptive guardbanding, voltage guardband is only provided when needed, thereby promising significant energy efficiency improvement. In this paper, we provide the first full-system analysis of adaptive guardbanding’s implications using a POWER7+ multicore. On the basis of a broad collection of hardware measurements, we show the benefits of adaptive guardbanding in a practical setting are strongly dependent upon workload characteristics and chip-wide multicore activity. A key finding is that adaptive guardbanding’s benefits diminish as the number of active cores increases, and they are highly dependent upon the workload running. Through a series of analyses, we show these high-level system effects are the result of interactions between the application characteristics, architecture and the underlying voltage regulator module’s loadline effect and IR drop effects. To that end, we introduce adaptive guardband scheduling to reclaim adaptive guardbanding’s efficiency under different enterprise scenarios. Our solution reduces processor power consumption by 6.2% over a highly optimized system, effectively doubling adaptive guardbanding’s original improvement. Our solution also avoids malicious workload mappings to guarantee application QoS in the face of adaptive guardbanding hardware’s variable performance.

Y. Zhu, M. Halpern, and V. J. Reddi, “Event-Based Scheduling for Energy-Efficient QoS (EQoS) in Mobile Web Applications,” in 21st International Symposium on High Performance Computer Architecture (HPCA), 2015, pp. 137–149.

Mobile Web applications have become an integral part of our society. They pose a high demand for application quality of service (QoS). However, the energy-constrained nature of mobile devices makes optimizing for QoS difficult. Prior art on energy efficiency optimizations has only focused on the trade-off between raw performance and energy consumption, ignoring the application QoS characteristics. In this paper, we propose the concept of energy-efficient QoS (eQoS) to capture the trade-off between QoS and energy consumption. Given the fundamental event-driven nature of mobile Web applications, we further propose event-based scheduling as an optimization framework for eQoS. The event-based scheduling automatically reasons about users’ QoS requirements, and accurately slacks the events’ execution time to save energy without violating end users’ experience. We demonstrate a working prototype using the Google Chromium and V8 framework on the Samsung Exynos 5410 SoC (used in the Galaxy S4 smartphone). Based on real hardware and software measurements, we achieve 41.2% energy saving with only 0.4% of QoS violations perceptible to end users.

J. Leng, Y. Zu, and V. J. Reddi, “GPU Voltage Noise: Characterization and Hierarchical Smoothing of Spatial and Temporal Voltage Noise Interference in GPU Architectures,” in 21st International Symposium on High Performance Computer Architecture (HPCA), 2015, pp. 161–173.

Energy efficiency is undoubtedly important for GPU architectures. Besides the traditionally explored energy-efficiency optimization techniques, exploiting the supply voltage guardband remains a promising yet unexplored opportunity. Our hardware measurements show that up to 23% of the nominal supply voltage can be eliminated to improve GPU energy efficiency by as much as 25%. The key obstacle for exploiting this opportunity lies in understanding the characteristics and root causes of large voltage droops in GPU architectures and subsequently smoothing them away without severe performance penalties. The GPU’s manycore nature complicates the voltage noise phenomenon, and its architectural differences from the CPU necessitate a GPU-specific voltage noise analysis. In this paper, we make the following contributions. First, we provide a voltage noise categorization framework to identify, characterize, and understand voltage noise in the manycore GPU architecture. Second, we perform a microarchitecture-level voltage-droop root-cause analysis for the two major droop types we identify, namely the local first-order droop and the global second-order droop. Third, on the basis of our categorization and characterization, we propose a hierarchical voltage smoothing mechanism that mitigates each type of voltage droop. Our evaluation shows it can reduce worst-case droop by up to 31%, which translates to 11.8% core-level and 7.8% processor-level energy reduction.

D. Richins, Y. Zhu, M. Halpern, and V. J. Reddi, “Locality Lost: Unlocking the Performance of Event-Driven Servers,” in International Symposium on Microarchitecture, 2015.

Server-side Web applications are in the midst of a software evolution. Application developers are turning away from the established thread-per-request model, where each request gets a dedicated thread on the server, and toward event-driven programming platforms, which promise improved scalability and CPU utilization [1]. In response, we perform a microarchitectural analysis of these applications in current server processors and identify several serious performance bottlenecks and optimization opportunities for future processor designs.

Y. Zhu, D. Richins, M. Halpern, and V. J. Reddi, “Microarchitectural Implications of Event-Driven Server-Side Web Applications,” in Proceedings of the 48th International Symposium on Microarchitecture, 2015, pp. 762–774.

Enterprise Web applications are moving towards server-side scripting using managed languages. Within this shifting context, event-driven programming is emerging as a crucial programming model to achieve scalability. In this paper, we study the microarchitectural implications of server-side scripting, JavaScript in particular, from a unique event-driven programming model perspective. Using the Node.js framework, we come to several critical microarchitectural conclusions. First, unlike traditional server workloads such as CloudSuite and BigDataBench that are based on the conventional thread-based execution model, event-driven applications are heavily single-threaded, and as such they require significant single-thread performance. Second, the single-thread performance is severely limited by the front-end inefficiencies of today’s server processor microarchitecture, ultimately leading to overall execution inefficiencies. The front-end inefficiencies stem from the unique combination of limited intra-event code reuse and large inter-event reuse distance. Third, through a deep understanding of event-specific characteristics, architects can mitigate the front-end inefficiencies of the managed-language-based event-driven execution via a combination of instruction cache insertion policy and prefetcher.

M. Halpern, Y. Zhu, R. Peri, and V. J. Reddi, “Mosaic: Cross-Platform User-Interaction Record and Replay for the Fragmented Android Ecosystem,” in 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2015, pp. 215–224.

In contrast to traditional computing systems, such as desktops and servers, that are programmed to perform “compute-bound” and “run-to-completion” tasks, mobile applications are designed for user interactivity. Factoring user interactivity into computer system design and evaluation is important, yet poses many challenges. In particular, systematically studying interactive mobile applications across the diverse set of mobile devices available today is difficult due to the mobile device fragmentation problem. At the time of writing, there are 18,796 distinct Android mobile devices on the market, and that number will only continue to increase. Differences in screen sizes, resolutions and operating systems impose different interactivity requirements, making it difficult to uniformly study these systems. We present Mosaic, a cross-platform, timing-accurate record and replay tool for Android-based mobile devices. Mosaic overcomes device fragmentation through a novel virtual screen abstraction. User interactions are translated from a physical device into a platform-agnostic intermediate representation before translation to a target system. The intermediate representation is human-readable, which allows Mosaic users to modify previously recorded traces or even synthesize their own user interactive sessions from scratch. We demonstrate that Mosaic allows user interaction traces to be recorded on emulators, smartphones, tablets, and development boards and replayed on other devices. Using Mosaic we were able to replay 45 different Google Play applications across multiple devices, and also show that we can perform cross-platform performance comparisons between two different processors under identical user interactions.

Y. Zhu, M. Halpern, and V. J. Reddi, “The Role of the CPU in Energy-Efficient Mobile Web Browsing,” IEEE Micro, vol. 35, no. 1, pp. 26–33, 2015.

The mobile CPU is starting to noticeably impact Web browsing performance and energy consumption. Achieving energy-efficient mobile Web browsing requires considering both CPU and network capabilities. Researchers must leverage interactions between the CPU and network to deliver high mobile Web performance while maintaining a low energy footprint. Designing future high-performance and energy-efficient mobile Web clients implies looking beyond individual components and taking a full system perspective.

J. Leng, A. Buyuktosunoglu, R. Bertran, P. Bose, and V. J. Reddi, “Safe Limits on Voltage Reduction Efficiency in GPUs: A Direct Measurement Approach,” in 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2015, pp. 294–307.

Energy efficiency of GPU architectures has emerged as an important aspect of computer system design. In this paper, we explore the energy benefits of reducing the GPU chip’s voltage to the safe limit, i.e., the Vmin point. We perform such a study on several commercial off-the-shelf GPU cards. We find that there exists about 20% voltage guardband on those GPUs spanning two architectural generations, which, if “eliminated” completely, can result in up to 25% energy savings on one of the studied GPU cards. The exact improvement magnitude depends on the program’s available guardband, because our measurement results unveil a program-dependent Vmin behavior across the studied programs. We make fundamental observations about the program-dependent Vmin behavior. We experimentally determine that the voltage noise has a larger impact on Vmin compared to the process and temperature variation, and the activities during the kernel execution cause large voltage droops. From these findings, we show how to use a kernel’s microarchitectural performance counters to predict its Vmin value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. The accurate Vmin prediction opens up new possibilities of a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband, while the functional correctness is ensured by a hardware safety net mechanism.
