Publications by Year: 2014

J. Leng, Y. Zu, and V. J. Reddi, “Energy Efficiency Benefits of Reducing the Voltage Guardband on the Kepler GPU Architecture,” Proc. of Silicon Errors in Logic – System Effects (SELSE), 2014.

Energy efficiency of GPU architectures has emerged as an important design criterion for both NVIDIA and AMD. In this paper, we explore the benefits of scaling a general-purpose GPU (GPGPU) core’s supply voltage to the near limits of execution failure. We find that as much as 21% of NVIDIA GTX 680’s core supply voltage guardband can be eliminated to achieve significant energy efficiency improvement. Measured results indicate that the energy improvements can be as high as 25% without any performance loss. The challenge, however, is to understand what impacts the minimum voltage guardband and how the guardband can be scaled without compromising correctness. We show that GPU microarchitectural activity patterns caused by different program characteristics are the root cause(s) of the large voltage guardband. We also demonstrate how microarchitecture-level parameters, such as clock frequency and the number of cores, impact the guardband. We hope our preliminary analysis lays the groundwork for future research.

C. Zhou, X. Wang, W. Xu, Y. Zhu, V. J. Reddi, and C. H. Kim, “Estimation of Instantaneous Frequency Fluctuation in a Fast DVFS Environment Using an Empirical BTI Stress-Relaxation Model,” in Proceedings of the International Reliability Physics Symposium (IRPS), 2014, pp. 2D–2.

This work proposes an empirical Bias Temperature Instability (BTI) stress-relaxation model based on the superposition property. The model was used to study the instantaneous frequency fluctuation in a fast Dynamic Voltage and Frequency Scaling (DVFS) environment. VDD and operating frequency information for this study were collected from an ARM Cortex A15 processor based development board running an Android operating system. Simulation results show that the frequency peaks and dips are functions of mainly two parameters: (1) the amount of stress or recovery experienced by the circuit prior to the VDD switching and (2) the frequency sensitivity to device aging after the VDD switching.

Y. Zhu, A. Srikanth, J. Leng, and V. J. Reddi, “Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing,” Computer Architecture Letters (CAL), vol. 13, no. 1, pp. 33–36, 2014.

Web browsing on mobile devices is undoubtedly the future. However, with the increasing complexity of webpages, the mobile device’s computation capability and energy consumption become major pitfalls for a satisfactory user experience. In this paper, we propose a mechanism to effectively leverage processor frequency scaling in order to balance the performance and energy consumption of mobile web browsing. This mechanism explores the performance and energy tradeoff in webpage loading, and schedules webpage loading according to the webpages’ characteristics, using the different frequencies. The proposed solution achieves 20.3% energy saving compared to the performance mode, and improves webpage loading performance by 37.1% compared to the battery saving mode.

Index Terms—Energy, EDP, Cutoff, Performance, Webpages

Paper Presentation (Best of CAL)
J. Leng, Y. Zu, M. Rhu, M. Gupta, and V. J. Reddi, “GPUVolt: Modeling and Characterizing Voltage Noise in GPU Architectures,” in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2014, pp. 141–146.

Voltage noise is a major obstacle in improving processor energy efficiency because it necessitates large operating voltage guardbands that increase overall power consumption and limit peak performance. Identifying the leading root causes of voltage noise is essential to minimize the unnecessary guardband and maximize the overall energy efficiency. We provide the first-ever modeling and characterization of voltage noise in GPUs based on a new simulation infrastructure called GPUVolt. Using it, we identify the key intracore microarchitectural components (e.g., the register file and special functional units) that significantly impact the GPU’s voltage noise. We also demonstrate that intercore-aligned microarchitectural activity detrimentally impacts the chipwide worst-case voltage droops. On the basis of these findings, we propose a combined register-file and execution-unit throttling mechanism that smooths GPU voltage noise and reduces the guardband requirement by as much as 29%.

Categories and Subject Descriptors

C.4 [Performance of Systems]: Modeling techniques, Reliability, availability, and serviceability


Keywords

di/dt, inductive noise, GPU architecture, GPU reliability

S. Chai, D. Zhang, J. Leng, and V. J. Reddi, “Lightweight Detection and Recovery Mechanisms to Extend Algorithm Resiliency in Noisy Computation,” Workshop on Near-threshold Computing (WNTC), 2014.

The intrinsic robustness of an algorithm and architecture depends highly on the combined ability to tolerate noise. In this paper, we present an alternative approach for energy reduction for near threshold computing based on a statistical modeling of computational noise induced from noisy memory and non-ideal interconnects. We present this approach as a complement to the standard approximate computing approaches. We show results of the lightweight error checks and recovery based on several design considerations on data value speculation.

Index Terms—Approximate computing, noise resiliency, computation noise, near threshold computing

M. Kazdagli, L. Huang, V. J. Reddi, and M. Tiwari, “Morpheus: Benchmarking Computational Diversity in Mobile Malware,” Workshop on Hardware and Architectural Support for Security and Privacy (HASP). ACM, 2014.

Computational characteristics of a program can potentially be used to distinguish malicious programs from benign ones. However, systematically evaluating malware detection techniques, especially when malware samples are hard to run correctly and can adapt their computational characteristics, is a hard problem. We introduce Morpheus – a benchmarking tool that includes both real mobile malware and a synthetic malware generator that can be configured to generate a computationally diverse malware sample-set – as a tool to evaluate malware detection based on computational signatures. Morpheus also includes a set of computationally diverse benign applications into which malware can be repackaged, along with a recorded trace of over one hour of realistic human usage for each app that can be used to replay both benign and malicious executions.

The current Morpheus prototype targets Android applications and malware samples. Using Morpheus, we quantify the computational diversity in malware behavior and expose opportunities for dynamic analyses that can detect mobile malware. Specifically, the use of obfuscation and encryption to thwart static analyses causes the malicious execution to be more distinctive – a potential opportunity for detection. We also present potential challenges, specifically, minimizing false positives that can arise due to diversity of benign executions.

Categories and Subject Descriptors

D.4.6 [Security and Protection]: Invasive software


Keywords

security, mobile malware, performance counters

Y. Zhu and V. J. Reddi, “WebCore: Architectural Support for Mobile Web Browsing,” Proceedings of the 41st International Symposium on Computer Architecture (ISCA), vol. 42, no. 3, pp. 541–552, 2014.

The Web browser is undoubtedly the single most important application in the mobile ecosystem. An average user spends 72 minutes each day using the mobile Web browser. Web browser internal engines (e.g., WebKit) are also growing in importance because they provide a common substrate for developing various mobile Web applications. In a user-driven, interactive, and latency-sensitive environment, the browser’s performance is crucial. However, the battery-constrained nature of mobile devices limits the performance that we can deliver for mobile Web browsing. As traditional general-purpose techniques to improve performance and energy efficiency fall short, we must employ domain-specific knowledge while still maintaining general-purpose flexibility.

In this paper, we first perform design-space exploration to identify appropriate general-purpose architectures that uniquely fit the characteristics of a popular Web browsing engine. Despite our best effort, we discover sources of energy inefficiency in these customized general-purpose architectures. To mitigate these inefficiencies, we propose, synthesize, and evaluate two new domain-specific specializations, called the Style Resolution Unit and the Browser Engine Cache. Our optimizations boost energy efficiency and at the same time improve mobile Web browsing performance. As emerging mobile workloads increasingly rely more on Web browser technologies, the type of optimizations we propose will become important in the future and are likely to have lasting widespread impact.