Publications by Year: 2013

2013
L. Guckert, M. O’Connor, K. S. Ravindranath, Z. Zhao, and V. J. Reddi, “A Case for Persistent Caching of Compiled Javascript Code in Mobile Web Browsers,” in Workshop on Architectural and Microarchitectural Support for Binary Translation (AMAS-BT), 2013.

Over the past decade, webpages have grown an order of magnitude in computational complexity. Modern webpages provide rich and complex interactive behaviors for differentiated user experiences. Many of these new capabilities are delivered via JavaScript embedded within these webpages. In this work, we evaluate the potential benefits of persistently caching compiled JavaScript code in the Mozilla JavaScript engine within the Firefox browser. We cache compiled bytecode and generated native code across browser sessions to eliminate the redundant compilation work that occurs when webpages are revisited. Current browsers maintain persistent caches of code and images received over the network. Current browsers also maintain in-memory “caches” of recently accessed webpages (WebKit’s Page Cache or Firefox’s “Back-Forward” cache) that do not persist across browser sessions. This paper assesses the performance improvement and power reduction opportunities that arise from caching compiled JavaScript across browser sessions. We show that persistent caching can achieve an average of 91% reduction in compilation time for top webpages and 78% for HTML5 webpages. It also reduces energy consumption by an average of 23% as compared to the baseline.
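The caching idea in this abstract can be illustrated by analogy. The sketch below is not the paper's Firefox implementation: it uses Python's own `compile()` and `marshal` as stand-ins for a JavaScript engine's compiler and code serializer, and the class name `PersistentCodeCache`, the hashing scheme, and the on-disk layout are all assumptions made for illustration.

```python
import hashlib
import marshal
import os
import tempfile


class PersistentCodeCache:
    """Hypothetical disk cache of compiled code, keyed by a hash of the source."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _path(self, source):
        # Key on the source text so an edited script never reuses stale code.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".bin")

    def load(self, source, name="<script>"):
        path = self._path(source)
        if os.path.exists(path):
            # Cache hit: skip compilation and deserialize the stored code object.
            # Note: marshal data is only valid for the same Python version.
            with open(path, "rb") as f:
                code = marshal.loads(f.read())
            self.hits += 1
            return code
        # Cache miss: compile once, then persist for later sessions.
        code = compile(source, name, "exec")
        with open(path, "wb") as f:
            f.write(marshal.dumps(code))
        self.misses += 1
        return code


def demo():
    """Simulate two 'browser sessions' that share one cache directory."""
    cache_dir = tempfile.mkdtemp()
    first = PersistentCodeCache(cache_dir)
    first.load("x = 6 * 7")          # compiled and stored
    second = PersistentCodeCache(cache_dir)
    code = second.load("x = 6 * 7")  # loaded from disk, not recompiled
    namespace = {}
    exec(code, namespace)
    return first.misses, second.hits, namespace["x"]
```

Keying on a content hash is one simple invalidation policy; a real browser would also have to account for engine version changes and cache size limits, which this sketch ignores.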

J. Leng et al., “GPUWattch: Enabling Energy Optimizations in GPGPUs,” in ACM SIGARCH Computer Architecture News, 2013, vol. 41, no. 3, pp. 487–498.

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that will enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model’s inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings by utilizing dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. Finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.

Categories and Subject Descriptors

C.1.4 [Processor Architectures]: Parallel Architectures; C.4 [Performance of Systems]: Modeling techniques

General Terms

Experimentation, Measurement, Power, Performance

Keywords

Energy, CUDA, GPU architecture, Power estimation

Y. Zhu and V. J. Reddi, “High-Performance and Energy-Efficient Mobile Web Browsing on Big/Little Systems,” in 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013, pp. 13–24.

Internet web browsing has reached a critical tipping point. Increasingly, users rely more on mobile web browsers to access the Internet than desktop browsers. Meanwhile, webpages over the past decade have grown in complexity by more than tenfold. The fast penetration of mobile browsing and ever-richer webpages implies a growing need for high-performance mobile devices in the future to ensure continued end-user browsing experience. Failing to deliver webpages meeting hard cut-off constraints could directly translate to webpage abandonment or, for e-commerce websites, great revenue loss. However, mobile devices’ limited battery capacity constrains the degree of performance that mobile web browsing can achieve. In this paper, we demonstrate the benefits of heterogeneous systems with big/little cores, each with different frequencies, to achieve the ideal trade-off between high performance and energy efficiency. Through detailed characterizations of different webpage primitives based on the hottest 5,000 webpages, we build statistical inference models that estimate webpage load time and energy consumption. We show that leveraging such predictive models lets us identify and schedule webpages using the ideal core and frequency configuration that minimizes energy consumption while still meeting stringent cut-off constraints. Real hardware and software evaluations show that our scheduling scheme achieves 83.0% energy savings, while only violating the cut-off latency for 4.1% more webpages as compared with a performance-oriented hardware strategy. Against a more intelligent, OS-driven, dynamic voltage and frequency scaling scheme, it achieves 8.6% energy savings and 4.0% performance improvement simultaneously.
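The scheduling decision described in this abstract (pick the core and frequency that minimizes predicted energy while still meeting a latency cut-off) can be sketched as a small constrained selection. This is a minimal illustration, not the paper's scheduler: the `Config` type, the `pick_config` function, and the toy predictors with their constants are all hypothetical placeholders standing in for the statistical models the authors build.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Config:
    core: str       # "big" or "little"
    freq_mhz: int   # operating frequency


def pick_config(
    configs: Sequence[Config],
    predict_time_s: Callable[[Config], float],
    predict_energy_j: Callable[[Config], float],
    cutoff_s: float,
) -> Config:
    """Return the lowest-energy configuration predicted to meet the cut-off."""
    feasible = [c for c in configs if predict_time_s(c) <= cutoff_s]
    if not feasible:
        # Nothing meets the cut-off: fall back to the fastest configuration.
        return min(configs, key=predict_time_s)
    return min(feasible, key=predict_energy_j)


# Toy predictors with made-up constants, for illustration only.
def toy_time_s(c: Config) -> float:
    cycles = 1.2e9 if c.core == "big" else 2.4e9  # assumed: little core needs 2x cycles
    return cycles / (c.freq_mhz * 1e6)


def toy_energy_j(c: Config) -> float:
    k = 1.5e-6 if c.core == "big" else 0.4e-6     # assumed power coefficient
    power_w = k * c.freq_mhz ** 2                 # assumed: power grows with f^2
    return power_w * toy_time_s(c)


CONFIGS = [
    Config("big", 1200),
    Config("big", 800),
    Config("little", 600),
    Config("little", 350),
]
```

With these placeholder models, a loose cut-off lets the selection drop to a little core, a tighter one forces a big core at reduced frequency, and an unattainable one falls back to the fastest setting, for example `pick_config(CONFIGS, toy_time_s, toy_energy_j, cutoff_s=2.0)`.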

S. Kanev, T. M. Jones, G.-Y. Wei, D. M. Brooks, and V. J. Reddi, “Measuring Code Optimization Impact on Voltage Noise,” in Workshop on Silicon Errors in Logic - System Effects (SELSE), 2013.

In this paper, we characterize the impact of compiler optimizations on voltage noise. While intuition may suggest that the better processor utilization ensured by optimizing compilers results in a small amount of voltage variation, our measurements on an Intel® Core™2 Duo processor show the opposite: the majority of SPEC 2006 benchmarks exhibit more voltage droops when aggressively optimized. We show that this increase in noise could be sufficient for a net performance decrease in a typical-case, resilient design.

V. J. Reddi, “Reliability-Aware Microarchitecture Design,” IEEE Micro, no. 4, pp. 4–5, 2013.

V. J. Reddi and M. S. Gupta, Resilient Architecture Design for Voltage Variation, vol. 8, no. 2. Morgan & Claypool Publishers, 2013, pp. 1–138.

Shrinking feature size and diminishing supply voltage are making circuits sensitive to supply voltage fluctuations within the microprocessor, caused by normal workload activity changes. If left unattended, voltage fluctuations can lead to timing violations or even transistor lifetime issues that degrade processor robustness. Mechanisms that learn to tolerate, avoid, and eliminate voltage fluctuations based on program and microarchitectural events can help steer the processor clear of danger, thus enabling tighter voltage margins that improve performance or lower power consumption. We describe the problem of voltage variation and the factors that influence this variation during processor design and operation. We also describe a variety of runtime hardware and software mitigation techniques that tolerate, avoid, and/or eliminate voltage violations. We hope processor architects will find the information useful since tolerance, avoidance, and elimination are generalizable constructs that can serve as a basis for addressing other reliability challenges as well.

KEYWORDS

voltage noise, voltage smoothing, di/dt, inductive noise, voltage emergencies, error detection, error correction, error recovery, transient errors, power supply noise, power delivery networks
