In the era of nanoscale technology scaling, we are approaching physical limits that challenge robust and reliable microprocessor design and fabrication. As these trends continue, guaranteeing correctness of execution is becoming prohibitively expensive and impractical. In this thesis, we demonstrate the benefits of abstracting circuit-level challenges to the architecture and software layers. Reliability challenges are broadly classified into process, voltage, and thermal variations. As a proof of concept, we target voltage variation, which is the least understood of the three, and demonstrate its growing detrimental effects on future processors. Shrinking feature sizes and diminishing supply voltages are making circuits more sensitive to supply voltage fluctuations within the microprocessor. If left unattended, these voltage fluctuations can lead to timing violations or even transistor lifetime issues. This problem, more commonly known as the dI/dt problem, is forcing microprocessor designers to increasingly sacrifice processor performance, as well as power efficiency, in order to guarantee correctness and robustness of operation. Industry addresses this problem by deliberately under-optimizing the processor to accommodate the worst-case voltage fluctuation. Setting such extreme operating voltage margins for large and infrequent voltage swings is not a sustainable solution in the long term. Therefore, we depart from this traditional strategy and operate the processor under more typical-case conditions. We demonstrate that a collaborative architecture between hardware and software enables aggressive operating voltage margins and, as a consequence, improves processor performance and power efficiency. This co-designed architecture is built on the principles of tolerance, avoidance and elimination.
Using a fail-safe hardware mechanism to tolerate voltage margin violations, we enable timing speculation, while a run-time hardware and software layer not only attempts to predict and avoid impending violations, but also reschedules instructions and co-schedules threads intelligently to eliminate voltage violations altogether. We believe tolerance, avoidance and elimination are generalizable constructs that can serve as guidelines for addressing and successfully mitigating the other parameter-related reliability challenges as well.
Dynamic code transformation systems are steadily gaining acceptance in computing environments for services such as program optimization, translation, instrumentation and security. Code transformation systems are required to perform complex and time-consuming tasks, such as costly program analysis and the application of transformations (e.g., instrumentation and translation). As these steps are applied to all code regions, regardless of their characteristics, the transformation overhead can be significant. Once the code is transformed, the remaining overhead is determined by the performance of the translated code. Current code transformation systems can become part of mainstream computing only if these overheads are eliminated. Nevertheless, certain application and computing environments exist in which code transformation systems can be effectively deployed. This thesis identifies two such environments: persistence and mixed execution. Persistence leverages previous execution characteristics to address the transformation overhead. This is accomplished by capturing the translated executions at the end of their first invocation. The captured executions are cached on disk for reuse. All subsequent invocations of the run-time system using the same application cause the system to reuse the cached executions. Since applications exhibit similar behavior across varying input data sets, this execution model successfully diminishes the transformation overhead across multiple invocations. Persistence in the domain of dynamic binary instrumentation is highlighted as an example. Mixed execution accepts that the performance of the code generated by today’s code transformation systems is in no position to compete with original execution times. Therefore, this technique proposes executing a mix of the original and translated code sequences to keep the performance penalties of the translated code within bounds.
This execution model is a more effective alternative to pure Just-in-Time compiler-based code transformation systems when low overhead and minimal architectural perturbation are the critical constraints. A dynamic compilation framework for controlling microprocessor energy and performance using this model is presented to illustrate its effectiveness and practicality.
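The persistence model described above (translate once, cache the result on disk, reuse it on later invocations of the same application) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the thesis's implementation: `translate` is a placeholder for the costly analysis and transformation step, the cache is keyed by a hash of the application binary, and the sketch writes the cache entry immediately after translation instead of at the end of the first invocation.

```python
import hashlib
import os
import pickle


def cache_key(app_binary: bytes) -> str:
    """Identify an application by a hash of its binary (an assumed keying scheme)."""
    return hashlib.sha256(app_binary).hexdigest()


def translate(app_binary: bytes) -> dict:
    """Hypothetical placeholder for costly program analysis and transformation."""
    return {"translated": app_binary[::-1]}


def run_with_persistence(app_binary: bytes, cache_dir: str):
    """Return (translated_code, reused), where reused is True on a cache hit."""
    path = os.path.join(cache_dir, cache_key(app_binary) + ".cache")
    if os.path.exists(path):
        # Later invocation of the same application: reuse the cached translation.
        with open(path, "rb") as f:
            return pickle.load(f), True
    # First invocation: pay the transformation overhead once, then persist it.
    translated = translate(app_binary)
    with open(path, "wb") as f:
        pickle.dump(translated, f)
    return translated, False
```

Calling `run_with_persistence` twice with the same binary and cache directory performs the translation only on the first call; the second call reads the cached result from disk. A different binary hashes to a different key and is translated afresh.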