Akturk, Ismail2017-10-092017-10-092017-07https://hdl.handle.net/11299/190478University of Minnesota Ph.D. dissertation. July 2017. Major: Electrical/Computer Engineering. Advisor: Ulya Karpuzcu. 1 computer file (PDF); viii, 99 pages.There are two fundamental challenges for modern computer system design. The first one is accommodating the increasing demand for performance in a tight power budget. The second one is ensuring correct progress despite the increasing possibility of faults that may occur in the system. To address the first challenge, it is essential to track where the power goes. The energy consumption of data orchestration (i.e., storage, movement, communication) dominates the energy consumption of actual data production, i.e., computation. Oftentimes, recomputing data becomes more energy efficient than storing and retrieving pre-computed data by minimizing the prevalent power and performance overhead of data storage, retrieval, and communication. At the same time, recomputation can reduce the demand for communication bandwidth and shrink the memory footprint. In the first half of the dissertation, the potential of data recomputation in improving energy efficiency is quantified and a practical recomputation framework is introduced to trade computation for communication. To address the second challenge, it is needed to provide scalable checkpointing and recovery mechanisms. The traditional method to recover from a fault is to periodically checkpoint the state of the machine. Periodic checkpointing of the machine state makes rollback and restart of execution from a safe state possible upon detection of a fault. The energy overhead of checkpointing, however, as incurred by storage and communication of the machine state grows with the frequency of checkpointing. Amortizing this overhead becomes especially challenging, considering the growth of expected error rates as an artifact of contemporary technology scaling. Recomputation of data (which otherwise would be read from a checkpoint) can reduce both the frequency of checkpointing, the size of the checkpoints and thereby mitigate checkpointing overhead. In the second half, quantitative characterization of recomputation-enabled checkpointing (based on recomputation framework) is provided.enCheckpointingEnergy EfficiencyRecomputationArchitectural Exploration of Data Recomputation for Improving Energy EfficiencyThesis or Dissertation