Improving Processor Performance Through Compiler-Assisted Block Reuse

Huang, Jian
2020-09-02
2000-04-24
https://hdl.handle.net/11299/215416
en-US
Report

Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed instruction window with preset hardware resources is typically quite limited. On the other hand, researchers have observed that the values produced by the execution of instructions exhibit considerable value locality, that is, the repetitive execution of a single instruction often produces repetitive values. Hence, while its performance is limited by the exploitable ILP, the processor is doing redundant work. A natural solution to this problem would be to remove as much redundant work as possible. Value prediction and value reuse are two of the promising approaches addressing this issue.

Value prediction does not actually remove the redundant work. Instead, it improves the available ILP by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value reuse, on the other hand, tries to remove the redundancy by buffering the previously produced results of instructions and skipping the execution of redundant instructions. This thesis focuses on value reuse schemes. Previous value reuse mechanisms use only a single instruction as the reuse unit, i.e., only one instruction is skipped for each reuse-detection process. This research, however, shows that value reuse at larger granularities than a single instruction could potentially further improve the performance of superscalar processors by skipping the execution of a number of instructions for each reuse-detection process.

Basic-block reuse, sub-block reuse, trace reuse and function reuse schemes are studied in detail. These schemes cover the full spectrum of value reuse granularity from a single instruction to an entire function.

Simulation results show that block reuse with compiler assistance has substantial potential to improve the performance of superscalar processors. In particular, it is shown that a block of instructions, such as a basic block or a sub-block, behaves like a super-instruction that exhibits a substantial amount of value locality. Basic blocks and sub-blocks provide a convenient link between the processor hardware and the compiler, allowing the compiler to influence and to help improve the performance of block reuse with only a reasonable amount of hardware. Finally, it is shown that block reuse outperforms reuse mechanisms based on other reuse units.

This thesis makes three primary contributions. Firstly, ... [NOTE - Abstract continues in actual report]
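
The idea of reusing a whole block rather than a single instruction can be illustrated with a small software model. The sketch below is not the mechanism evaluated in the thesis; it is a minimal illustration under stated assumptions: the names `Block`, `ReuseBuffer`, and `run` are hypothetical, each block's live-in and live-out registers are assumed to be known in advance (for example, supplied by a compiler), memory inputs are ignored, and the buffer keeps only the most recent input/output pair per block.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]


@dataclass
class Block:
    """Hypothetical model of a basic block treated as one reuse unit."""
    pc: int                        # address of the block's first instruction
    inputs: Tuple[str, ...]        # live-in registers (assumed known)
    outputs: Tuple[str, ...]       # live-out registers (assumed known)
    body: Callable[[Regs], None]   # executes the block, mutating the registers
    length: int                    # number of instructions in the block


@dataclass
class ReuseBuffer:
    """Illustrative buffer: one (input values, output values) entry per block."""
    entries: Dict[int, Tuple[Tuple[int, ...], Tuple[int, ...]]] = field(
        default_factory=dict
    )
    skipped: int = 0    # instructions skipped thanks to reuse
    executed: int = 0   # instructions actually executed

    def run(self, block: Block, regs: Regs) -> bool:
        """Run the block, reusing buffered outputs if the inputs match.

        Returns True when the block was skipped (a reuse hit).
        """
        live_in = tuple(regs[r] for r in block.inputs)
        entry = self.entries.get(block.pc)
        if entry is not None and entry[0] == live_in:
            # One reuse-detection check skips every instruction in the block.
            for reg, value in zip(block.outputs, entry[1]):
                regs[reg] = value
            self.skipped += block.length
            return True
        # Miss: execute normally, then record the inputs and outputs.
        block.body(regs)
        self.executed += block.length
        live_out = tuple(regs[r] for r in block.outputs)
        self.entries[block.pc] = (live_in, live_out)
        return False


def example_body(regs: Regs) -> None:
    # Two dependent instructions forming one small block.
    regs["r3"] = regs["r1"] + regs["r2"]
    regs["r4"] = regs["r3"] * 2


block = Block(pc=0x400, inputs=("r1", "r2"), outputs=("r3", "r4"),
              body=example_body, length=2)
buffer = ReuseBuffer()
regs = {"r1": 1, "r2": 2, "r3": 0, "r4": 0}

first = buffer.run(block, regs)    # first visit: executed and recorded
regs["r3"], regs["r4"] = 0, 0      # clobber the outputs
second = buffer.run(block, regs)   # same inputs: outputs restored from buffer
```

In this model a single comparison of the live-in values replaces the execution of every instruction in the block, which is the granularity argument the abstract makes; a realistic design would also have to account for memory operands and limited buffer capacity.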