Improving Processor Performance Through Compiler-Assisted Block Reuse

2000-04-24

Title

Improving Processor Performance Through Compiler-Assisted Block Reuse

Published Date

2000-04-24

Type

Report

Abstract

Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed instruction window with preset hardware resources is typically quite limited. On the other hand, researchers have observed that the values produced by the execution of instructions exhibit considerable value locality, that is, the repetitive execution of a single instruction often produces repetitive values. Hence, while its performance is limited by the exploitable ILP, the processor is doing redundant work. A natural solution to this problem would be to remove as much redundant work as possible. Value prediction and value reuse are two of the promising approaches addressing this issue.

Value prediction does not actually remove the redundant work. Instead, it improves the available ILP by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value reuse, on the other hand, tries to remove the redundancy by buffering the previously produced results of instructions and skipping the execution of redundant instructions. This thesis focuses on value reuse schemes. Previous value reuse mechanisms use only a single instruction as the reuse unit, i.e., only one instruction is skipped for each reuse-detection process. This research, however, shows that value reuse at larger granularities than a single instruction could potentially further improve the performance of superscalar processors by skipping the execution of a number of instructions for each reuse-detection process. Basic-block reuse, sub-block reuse, trace reuse and function reuse schemes are studied in detail. These schemes cover the full spectrum of value reuse granularity from a single instruction to an entire function.

Simulation results show that block reuse with compiler assistance has substantial potential to improve the performance of superscalar processors. In particular, it is shown that a block of instructions, such as a basic block or a sub-block, behaves like a super-instruction that exhibits a substantial amount of value locality. Basic blocks and sub-blocks provide a convenient link between the processor hardware and the compiler, allowing the compiler to influence and to help improve the performance of block reuse with only a reasonable amount of hardware. Finally, it is shown that block reuse outperforms reuse mechanisms based on other reuse units.

This thesis makes three primary contributions. Firstly, ...[NOTE - Abstract continues in actual report]
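
The idea of reusing a whole block as a "super-instruction" can be illustrated with a small software model. The sketch below is not the hardware design studied in the report; it is a toy Python analogy under stated assumptions. The names `BlockReuseBuffer`, `run_block`, `example_block`, and `demo`, the four-instruction block body, and the input stream are all hypothetical. The buffer records the live-out values of a block for each combination of live-in values; when the same inputs recur, the buffered outputs are returned and every instruction in the block is counted as skipped by a single reuse-detection step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Key = Tuple[str, Tuple[int, ...]]


@dataclass
class BlockReuseBuffer:
    """Toy software model of a block-level reuse buffer (hypothetical)."""

    table: Dict[Key, Tuple[int, ...]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    skipped_instructions: int = 0

    def run_block(
        self,
        block_id: str,
        inputs: Tuple[int, ...],
        body: Callable[..., Tuple[int, ...]],
        n_instructions: int,
    ) -> Tuple[int, ...]:
        # One lookup covers the whole block: identical live-in values
        # imply identical live-out values for a side-effect-free block.
        key = (block_id, inputs)
        if key in self.table:
            self.hits += 1
            self.skipped_instructions += n_instructions
            return self.table[key]
        # Miss: execute the block and buffer its results for later reuse.
        self.misses += 1
        outputs = body(*inputs)
        self.table[key] = outputs
        return outputs


def example_block(a: int, b: int) -> Tuple[int, int]:
    # Stand-in for a basic block of four instructions with
    # two live-in values (a, b) and two live-out values (v, w).
    t = a + b
    u = t * 2
    v = u - a
    w = v ^ b
    return (v, w)


def demo():
    buf = BlockReuseBuffer()
    # Assumed input stream with repeated values (value locality).
    stream = [(1, 2), (3, 4), (1, 2), (1, 2), (3, 4)]
    results = [buf.run_block("B0", ins, example_block, 4) for ins in stream]
    return buf, results
```

Calling `demo()` on this assumed stream executes the block only for the two distinct input pairs; the three repeated pairs are served from the buffer. With a single instruction as the reuse unit, each of those skipped instructions would need its own lookup, which is the granularity difference the abstract describes.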

Suggested citation

Huang, Jian. (2000). Improving Processor Performance Through Compiler-Assisted Block Reuse. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215416.
