Improving Processor Performance Through Compiler-Assisted Block Reuse
2000-04-24
Type
Report
Abstract
Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed instruction window with preset hardware resources is typically quite limited. On the other hand, researchers have observed that the values produced by the execution of instructions exhibit considerable value locality, that is, the repetitive execution of a single instruction often produces repetitive values. Hence, while its performance is limited by the exploitable ILP, the processor is doing redundant work. A natural solution to this problem would be to remove as much redundant work as possible. Value prediction and value reuse are two of the promising approaches addressing this issue.
Value prediction does not actually remove the redundant work. Instead, it improves the available ILP by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value reuse, on the other hand, tries to remove the redundancy by buffering the previously produced results of instructions and skipping the execution of redundant instructions. This thesis focuses on value reuse schemes. Previous value reuse mechanisms use only a single instruction as the reuse unit, i.e., only one instruction is skipped for each reuse-detection process. This research, however, shows that value reuse at larger granularities than a single instruction could potentially further improve the performance of superscalar processors by skipping the execution of a number of instructions for each reuse-detection process. Basic-block reuse, sub-block reuse, trace reuse and function reuse schemes are studied in detail. These schemes cover the full spectrum of value reuse granularity from a single instruction to an entire function.
Simulation results show that block reuse with compiler assistance has substantial potential to improve the performance of superscalar processors. In particular, it is shown that a block of instructions, such as a basic block or a sub-block, behaves like a super-instruction that exhibits a substantial amount of value locality. Basic blocks and sub-blocks provide a convenient link between the processor hardware and the compiler, allowing the compiler to influence and to help improve the performance of block reuse with only a reasonable amount of hardware. Finally, it is shown that block reuse outperforms reuse mechanisms based on other reuse units.
This thesis makes three primary contributions. Firstly, ...[NOTE - Abstract continues in actual report]
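The idea of treating a block as a reusable "super-instruction" can be illustrated with a small software model. The following is only a minimal sketch and not the mechanism described in the report: the names `Block`, `BlockReuseBuffer`, and `run` are hypothetical, the buffer is assumed to be keyed by a block identifier together with the block's live-in values, and memory side effects (loads and stores), which a real scheme would have to account for, are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Values = Tuple[int, ...]


@dataclass
class Block:
    """Hypothetical straight-line block: maps live-in values to live-out values."""
    block_id: int
    num_instructions: int
    execute: Callable[[Values], Values]


@dataclass
class BlockReuseBuffer:
    """Toy reuse buffer keyed by (block id, live-in values). Illustrative only."""
    capacity: int = 64
    entries: Dict[Tuple[int, Values], Values] = field(default_factory=dict)
    executed: int = 0  # instructions actually executed
    skipped: int = 0   # instructions skipped because the block was reused

    def run(self, block: Block, live_in: Values) -> Values:
        key = (block.block_id, live_in)
        if key in self.entries:
            # Reuse hit: one lookup skips every instruction in the block.
            self.skipped += block.num_instructions
            return self.entries[key]
        # Reuse miss: execute the block and remember its outputs.
        live_out = block.execute(live_in)
        self.executed += block.num_instructions
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = live_out
        return live_out


def body(live_in: Values) -> Values:
    """Three dependent 'instructions' standing in for a basic block."""
    a, b = live_in
    t = a + b
    u = t * 2
    return (u - a,)


block = Block(block_id=0, num_instructions=3, execute=body)
buf = BlockReuseBuffer()
inputs = [(1, 2), (1, 2), (3, 4), (1, 2)]
results = [buf.run(block, x) for x in inputs]
print(results, buf.executed, buf.skipped)
```

In this toy run, two of the four invocations see inputs that were already buffered, so a single lookup replaces three instructions each time. With a single instruction as the reuse unit, each lookup could skip at most one instruction, which is the granularity argument the abstract makes.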
Series/Report Number
Technical Report; 00-029
Suggested citation
Huang, Jian. (2000). Improving Processor Performance Through Compiler-Assisted Block Reuse. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215416.