The performance gap between the CPU and memory has widened over decades of technological advances, and memory operations have become increasingly expensive relative to logical and arithmetic operations. This dissertation addresses two compiler techniques related to memory optimization: memory disambiguation profiling and local memory management.
Static memory disambiguation analyses in a compiler, such as alias analysis and dependence analysis, are often limited by the lack of runtime information and by the conservative nature of compiler analysis. As a result, many optimization opportunities are lost to imprecise or overly conservative analysis results. This dissertation proposes a new approach that derives memory disambiguation information from profiling. Two profiling methods are proposed: alias profiling and dependence profiling. A specialized hashing method is designed to make the profiling efficient, and a software-based sampling method is used to further reduce the overhead. Using this profiling tool, studies are conducted on the impact of the granularity of memory checking, path sensitivity, and context sensitivity. The overhead of this otherwise expensive, purely software-based profiling can be reduced to only 30% of the total execution time. A speculative partial redundancy elimination optimization is also presented; it is based on the profiling results and on the Advanced Load Address Table (ALAT), a special hardware structure in the Itanium processor. This optimization improves the SPEC2000 benchmarks by up to 10%, which demonstrates the effectiveness of the profiling methods.
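As a rough illustration of how alias profiling with hashing, sampling, and a configurable checking granularity might fit together, consider the following minimal sketch. It is not the dissertation's actual hashing scheme: the names `AliasProfiler`, `record`, `alias_pairs`, and `profile`, and the trace format of (reference id, address) pairs, are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


class AliasProfiler:
    """Toy alias profiler (hypothetical; not the dissertation's design)."""

    def __init__(self, granularity=8, sample_period=1):
        self.granularity = granularity      # bytes covered by one checked block
        self.sample_period = sample_period  # record every Nth access only
        self._seen = 0
        # Hash table: block number -> ids of static references that touched it.
        self._refs_by_block = defaultdict(set)

    def record(self, ref_id, address):
        """Record one dynamic access made by static memory reference ref_id."""
        self._seen += 1
        if self._seen % self.sample_period != 0:
            return  # skipped by software sampling
        self._refs_by_block[address // self.granularity].add(ref_id)

    def alias_pairs(self):
        """Return the pairs of references observed touching the same block."""
        pairs = set()
        for refs in self._refs_by_block.values():
            pairs.update(combinations(sorted(refs), 2))
        return pairs


def profile(trace, granularity=8, sample_period=1):
    """Run the toy profiler over an iterable of (ref_id, address) pairs."""
    profiler = AliasProfiler(granularity, sample_period)
    for ref_id, address in trace:
        profiler.record(ref_id, address)
    return profiler.alias_pairs()


if __name__ == "__main__":
    trace = [("p", 0x1000), ("q", 0x1004), ("r", 0x2000)]
    print(profile(trace, granularity=8))  # coarse blocks: p and q share one
    print(profile(trace, granularity=4))  # finer blocks separate p and q
```

In this sketch a coarser granularity reports more potential aliases, and a larger sampling period lowers the recording cost while risking missed alias pairs, which is the kind of trade-off the granularity and sampling studies would examine.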
In some multi-core systems, a local memory is attached to each core for fast access, but without cache coherence support. This dissertation proposes several methods by which a compiler, together with a runtime library, can manage this local memory automatically. Two methods are commonly used to manage local memory: software-controlled cache and direct buffering. This dissertation presents an analytic model that lets a compiler decide the number and size of the buffers needed to optimally overlap data transfer with computation. It also discusses how to integrate the two methods, and novel data flow analysis and runtime checking schemes are designed for the integration. A data prefetching method for the software cache is also presented. All of these new methods are implemented in IBM's compiler for Cell and have proven to be effective and efficient in local memory management.
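The trade-off behind choosing a buffer count and size for direct buffering can be sketched with a very simple cost model. The model below is an illustrative assumption, not the analytic model developed in the dissertation: it assumes a fixed per-transfer latency, a fixed bandwidth, a compute cost proportional to the bytes processed, and it ignores pipeline fill time. The names `chunk_time` and `choose_buffers` and all parameters are hypothetical.

```python
import math


def chunk_time(n_buffers, buf_size, latency, bandwidth, compute_per_byte):
    """Estimated steady-state time per chunk under the assumed overlap model."""
    compute = buf_size * compute_per_byte
    transfer = latency + buf_size / bandwidth
    if n_buffers == 1:
        # A single buffer cannot overlap transfer with computation.
        return compute + transfer
    # One buffer is being computed on while n_buffers - 1 transfers are in
    # flight; throughput is still bounded by compute and by raw bandwidth.
    return max(compute, buf_size / bandwidth, transfer / (n_buffers - 1))


def choose_buffers(local_bytes, total_bytes, latency, bandwidth,
                   compute_per_byte, max_buffers=8):
    """Return (n_buffers, buf_size, estimated_time) with the lowest estimate."""
    best = None
    for n in range(1, max_buffers + 1):
        size = local_bytes // n  # split the local memory budget evenly
        if size == 0:
            break
        chunks = math.ceil(total_bytes / size)
        estimate = chunks * chunk_time(n, size, latency, bandwidth,
                                       compute_per_byte)
        if best is None or estimate < best[2]:
            best = (n, size, estimate)
    return best


if __name__ == "__main__":
    n, size, estimate = choose_buffers(
        local_bytes=1024, total_bytes=8192,
        latency=100, bandwidth=1, compute_per_byte=2)
    print(n, size, estimate)
```

Under these assumptions, more buffers hide more transfer latency but shrink each buffer, so the fixed latency is paid more often. A compiler-side model has to balance the two effects against the local memory budget.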