The last decade has seen the transition from unicore processors to their multi-core (and now many-core) counterparts. This transition has placed renewed pressure on compiler developers to extract performance from these parallel processors. In addition to extracting parallelism, another important responsibility of a parallelizing (or optimizing) compiler is to improve the memory system performance of the source program. This is particularly important because multi-cores have accentuated both the memory wall and the bandwidth wall.

In this thesis, we identify three key challenges facing compiler developers on current processors: (1) the diverse set of microarchitectures in existence at any time and, more importantly, the changes in microarchitecture between generations; (2) the poor performance of compilers on real applications that contain large scopes of statements amenable to optimization; (3) the unscalability of compilers, a traditional limitation whereby compilers choose to optimize small scopes in order to contain compile time and memory requirements, and thus lose optimization opportunities.

In this thesis, we make the following contributions to address the above challenges. (1) We revisit three compiler optimizations (loop tiling and loop fusion for enhancing temporal locality, and data prefetching for hiding memory latency) for improving memory (and parallel) performance in light of recent advances in microarchitecture, including deeper memory hierarchies, multithreading, (short-vector) SIMDization, and hardware prefetching, and we propose generic algorithms implementable in production compilers for a range of processors. (2) We propose heuristics in a cost model to choose good statements to fuse, and we improve dependence analysis so that critical fusion opportunities in application programs are not lost. (3) The final contribution of this thesis is a solution to the unscalability problem.
Based on program semantics, we devise a way to represent the entire program with much fewer representative statements and dependences, leading to significantly improved compile time and memory requirement for compilation. Thus, real applications can now be optimized not only efficiently, but at very low overhead.
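To make the locality transformations mentioned above concrete, the following is a minimal sketch of loop tiling (this is an illustrative example, not code from the thesis; the matrix-multiply kernel, the sizes `N` and `T`, and the function names are assumptions). Tiling reorders the iterations into blocks so that data is reused while it is still resident in cache.

```c
/* Illustrative sketch of loop tiling for temporal locality
 * (not from the thesis; kernel, N, and T are assumptions). */
#include <string.h>

#define N 128
#define T 32  /* tile size, assumed to fit one level of the cache hierarchy */

static double a[N][N], b[N][N];

/* Baseline i-k-j matrix multiply: for large N, each element of b is
 * reused only after N*N intervening accesses, so reuse is lost. */
void matmul(double out[N][N]) {
    memset(out, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                out[i][j] += a[i][k] * b[k][j];
}

/* Tiled version: the same iterations, reordered into T x T blocks so
 * that each block of b stays cache-resident while it is reused. */
void matmul_tiled(double out[N][N]) {
    memset(out, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++)
                        for (int j = jj; j < jj + T; j++)
                            out[i][j] += a[i][k] * b[k][j];
}
```

Because the tiled loop nest enumerates exactly the same iterations (and, for each output element, accumulates over k in the same ascending order), both versions compute identical results; only the memory access pattern, and hence the cache behavior, changes.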
University of Minnesota Ph.D. dissertation. September 2014. Major: Computer Science. Advisor: Pen-Chung Yew. 1 computer file (PDF); x, 158 pages.
Scalable compiler optimizations for improving the memory system performance in multi- and many-core processors.
Retrieved from the University of Minnesota Digital Conservancy,