Achieving high computational performance on large-scale high performance computing (HPC) system demands optimizations to exploit hardware characteristics. Various optimizations and research strategies are implemented to improve performance with emphasis on single or multiple hardware characteristics. Among these approaches, the domain-specific approach involving domain expertise shows its high potential in achieving high performance and maintaining performance portability. Deep memory hierarchies, single instruction multiple data (SIMD) engines, and multiple processing cores in the latest CPUs pose many challenges to programmers seeking significant fractions of peak performance. Programming for high performance computation using modern CPUs has to address thread-level parallelization on multiple cores, data-level parallelization on SIMD engines, and optimizing memory utilization for the multi-level memories. Using multiple computational nodes with multiple CPUs in each node to scale up the computation without sacrificing performance increases programming burden significantly. As a result, performance portability has become a major challenge to programmers. It is well known that manually tuned programs can assist the compiler to deliver the best performance. However, generating these optimized codes requires deep understanding in application design, hardware architecture, compiler optimizations, and knowledge in the specific domain. Such manually tuning process has to be done for each new hardware design. To address this issue, this dissertation proposes strategies that exploit the advantages of domain-specific optimizations to achieve performance portability. This dissertation shows the combination of the proposed strategies can effectively exploit both the SIMD engine and on-chip memory. High fraction of peak performance can be achieved after such optimizations. The design of the pre-compilation framework makes it possible to automate these optimizations. Adopting the latest compiler techniques to assist domain-specific optimizations has high potential to implement sophisticated and legal transformations. This dissertation provides a preliminary study using polyhedral transformations to implement the proposed optimization strategies. Several obstacles need to be removed to make this technique applicable to large-scale scientific applications. With the research presented in this dissertation and suggested tasks in the future work, the ultimate goal to deliver performance portability with automation is feasible for CFD applications.
University of Minnesota Ph.D. dissertation. June 2013. Major: Computer Science. Advisors: Professor Pen-Chung Yew, Paul R. Woodward. 1 computer file (PDF); x, 147 pages, appendix A.
Performance portability strategies for Computational Fluid Dynamics (CFD) applications on HPC systems.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.