We are witnessing a tremendous amount of change in the design of the modern microprocessor. With dozens of CPU cores on chip in recent multicore processors, the search for thread-level parallelism (TLP) is more significant than ever. In parallel, a very different processor architecture has emerged that aims to extract parallelism at an entirely different scale. Originally proposed for accelerating graphics applications, graphics processing units (GPUs) are increasingly being employed to improve the performance of general-purpose applications.
Advances in process technology and the need for energy efficiency have brought CPU and GPU cores together onto the same die to form on-chip heterogeneous multicore processors. Several industrial designs that follow this philosophy are already part of mainstream computing. The presence of diverse cores on the same die, sharing on-chip resources, presents several challenges in achieving an efficient design. In particular, this thesis addresses two key aspects of designing efficient heterogeneous multicore processors: performance and correctness.
Performance is of paramount concern in the design of a microprocessor, and the last-level cache (LLC) is a critical on-chip component from this perspective. Several techniques have been proposed to efficiently share the LLC among on-chip cores. However, when the on-chip cores show significant diversity in their memory access characteristics, currently proposed techniques face severe challenges in attaining effective LLC sharing. In the first part of this thesis, we address this problem and propose a new policy that improves the management of the shared LLC in the presence of heterogeneous workloads, in terms of both performance and energy efficiency.
Execution correctness is an important concern in the quest to extract parallelism. Concurrency bugs, such as data races, are severe impediments to the effectiveness of parallel computing. Although several techniques have been proposed to identify and rectify data races, their implementation faces several challenges. While software-based mechanisms are cheaper to implement, they inflict severe performance overhead on the monitored application. The high performance of hardware-based mechanisms, on the other hand, comes at the expense of additional hardware support and increased implementation cost. In the second part of this thesis, we propose a technique that uses available on-chip GPU cores to perform efficient data race detection for applications executing on the CPU cores.
Overall, with these two techniques, we address two critical challenges in the design of emerging heterogeneous multicore processors.
University of Minnesota Ph.D. dissertation. December 2013. Major: Computer Science. Advisor: Antonia Zhai. 1 computer file (PDF); x, 108 pages.
Performance-correctness challenges in emerging heterogeneous multicore processors.
Retrieved from the University of Minnesota Digital Conservancy,