Heterogeneous multicore systems have become prevalent because they make it feasible to balance single-threaded performance against high-throughput requirements. Integrating multiple energy-efficient GPU accelerator cores with traditional superscalar CPU cores on the same die has emerged as a way to achieve the desired performance within a stringent power budget. As future heterogeneous systems scale in both the number of cores and the variety of computation resources, the on-chip interconnection network (NoC) must continue to support high-throughput, low-latency on-chip data communication. As an integral part of a heterogeneous multicore system, the NoC must move data on chip efficiently. Meanwhile, future heterogeneous systems allow different types of compute units to share a unified address space; therefore, optimizing data sharing is crucial for improving system performance. Since the NoC plays a vital role in supporting data sharing, it must be designed accordingly to improve overall performance. This dissertation addresses these two challenges, i.e., optimizing data movement and optimizing data sharing, in designing NoCs for heterogeneous multicore systems.
Enabling efficient data movement, in terms of both performance and energy, is critical in heterogeneous multicore systems in which multiple applications run simultaneously. In particular, NoCs must be designed to satisfy the communication requirements of both latency-sensitive CPU traffic and throughput-intensive GPU traffic. Traditional packet-switched NoCs, which have the flexibility to connect diverse computation and storage devices, face great challenges in meeting these performance requirements within the energy budget because of the latency and energy consumed by buffering and routing at each router.
In the first part of this dissertation, we take advantage of the diverse performance requirements of on-chip heterogeneous computing devices by designing, implementing, and evaluating a hybrid-switched NoC that allows packet-switched and circuit-switched messages to share the same communication fabric by partitioning the network bandwidth through time-division multiplexing (TDM). The second part of the dissertation focuses on maintaining global memory access order with the proposed hybrid-switched NoC, allowing memory operations to proceed in parallel while a stronger memory consistency model is still satisfied.
The memory consistency model specifies the order in which reads and writes from one thread become visible to other threads. The choice of memory consistency model can greatly affect performance, programmability, and hardware implementation. Enforcing a programmer-friendly strong memory consistency model while maximizing memory-level parallelism is challenging, especially in heterogeneous systems, where data-parallel cores generate a large number of outstanding memory requests. End-point ordering at the cores is expensive because it prohibits a number of architectural optimizations. However, if the correct order is preserved by the interconnection network during request transmission, performance can potentially be improved by issuing memory requests in parallel. The circuit-switched data path in a hybrid-switched NoC guarantees in-order message delivery and can therefore serve as an infrastructure for preserving program order. Based on this observation, this dissertation proposes a hybrid-switched NoC augmented with a lightweight token-ring network to guarantee global memory access order.
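To make the idea of partitioning link bandwidth by time-division multiplexing more concrete, the following is a minimal Python sketch of a single output link with a fixed-length slot table. It is an illustration under stated assumptions, not the router design evaluated in this dissertation: the class `TdmLink`, its methods, the slot-table length, and the policy of letting packet-switched flits use reserved-but-idle slots are all hypothetical. Circuit-switched flits are sent only in the slots their circuit has reserved, and each circuit drains a FIFO, so flits on one circuit leave the link in the order they were queued.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TdmLink:
    """Hypothetical model of one output link whose bandwidth is split by TDM."""
    num_slots: int                                       # slot-table length (assumed)
    slot_owner: dict = field(default_factory=dict)       # slot index -> circuit id
    circuit_queues: dict = field(default_factory=dict)   # circuit id -> FIFO of flits
    packet_queue: deque = field(default_factory=deque)   # packet-switched flits

    def reserve(self, circuit_id, slot):
        # A circuit claims a slot; each slot can belong to at most one circuit.
        if slot in self.slot_owner:
            raise ValueError(f"slot {slot} already reserved")
        self.slot_owner[slot] = circuit_id
        self.circuit_queues.setdefault(circuit_id, deque())

    def step(self, cycle):
        """Return (kind, flit) sent in this cycle, or None if the link idles."""
        slot = cycle % self.num_slots
        owner = self.slot_owner.get(slot)
        if owner is not None and self.circuit_queues[owner]:
            # Reserved slot with data waiting: the circuit flit is sent in
            # FIFO order, which preserves per-circuit message ordering.
            return ("circuit", self.circuit_queues[owner].popleft())
        if self.packet_queue:
            # Unreserved or reserved-but-idle slot: packet-switched traffic
            # uses it (reusing idle reserved slots is an assumption here).
            return ("packet", self.packet_queue.popleft())
        return None


# Usage: a hypothetical GPU circuit reserves slots 0 and 2 of a 4-slot table,
# while CPU packets compete for the remaining bandwidth.
link = TdmLink(num_slots=4)
link.reserve("gpu0", slot=0)
link.reserve("gpu0", slot=2)
link.circuit_queues["gpu0"].extend(["g1", "g2", "g3"])
link.packet_queue.extend(["c1", "c2"])
trace = [link.step(t) for t in range(6)]
# trace == [("circuit", "g1"), ("packet", "c1"), ("circuit", "g2"),
#           ("packet", "c2"), ("circuit", "g3"), None]
```

In this toy model the circuit's flits `g1`, `g2`, `g3` leave in order without any per-hop arbitration, which is the ordering property the abstract relies on, while packet-switched flits fill the slots the circuit does not use.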
University of Minnesota Ph.D. dissertation. November 2015. Major: Computer Science. Advisor: Antonia Zhai. 1 computer file (PDF); x, 104 pages.
Time-Division-Multiplexing Based Hybrid-Switched NoC for Heterogeneous Multicore Systems.
Retrieved from the University of Minnesota Digital Conservancy,