Enhancing GPU programmability and correctness through transactional execution

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Enhancing GPU programmability and correctness through transactional execution

Alternative title

Published Date

2015-01

Publisher

Type

Thesis or Dissertation

Abstract

Graphics Processing Units (GPUs) are becoming increasingly popular not only across various scientific communities, but also as integrated data-parallel accelerators on existing multicore processors. Support for massive fine-grained parallelism in contemporary GPUs provides a tremendous amount of computing power. GPUs support thousands of lightweight threads to deliver high computational throughput. Popularity of GPUs is facilitated by easy-to-adopt programming models such as CUDA and OpenCL that aim to ease programmers' efforts while developing parallel GPU applications. However, designing and implementing correct and efficient GPU programs is still challenging since programmers must consider interaction between thousands of parallel threads. Therefore, addressing these challenges is essential for improving programmers' productivity as well as software reliability. Towards this end, this dissertation proposes mechanisms for improving programmability of irregular applications and ensuring correctness of compute kernels.Some applications possess abundant data-level parallelism, but are unable to take advantage of GPU's parallelism. They exhibit irregular memory access patterns to the shared data structures. Programming such applications on GPUs requires synchronization mechanisms such as locks, which significantly increase the programming complexity. Coarse-grained locking, where a single lock controls all the shared resources, although reduces programming efforts, can substantially serialize GPU threads. On the other hand, fine-grained locking, where each data element is protected by an independent lock, although facilitates maximum parallelism, requires significant programming efforts. To overcome these challenges, we propose transactional memory (TM) on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts. Transactional execution can incur runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state. Thus, in this dissertation we illustrate lightweight TM designs that are capable of scaling to a large number of GPU threads. In our system, programmers simply mark the critical sections in the applications, and the underlying TM support is able to achieve performance comparable to fine-grained locking.Ensuring functional correctness on GPUs that are capable of supporting thousands of concurrent threads is crucial for achieving high performance. However, GPUs provide relatively little guarantee with respect to the coherence and consistency of the memory system. Thus, they are prone to a multitude of concurrency bugs related to inconsistent memory states. Many such bugs manifest as some form of data race condition at runtime. It is critical to identify such race conditions, and mechanisms that aid their detection at runtime can form the basis for powerful tools for enhancing GPU software correctness. However, relatively little attention has been given to explore such runtime monitors. Most prior works focus on the software-based approaches that incur significant overhead. We believe that minimal hardware support can enable efficient data race detection for GPUs. In this dissertation, we propose a hardware-accelerated data race detection mechanism for efficient and accurate data race detection in GPUs. Our evaluation shows that the proposed mechanism can accurately detect data race bugs in GPU programs with moderate runtime overheads.

Description

University of Minnesota Ph.D. dissertation. January 2015. Major: Computer Science. Advisor: Antonia Zhai. 1 computer file (PDF); xi, 122 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Holey, Anup Purushottam. (2015). Enhancing GPU programmability and correctness through transactional execution. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/171130.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.