Towards Hardware-Software Co-design for Energy-Efficient Deep Learning

Artificial intelligence (AI) has become an increasingly important and prevalent technology in today’s world. The past decade has seen tremendous growth in AI with it being used in a wide range of applications, including healthcare, finance, transportation, research, manufacturing, and even entertainment. One of the most significant advancements in AI has been the development of deep neural networks (DNNs), which have revolutionized the field by providing unprecedented human-like performance in solving many real-world problems. However, the computations involved in DNNs are expensive and time-consuming, especially for large and complex networks. Additionally, a variety of models, like convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and graph neural networks (GNNs), pose significant challenges for hardware design, particularly due to the diverse set of operations used. Each operation brings its own set of challenges for energy, performance, and memory that do not always align with one another precluding a one size fits all solution. The thesis addresses the above challenges in three parts. The first part tries to develop a fundamental understanding of the different operations involved in different DNN models. This thesis explores the evolution of brain-inspired computing models from a historical context, focusing on DNNs, CNNs, RNNs, and GNNs among others. This provides the necessary context for optimizing DNN operations for training and inference. The second part of the thesis proposed hardware-software co-design techniques inspired by the design of DSP systems to address energy, computation, and memory challenges during training for CNNs. The thesis proposes a novel approach for using systolic architectures to train convolutional neural networks using gradient interleaving, called InterGrad. The approach involves interleaving the computations of two gradients on the same configurable systolic array, resulting in significant savings in terms of the number of cycles and memory accesses. The proposed method uses 25% fewer cycles and memory accesses, and 16% less energy in state-of-the-art CNNs, and up to 2.2× fewer cycles and memory accesses in the fully connected layers. The thesis also presents a novel optimization approach called LayerPipe, which explores how to partition optimally and pipeline DNN training workload on multi-processor systems. LayerPipe can better balance workloads while minimizing the communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors when compared to prior approaches such as PipeDream. Lastly, the thesis explores the design of dedicated hardware accelerators for graph neural networks (GNNs). The proposed SCV-GNN method uses a novel sparse compressed vectors (SCV) format optimized for the aggregation operation. The proposed method achieves a geometric mean speedup of 7.96× and 7.04× over a compressed sparse column (CSC) and compressed sparse rows (CSR) aggregation operations, respectively, and reduces the memory traffic by a factor of 3.29× and 4.37× over CSC and CSR, respectively.

Keywords

ASIC accelerators

Deep learning

Graph accelerators

Hardware-software co-design

Neural networks

VLSI for DSP

Description

University of Minnesota Ph.D. dissertation. June 2023. Major: Electrical/Computer Engineering. Advisor: Keshab Parhi. 1 computer file (PDF); xii, 169 pages.

Collections

Dissertations

Suggested Citation

Unnikrishnan, Nanda. (2023). Towards Hardware-Software Co-design for Energy-Efficient Deep Learning. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/258677.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.

Towards Hardware-Software Co-design for Energy-Efficient Deep Learning

View/Download File

Persistent link to this item

Statistics

Title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

item.page.replaces

License

Collections

Series/Report Number

Funding Information

item.page.isbn

DOI identifier

Previously Published Citation

Other identifiers

Suggested Citation

University of Minnesota Twin Cities

Towards Hardware-Software Co-design for Energy-Efficient Deep Learning

View/Download File

Persistent link to this item

Statistics

Title

Authors

Published Date

Publisher

Type

Abstract

Keywords

Description

Related to

item.page.replaces

License

Collections

Series/Report Number

Funding Information

item.page.isbn

DOI identifier

Previously Published Citation

Other identifiers

Suggested Citation