Author: Choi, Kyubaik
Dates: 2023-05-12, 2023-05-12, 2023-03
URI: https://hdl.handle.net/11299/254129
University of Minnesota Ph.D. dissertation. March 2023. Major: Electrical Engineering. Advisor: Gerald Sobelman. 1 computer file (PDF); xi, 102 pages.

Neural network (NN) based algorithms, such as convolutional neural networks (CNNs) and deep neural networks (DNNs), have enabled many exciting computer vision and audio processing applications. However, these algorithms are computationally expensive because they must perform a very large number of neuron computations. Even when their algorithmic complexity is reduced, most NN-based algorithms remain difficult to execute on low-cost, battery-powered edge devices, which lack sufficient computational capability. Such devices therefore require highly optimized hardware accelerators that deliver the necessary performance at low power. One possible solution is a graphics processing unit (GPU). However, a GPU can require a large power budget and is usually too expensive to incorporate into an AI edge device. Moreover, a GPU achieves high throughput by processing several images at the same time, whereas most AI edge devices process sequential data captured from sensors in real time. A better solution is therefore a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC), which can efficiently process a sequential data stream and provide high-performance hardware acceleration with low power consumption, low latency, and cost efficiency. This thesis proposes three efficient NN accelerator architectures, with implementations suitable for FPGA or ASIC, that target low-cost, low-power edge systems. First, we describe an optimized face detection and alignment implementation for low-cost, low-power IoT systems.
We use a multi-task cascaded convolutional neural network and propose a highly optimized, customized hardware accelerator for it. Second, we describe an efficient sparse neural network accelerator for low-cost edge systems. We present a highly optimized hardware architecture, together with its implementation details, that accelerates the processing of the sparse data produced by the rectified linear unit (ReLU) activation function. Finally, we propose an efficient CNN accelerator for low-cost edge systems that is based on MobileNetV2 and optimized for such systems. Optimization techniques at both the algorithm and hardware design levels are described. The results of these research efforts show improved performance and reduced hardware complexity, as described in the following chapters.

Language: en
Title: Efficient Neural Network Accelerators for Low-Cost Edge Systems
Type: Thesis or Dissertation
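As a rough illustration of why ReLU-induced sparsity is worth accelerating: ReLU maps every negative input to zero, and a zero activation contributes nothing to a multiply-accumulate, so hardware that keeps only the nonzero activations (with their indices) can skip those multiplications entirely. The Python sketch below shows this general zero-skipping idea in software. It is not the accelerator architecture proposed in the thesis, and the function names (`relu`, `compress_nonzero`, `sparse_dot`) and example values are hypothetical.

```python
from typing import List, Tuple


def relu(xs: List[float]) -> List[float]:
    """Rectified linear unit: negative inputs (and zero) map to 0.0."""
    return [x if x > 0.0 else 0.0 for x in xs]


def compress_nonzero(acts: List[float]) -> List[Tuple[int, float]]:
    """Hypothetical compressed format: keep (index, value) for nonzero activations only."""
    return [(i, a) for i, a in enumerate(acts) if a != 0.0]


def sparse_dot(nonzero: List[Tuple[int, float]], weights: List[float]) -> Tuple[float, int]:
    """Multiply-accumulate over nonzero activations only.

    Returns the dot product and the number of multiplications performed,
    so the savings from skipping zeros can be counted.
    """
    acc = 0.0
    macs = 0
    for i, a in nonzero:
        acc += a * weights[i]  # the index selects the matching weight
        macs += 1
    return acc, macs


if __name__ == "__main__":
    pre_activation = [1.5, -2.0, 0.0, 3.0, -0.5, 2.0, -1.0, 0.5]
    weights = [0.5, 1.0, -1.0, 2.0, 1.5, -0.5, 3.0, 4.0]

    acts = relu(pre_activation)
    nz = compress_nonzero(acts)

    # Dense reference: one multiplication per activation, zeros included.
    dense = sum(a * w for a, w in zip(acts, weights))
    sparse, macs = sparse_dot(nz, weights)
    print(dense, sparse, macs, len(acts))
```

The demo prints `7.75 7.75 4 8`: both paths give the same result, but the zero-skipping path performs only 4 of the 8 multiplications. In hardware, the benefit depends on how cheaply the nonzero values and their indices can be stored and fetched, which is where a dedicated architecture matters.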