Low Power Approximate Hardware Design For Multimedia and Neural Network Applications

Sharmin Snigdha, Farhana2020-09-082020-09-082020-06https://hdl.handle.net/11299/216114University of Minnesota Ph.D. dissertation. June 2020. Major: Electrical/Computer Engineering. Advisor: Sachin Sapatnekar. 1 computer file (PDF); xiii, 112 pages.In today's data- and computation-driven society, day-to-day life depends on devices such as smartphones, laptops, smart watches, and biosensors/image sensors connected to computational engines. The computationally intensive applications that run on these devices incur high levels of chip power dissipation, and must operate under stringent power constraints due to thermal or battery life limitations. On future hardware platforms, a large fraction of computation power will be spent on error-tolerant multimedia applications such as signal processing tasks (on audio, video, or images) and artificial intelligence (AI) algorithms for recognizing voice and image data. For such error-tolerant applications, approximate computation has emerged as a new paradigm that provides a pragmatic approach for trading off energy/power for computational accuracy. A powerful method for implementing approximate computing is by performing logic-level or architecture-level hardware modifications. The effectiveness of an approximate system depends on identifying potential modes of approximation, accurate modeling of injected error as a function of the approximation, and optimization of the system to maximize energy savings for user-defined quality constraints. However, current approaches to approximate computation involve ad hoc trial-and-error based methods that do not consider the effect of approximations on system-level quality metrics. Additionally, prior methods for approximate computation have provided little or no scope for modulating the design based on user- and application-specific error budgets. HASH(0x4210e28) This thesis proposes adaptive frameworks for energy-efficient approximate computing, leveraging the target application characteristics, system architecture, and input information to build fast, power-efficient approximate circuits under a user-defined error budget. The work is focused on two well-established, widely-used, and computationally intensive applications: multimedia and artificial intelligence. For multimedia systems, where minor errors in audio, image, and video are imperceptible to the human senses, approximate computations can be very effective in saving energy without significant loss in the quality of results. AI applications are also good candidates for approximation as they have inherent error-resilience feedback mechanisms embedded into their computations. This thesis demonstrates methodologies for approximate computing on representative platforms from the multimedia and AI domains, namely, the widely used JPEG architecture, and various architectures for deep learning. The first part of the thesis develops a methodology for designing approximate hardware for JPEG that is input-independent, i.e., it aims to meet the specified error budgets for any inputs. The error sensitivities of various arithmetic units within the JPEG architecture with respect to the quality of the output image are first modeled, and a novel optimization problem is then formulated, using the error sensitivity model, to maximize power savings under an error budget. The optimized solution provides 1.5x-2.0x power savings over the original accurate design, with negligible image quality degradation. However, the degree of approximation in this approach must necessarily be chosen conservatively to stay within the error budget over all possible input images. The second part of the thesis designs an image-dependent approximate computation process that uses image-specific input statistics to dynamically increase the approximation level over the image-independent approach, thereby reducing its conservatism. This approach must overcome several challenges: circuitry for real-time extraction of input image statistics must be inexpensive in terms of both power and computation time, and schemes for translating abstracted image information into dynamically chosen approximation levels in hardware must be devised. The approach devises a simplified heuristic to estimate the input data distribution. Based on this distribution, a dynamic approximate architecture is developed, altering the approximation levels for input images in real-time. Over a set of benchmarks, the input-dependent approximation provides an average of 31% additional power improvement, as compared to the input-independent approximation process. The final part of the thesis addresses the use of approximate computing for convolutional neural networks (CNNs), which have achieved unprecedented accuracy on many modern AI applications. The inherent error-resilience and large computation requirements imply that CNN hardware implementations are excellent candidates for approximate computation. A systematic framework is developed to dynamically reduce the computation in the CNN based on its inputs. The approach is motivated by the observation that for a specific input class, during both the training and testing phases, some features tend to be activated together while others are unlikely to be activated. A dynamic selective feature activation framework, SeFAct, is proposed for energy-efficient CNN hardware accelerators to early predict an input class and only perform necessary computations. For various state-of-the-art neural networks, the results show that energy savings of 20%-25% are achievable, after accounting for all implementation overheads, with small loss in accuracy. Moreover, a trade-off between accuracy and energy savings may be characterized using the proposed approach.enApproximate computingEnergy optimizationError-tolerant applicationJPEGLow powerNeural networkLow Power Approximate Hardware Design For Multimedia and Neural Network ApplicationsThesis or Dissertation