Browsing by Subject "Low power"

Now showing 1 - 3 of 3

Low Power Approximate Hardware Design For Multimedia and Neural Network Applications
(2020-06) Sharmin Snigdha, Farhana
In today's data- and computation-driven society, day-to-day life depends on devices such as smartphones, laptops, smart watches, and biosensors/image sensors connected to computational engines. The computationally intensive applications that run on these devices incur high levels of chip power dissipation, and must operate under stringent power constraints due to thermal or battery life limitations. On future hardware platforms, a large fraction of computation power will be spent on error-tolerant multimedia applications such as signal processing tasks (on audio, video, or images) and artificial intelligence (AI) algorithms for recognizing voice and image data. For such error-tolerant applications, approximate computation has emerged as a new paradigm that provides a pragmatic approach for trading off energy/power for computational accuracy. A powerful method for implementing approximate computing is to perform logic-level or architecture-level hardware modifications. The effectiveness of an approximate system depends on identifying potential modes of approximation, accurate modeling of injected error as a function of the approximation, and optimization of the system to maximize energy savings for user-defined quality constraints. However, current approaches to approximate computation involve ad hoc trial-and-error methods that do not consider the effect of approximations on system-level quality metrics. Additionally, prior methods for approximate computation have provided little or no scope for modulating the design based on user- and application-specific error budgets. This thesis proposes adaptive frameworks for energy-efficient approximate computing, leveraging the target application characteristics, system architecture, and input information to build fast, power-efficient approximate circuits under a user-defined error budget.
The work is focused on two well-established, widely-used, and computationally intensive applications: multimedia and artificial intelligence. For multimedia systems, where minor errors in audio, image, and video are imperceptible to the human senses, approximate computations can be very effective in saving energy without significant loss in the quality of results. AI applications are also good candidates for approximation as they have inherent error-resilience feedback mechanisms embedded into their computations. This thesis demonstrates methodologies for approximate computing on representative platforms from the multimedia and AI domains, namely, the widely used JPEG architecture, and various architectures for deep learning. The first part of the thesis develops a methodology for designing approximate hardware for JPEG that is input-independent, i.e., it aims to meet the specified error budgets for any inputs. The error sensitivities of various arithmetic units within the JPEG architecture with respect to the quality of the output image are first modeled, and a novel optimization problem is then formulated, using the error sensitivity model, to maximize power savings under an error budget. The optimized solution provides 1.5x-2.0x power savings over the original accurate design, with negligible image quality degradation. However, the degree of approximation in this approach must necessarily be chosen conservatively to stay within the error budget over all possible input images. The second part of the thesis designs an image-dependent approximate computation process that uses image-specific input statistics to dynamically increase the approximation level over the image-independent approach, thereby reducing its conservatism. 
This approach must overcome several challenges: circuitry for real-time extraction of input image statistics must be inexpensive in terms of both power and computation time, and schemes for translating abstracted image information into dynamically chosen approximation levels in hardware must be devised. The approach devises a simplified heuristic to estimate the input data distribution. Based on this distribution, a dynamic approximate architecture is developed, altering the approximation levels for input images in real time. Over a set of benchmarks, the input-dependent approximation provides an average of 31% additional power improvement, as compared to the input-independent approximation process. The final part of the thesis addresses the use of approximate computing for convolutional neural networks (CNNs), which have achieved unprecedented accuracy on many modern AI applications. The inherent error-resilience and large computation requirements imply that CNN hardware implementations are excellent candidates for approximate computation. A systematic framework is developed to dynamically reduce the computation in the CNN based on its inputs. The approach is motivated by the observation that for a specific input class, during both the training and testing phases, some features tend to be activated together while others are unlikely to be activated. A dynamic selective feature activation framework, SeFAct, is proposed for energy-efficient CNN hardware accelerators to predict the input class early and perform only the necessary computations. For various state-of-the-art neural networks, the results show that energy savings of 20%-25% are achievable, after accounting for all implementation overheads, with a small loss in accuracy. Moreover, a trade-off between accuracy and energy savings may be characterized using the proposed approach.
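The idea of choosing an approximation level under a user-defined error budget, which runs through the abstract above, can be illustrated with a small software sketch. This is not the thesis's JPEG methodology: it assumes a generic bit-truncation adder as the approximate unit, uses mean relative error as the quality metric, and all names (`truncated_add`, `mean_relative_error`, `most_aggressive_level`) and the 5% budget are hypothetical choices made for illustration.

```python
import random


def truncated_add(a, b, k):
    """Add two non-negative ints after zeroing their k least-significant bits.

    Hypothetical stand-in for an approximate adder: ignoring low-order bits
    mimics removing the corresponding logic in hardware.
    """
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)


def mean_relative_error(k, samples):
    """Average |exact - approx| / exact over the sample pairs (zero sums skipped)."""
    total = 0.0
    for a, b in samples:
        exact = a + b
        if exact:
            total += abs(exact - truncated_add(a, b, k)) / exact
    return total / len(samples)


def most_aggressive_level(budget, samples, max_k=8):
    """Return the largest truncation level whose error stays within the budget.

    The per-sample error of truncation never decreases as k grows, so the
    search can stop at the first level that exceeds the budget.
    """
    best = 0
    for k in range(max_k + 1):
        if mean_relative_error(k, samples) <= budget:
            best = k
        else:
            break
    return best


# Assumed workload: random 8-bit operand pairs with a fixed seed.
rng = random.Random(0)
samples = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
level = most_aggressive_level(0.05, samples)
print("chosen truncation level:", level)
```

Because the level is chosen against a fixed sample set, it plays the role of a conservative, input-independent setting; an input-dependent scheme would re-run the selection on statistics of the current input.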
Low voltage / low power rail-to-rail CMOS operational amplifier for portable ECG
(2013-08) Lee, Boram
One of the most important building blocks in modern IC design is the operational amplifier. In a portable electrocardiogram (ECG), the operational amplifier is employed to sense and amplify the electrical signal of the human heartbeat. For a battery-powered portable ECG system, a low supply voltage is required to reduce power consumption, and the result is a reduced input common mode range (ICMR) of the op-amp. To overcome the reduced ICMR problem, complementary differential pairs operated in parallel are commonly used to achieve a rail-to-rail input common mode range. However, this complementary differential input pair structure can have a substantial transconductance (gm) variation problem in a low supply voltage environment and a dead zone problem in an extremely low supply voltage environment. In past years, a number of techniques have been proposed to overcome those problems for low and extremely low supply voltage environments. This dissertation focuses on op-amps applicable to a portable ECG system, and a total of five novel rail-to-rail constant-gm op-amps useful for circuits such as a portable ECG are proposed. Three of those op-amps work in the low supply voltage environment and two are proposed for the extremely low supply voltage environment. The designs are simulated with Cadence SPECTRE and laid out in TSMC 0.25-µm CMOS technology.
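The gm variation and dead zone problems named in the abstract above can be seen in a deliberately simplified model. The sketch below is not taken from the dissertation. It assumes each input pair switches fully on or off at a hard threshold, that both pairs have the same square-law transconductance, and that the threshold voltages, `beta`, and tail current are arbitrary illustrative values.

```python
import math


def total_gm(vcm, vdd, vn_min=0.8, vp_headroom=0.8, beta=1e-3, i_tail=10e-6):
    """Toy on/off model of a complementary (N + P) differential input stage.

    Assumptions (hypothetical, for illustration only):
      - the N pair conducts only when vcm >= vn_min,
      - the P pair conducts only when vcm <= vdd - vp_headroom,
      - each active pair contributes gm = sqrt(beta * i_tail), the square-law
        value when each transistor carries half the tail current.
    """
    gm_pair = math.sqrt(beta * i_tail)
    n_on = vcm >= vn_min
    p_on = vcm <= vdd - vp_headroom
    return gm_pair * (int(n_on) + int(p_on))


def sweep(vdd, steps=10):
    """Total gm at evenly spaced common-mode voltages from 0 V to vdd."""
    return [total_gm(vdd * k / steps, vdd) for k in range(steps + 1)]


low = sweep(vdd=2.5)       # both pairs overlap mid-range: gm doubles there
very_low = sweep(vdd=1.2)  # pairs no longer overlap: gm drops to zero mid-range

print("low supply, gm max/min:", max(low) / min(low))
print("very low supply, dead-zone points:", sum(1 for g in very_low if g == 0))
```

In this model the higher supply shows a two-to-one gm variation across the common-mode range, while the lower supply leaves a band of common-mode voltages where neither pair conducts, which is the dead zone.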
Memory Design for Centimeter-Scale Organic and Nanometer-Scale Silicon Technologies
(2012-07) Zhang, Wei
Low power memory is always desired due to its significance in many large-scale applications. It is important to emerging technologies such as organic electronics, since it is an indispensable component for extending the technology toward a broader application scope with more complex functionality. It is also a hot topic in mature silicon technology because device scaling makes memory design challenging, with increasing leakage currents and process variations. Organic electronics deals with conductive polymers and plastics, and is capable of realizing large-area flexible applications, which cannot be fulfilled by modern silicon technology. Conventional organic devices require a high operating voltage due to their low carrier mobility. Ion-gel gated OTFTs (gel-OTFTs), however, deliver unusually high gate capacitance through an electrolyte-gated structure, and therefore offer sufficient drive currents under a low voltage. Because this is an emerging technology, few attempts have been made at organic memory design. In this dissertation, we first propose an improved design-fabrication-testing flow to significantly facilitate the entire process, which boosts the design efficiency and fabrication yield and thus enables the implementation of complex circuits such as memory arrays. An organic process design kit (OPDK) with various modeling approaches allows designers to design organic circuits in much the same way as in silicon technology. Various circuit components including logic gates, ring oscillators and a D-flipflop were demonstrated, and a general purpose organic dynamic memory cell was proposed for the first time. The cell, known as a DRAM gain cell, achieves a sub-10nW-per-cell refresh power with a retention time of over 1 minute, which is 5 orders of magnitude longer than that in silicon designs.
The same DRAM gain cell architecture also shows potential as embedded memory in modern silicon technology, where the prevailing 6T SRAM suffers from leakage power and poor low-voltage margins as devices continue to scale down. In this dissertation we report the first variation-aware performance analysis of the silicon gain cell and reveal that conventional corner simulations are no longer valid in capturing worst cases of gain cells. Insights can be obtained through the various analysis approaches described in the dissertation to benefit future memory design strategy and device optimization. With innovations in cell structure and peripheral circuitry, the silicon gain cell performance can be further enhanced to compete with the mainstream 6T SRAM. In this dissertation, we experimentally demonstrate, for the first time, a gain cell design with write-back-free read operations, utilizing its non-destructive read nature to improve the read speed into the GHz regime without sacrificing retention time. Various circuit techniques including a local-sense-amplifier architecture are proposed to eliminate the need for a complex current-sensing scheme, and a dual-row-access mode is proposed for further power saving in half-utilization scenarios. The test chip in a 65nm low power process achieves a 23.9% power saving compared to a 6T SRAM at 0.6V retention voltage and an additional 27.8% power saving in cases when only half of the array is needed.

University Digital Conservancy
