Non-volatile In-memory Computing for Large Scale Data-Intensive Workloads: Challenges and Opportunities

Published Date

2021-12

Type

Thesis or Dissertation

Abstract

Application domains that depend on large amounts of data to solve problems, e.g., genome sequence analysis, graph analytics, and machine learning, suffer from the growing overhead of data communication between physically separate logic (i.e., compute) and memory elements in conventional von Neumann computing. Recent progress in processing-in-memory (PIM), also called computing-in-memory (CIM) or simply in-memory computing, addresses this communication overhead by fusing compute capability with the memory where the data reside. This reduces energy consumption and increases application throughput, because computation can use the internal bandwidth of the memory substrate, which is higher than the off-chip bandwidth.

In this thesis, we focus on architecture- and application-level characterizations of PIM architectures, Computational RAM (CRAM) in particular, for large-scale data-intensive workloads, in terms of both opportunities and challenges. We demonstrate the efficacy of CRAM in reducing the communication bottleneck of genomic sequence analysis by designing various CRAM-based hardware (HW) accelerators. We choose genomics as a representative application domain because of its importance and because its inherent characteristics suit PIM-based implementation. The designs cover all architectural aspects, such as data layout, spatio-temporal scheduling of compute, and system integration. First, we introduce BWA-CRAM, an in-memory accelerator architecture for DNA sequence alignment that directly maps the state-of-the-art Burrows–Wheeler Aligner algorithm onto CRAM. This architecture outperforms the corresponding software implementation in terms of throughput and energy efficiency, even under conservative assumptions.
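As background for the BWA-CRAM mapping, the following is a minimal Python sketch of textbook FM-index backward search. This is the Burrows–Wheeler-transform-based exact-matching primitive that BWT-based aligners such as BWA build on. It is not the thesis's CRAM data layout or schedule. The function names `build_fm_index` and `backward_search` are hypothetical, and the naive suffix sorting is only suitable for toy references.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table, and Occ table."""
    text = reference + "$"  # sentinel, lexicographically smallest symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    counts = Counter(text)
    # c_table[ch] = number of symbols in text that sort before ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][k] = number of occurrences of ch in bwt[:k]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(index, pattern):
    """Return sorted start positions of exact occurrences of pattern."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one symbol to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGT")
print(backward_search(index, "ACG"))  # [0, 4]
```

In this sketch, each pattern symbol costs two `Occ` table lookups whose positions depend on the previous step. The search is therefore dominated by dependent, irregular table accesses rather than by arithmetic.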
Next, we improve the performance of DNA sequence (pre-)alignment, and of other similar generic pattern-matching applications, through HW/SW co-design. We introduce SpinPM, a novel high-density, reconfigurable spintronic in-memory pattern-matching substrate based on CRAM built from Spin-Orbit-Torque (SOT), specifically Spin-Hall-Effect (SHE), MTJ devices. We then demonstrate the performance benefit SpinPM can achieve over conventional and near-memory processing systems. Subsequently, we present CRAM-Seq, a CRAM-based accelerator for RNA-Seq abundance quantification. Through HW/SW co-design, we demonstrate that CRAM-Seq outperforms Kallisto, a commonly used state-of-the-art software abundance quantification algorithm, in terms of throughput and energy efficiency. We then introduce Content Addressable Memory (CAM) functionality into CRAM, since CAM is very efficient for large-scale pattern matching. We present CAMeleon, a novel compute substrate that leverages the high energy efficiency of CRAM. CAMeleon can satisfy the very stringent hardware resource (area) budgets of embedded/edge computing applications, e.g., handheld sequencing devices. Compared to conventional CAM-only designs based on SRAM and emerging memory technologies (such as STT-MTJ, ReRAM, and PCM), CAMeleon performs CAM operations more energy-efficiently while consuming less or similar area. Through reconfiguration, it also supports logic and memory functions beyond CAM operations on demand. Finally, we study how mapping onto a PIM substrate affects application reliability. We focus on PIM architectures such as CRAM, which perform logic operations directly within memory arrays (in situ) and thus obviate any need for data transfers, even to and from the array periphery.
Here, we (i) quantitatively characterize gate-flip errors, an acute class of functional errors specific to such PIM systems in which, due to parametric variations, one logic gate can behave as another; and (ii) analyze to what extent algorithmic noise tolerance can mask gate flips.
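To make the notion of a gate flip concrete, the following is a minimal Python sketch under a simplified, assumed model. Each gate is an inverting two-input threshold gate, so a parametric shift of its threshold turns an intended NAND into a NOR. The helper names (`threshold_gate`, `xor4`, `match_score`, `decision_masking_rate`) and the bit-level pattern-matching kernel are hypothetical illustrations. They are not the thesis's error model or benchmarks.

```python
import random

NAND_T, NOR_T = 2, 1  # thresholds under the assumed threshold-gate model


def threshold_gate(a, b, t):
    """Inverting 2-input threshold gate: output is 0 iff a + b >= t."""
    return 0 if a + b >= t else 1


def nand(a, b, flipped=False):
    # Hypothetical gate-flip model: a parametric shift moves the threshold
    # from 2 to 1, so the intended NAND silently evaluates as NOR.
    return threshold_gate(a, b, NOR_T if flipped else NAND_T)


def xor4(a, b, flips=(False, False, False, False)):
    """XOR built from four NAND gates; flips[k] marks gate k as flipped."""
    n1 = nand(a, b, flips[0])
    n2 = nand(a, n1, flips[1])
    n3 = nand(b, n1, flips[2])
    return nand(n2, n3, flips[3])


def match_score(read, ref, rng=None, p_flip=0.0):
    """Count equal bit positions (XNOR = 1 - XOR); each gate flips with prob. p_flip."""
    score = 0
    for a, b in zip(read, ref):
        flips = [rng is not None and rng.random() < p_flip for _ in range(4)]
        score += 1 - xor4(a, b, flips)
    return score


def decision_masking_rate(read, ref, min_score, p_flip, trials=2000, seed=1):
    """Fraction of trials whose match/no-match decision equals the fault-free one."""
    rng = random.Random(seed)
    golden = match_score(read, ref) >= min_score
    same = sum(
        (match_score(read, ref, rng, p_flip) >= min_score) == golden
        for _ in range(trials)
    )
    return same / trials


read = [1, 0, 1, 1, 0, 0, 1, 0] * 4
ref = list(read)
ref[3] ^= 1  # one genuine mismatch
print(match_score(read, ref))  # 31 (fault-free)
print(decision_masking_rate(read, ref, min_score=24, p_flip=0.01))
```

In this toy kernel, the per-bit score only feeds a thresholded decision. A flipped gate therefore changes the outcome only when the accumulated errors push the score across `min_score`. This is one simple form of algorithmic noise tolerance masking gate-level errors.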

Description

University of Minnesota Ph.D. dissertation. December 2021. Major: Electrical/Computer Engineering. Advisor: Ulya Karpuzcu. 1 computer file (PDF); xi, 147 pages.

Suggested citation

Chowdhury, Zamshed. (2021). Non-volatile In-memory Computing for Large Scale Data-Intensive Workloads: Challenges and Opportunities. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/260626.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.