High Performance Storage System Design Using Emerging Storage Technologies

Published Date

2021-12

Type

Thesis or Dissertation

Abstract

In the past few decades, data volume has increased exponentially. Smart devices, social media, and e-business generate an extremely large amount of data every day. While big data is promising and has been successfully applied in many areas such as machine learning, financial markets, and healthcare, it still faces many problems. One main challenge for large-scale data storage systems is how to store and retrieve data efficiently. In this thesis, we approach this challenge by developing high performance storage systems using emerging storage technologies (i.e., non-volatile memory and non-volatile memory block devices).
First, we characterize an emerging storage device, the non-volatile memory (NVM) block device. By replacing the NAND flash chips inside a solid-state drive (SSD) with NVM media, an NVM block device delivers substantial performance improvements over NAND-based storage systems. However, its performance characteristics have not been well studied. In this study, we carry out intensive experiments and propose multiple custom-designed micro-benchmarks to extract the intrinsic performance behaviors of the NVM block device, including the basic I/O performance behavior, advanced interleaving technology, performance consistency under highly intensive I/O workloads, the influence of unaligned request sizes, the elimination of write-driven garbage collection, read disturb issues, and the tail latency problem. The performance is compared to that of a conventional NAND SSD to show how the NVM block device differs in each scenario. In addition, using an online analytical benchmark, we study a database system's performance on our target storage devices to quantify the potential benefits of the NVM block device to a real application. Finally, we investigate the performance impact of hybrid storage systems that combine NVM block devices and NAND SSDs on a database application.
Second, building on this understanding of the performance characteristics of NVM block devices and of the database system bottlenecks studied above, we present a hybrid storage database system, called HeuristicDB, that uses an NVM block device as an extension of the database buffer pool. To account for the unique performance behavior of NVM block devices and the block-level characteristics of database requests, we propose a set of heuristic rules that associate database (block) requests with the appropriate quality of service for the purpose of caching priority. To support the system implementation of the proposed rules, four programs are developed: a query profile queue, an eviction and demotion (EV) program, a table access pattern detector, and a page placement controller. Using online analytical processing (OLAP) and online transactional processing (OLTP) benchmarks, both trace-based examination and system implementation on MySQL are carried out to evaluate the effectiveness of the proposed design. The experimental results show that HeuristicDB delivers up to 75% higher performance than existing systems.
Third, anticipating that NVM will replace DRAM as main memory in the future, we introduce a machine learning based cache replacement algorithm, named ExpCache, to improve NVM-based system performance. Taking the non-volatility of NVM devices into account, we split the whole NVM into two caches, a read cache and a write cache, that retain different types of requests. The pages in each cache are managed by both LRU and LFU policies to balance the recency and frequency of workloads. The online Expert machine learning algorithm selects the policy used to evict a page from one of the caches based on the access patterns of the workloads. Moreover, a read-write discriminated eviction program is developed to reduce the number of dirty pages written back to storage. In the experimental results, the proposed ExpCache outperforms previous studies in terms of hit ratio and the number of dirty pages written back to storage.
In summary, this thesis proposes an NVM block device performance characterization, a hybrid storage database system, and an online learning based cache replacement algorithm for NVM, which together allow large-scale data storage systems to deliver high performance.
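As a rough illustration of how an online expert algorithm can choose between LRU and LFU at eviction time, the sketch below keeps one weight per policy and lowers the weight of whichever policy evicted a page that is later requested again. It is not the ExpCache implementation from the dissertation: the class name `ExpertCache`, the `learning_rate` and `history_size` parameters, the multiplicative-weights update, and the single unified cache (no separate read and write caches) are all assumptions made for this example.

```python
import math
import random
from collections import OrderedDict


class ExpertCache:
    """Toy cache that picks LRU or LFU for each eviction using expert weights.

    Hypothetical sketch: the names, parameters, and update rule are assumptions,
    not taken from the dissertation.
    """

    def __init__(self, capacity, learning_rate=0.5, history_size=None, seed=0):
        self.capacity = capacity
        self.learning_rate = learning_rate
        self.history_size = history_size or capacity
        self.pages = OrderedDict()    # page -> access count, ordered by recency
        self.history = OrderedDict()  # evicted page -> policy that evicted it
        self.weights = {"lru": 0.5, "lfu": 0.5}
        self.rng = random.Random(seed)

    def _penalize(self, policy):
        # Shrink the weight of the policy whose earlier eviction caused
        # this miss, then renormalize so the weights sum to 1.
        self.weights[policy] *= math.exp(-self.learning_rate)
        total = sum(self.weights.values())
        for name in self.weights:
            self.weights[name] /= total

    def _victim(self, policy):
        if policy == "lru":
            return next(iter(self.pages))  # least recently used page
        # Least frequently used page; ties go to the least recently used one.
        return min(self.pages, key=self.pages.get)

    def access(self, page):
        """Record an access to `page`; return True on a hit, False on a miss."""
        if page in self.pages:
            self.pages[page] += 1
            self.pages.move_to_end(page)
            return True
        if page in self.history:
            # The page was evicted recently: blame the policy that chose it.
            self._penalize(self.history.pop(page))
        if len(self.pages) >= self.capacity:
            # Sample a policy in proportion to its current weight.
            policy = "lru" if self.rng.random() < self.weights["lru"] else "lfu"
            victim = self._victim(policy)
            del self.pages[victim]
            self.history[victim] = policy
            if len(self.history) > self.history_size:
                self.history.popitem(last=False)  # forget the oldest eviction
        self.pages[page] = 1
        return False
```

Calling `cache.access(page)` for each request in a trace and counting the `True` results gives a hit ratio; a fuller model of the design described in the abstract would also track dirty pages and keep reads and writes in separate caches.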

Description

University of Minnesota Ph.D. dissertation. 2021. Major: Electrical Engineering. Advisor: David Lilja. 1 computer file (PDF); 150 pages.

Suggested citation

Yang, Jinfeng. (2021). High Performance Storage System Design Using Emerging Storage Technologies. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/226666.
