Efficient Data and Space Management in Modern Storage Devices and Systems





Thesis or Dissertation


The ever-increasing amount of big data enables many modern applications, such as social networks, smart vehicles, and e-commerce. It also advances scientific research in fields such as astronomy, meteorology, epidemiology, and particle physics. In this context, a wide variety of modern storage systems and devices have been developed to keep pace with the rapid growth of data. One fundamental question in managing these systems and devices is how to handle the trade-off between capacity and performance: increased capacity often comes at the expense of degraded performance. In the first part of the thesis, we focus on modern large-capacity storage devices and investigate techniques to mitigate the performance degradation caused by increased capacity. We start by characterizing one emerging large-capacity storage device, the Host Aware SMR (HA-SMR) drive, and understanding its internals. We conduct in-depth performance evaluations on HA-SMR drives with a particular emphasis on the performance implications in large storage systems. We discover both favorable and adverse effects of using HA-SMR drives under various workloads. We also investigate the drive's performance in production environments using real-world enterprise traces. Further, we carry out a case study to validate our understanding of the internal structure and performance characteristics of HA-SMR drives. In particular, to remedy the potentially severe performance degradation under certain conditions, we develop a novel host-controlled buffer that redirects and holds write traffic to reduce the severity of the HA-SMR performance degradation under unfriendly I/O access patterns. Inspired by the effectiveness of the proposed data and space management in HA-SMR drives, we proceed to investigate similar techniques in another promising high-capacity storage device: the Hybrid Shingled Magnetic Recording (H-SMR) drive.
An H-SMR drive allows dynamic conversion of the recording format between CMR (Conventional Magnetic Recording) and SMR on a single disk drive. We design and implement FluidSMR, an adaptive management scheme for H-SMR drives, to fully utilize the unique opportunities of H-SMR drives and to manage the trade-off between performance and capacity. We propose using spare CMR areas of the H-SMR drive to redirect update traffic, and we design SMR-specialized replacement schemes to cache frequently updated data. FluidSMR adaptively converts the format of disk areas based on storage usage as well as workload properties. Evaluations using enterprise traces demonstrate that FluidSMR outperforms baseline schemes across various workloads by decreasing the average I/O latency. Similar to the FluidSMR work, we also explore data and space management techniques in another high-capacity storage device: the Interlaced Magnetic Recording (IMR) drive. In IMR, top tracks and bottom tracks are interlaced, so each bottom track is partially overlapped by its two adjacent top tracks. Updating a bottom track requires reading and rewriting the affected valid data on the two neighboring top tracks, causing Write Amplification (WA). Top tracks, however, can be updated without WA. Few published studies discuss WA in IMR drives. We propose TrackLace to reduce WA for IMR. TrackLace manages space according to capacity usage. It allocates bottom tracks first when capacity usage is low and gradually allocates top tracks as capacity usage goes up. TrackLace opportunistically uses unallocated top tracks to buffer bottom track updates. In addition, it progressively swaps hot data on bottom tracks with cold data on top tracks during high space utilization. Together with other proposed optimizations, TrackLace reduces WA by 45% and lowers average latency by 31% compared with baseline schemes in the performance evaluation.
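The IMR write amplification described above can be illustrated with a minimal Python sketch. It is not the TrackLace implementation: the even/odd track layout, the whole-track write granularity, and the names `is_bottom`, `tracks_written`, and `write_amplification` are all assumptions made for illustration. The sketch only shows why allocating bottom tracks first keeps WA low while the top tracks hold no valid data.

```python
from typing import Iterable, Set


def is_bottom(track: int) -> bool:
    # Assumption: even-numbered tracks are bottom tracks, odd-numbered are top.
    return track % 2 == 0


def tracks_written(track: int, valid: Set[int], num_tracks: int) -> int:
    """Count whole-track writes caused by updating `track` in place.

    `valid` is the set of tracks currently holding valid data.
    """
    if not is_bottom(track):
        # Top tracks can be updated without write amplification.
        return 1
    # A bottom-track update disturbs its adjacent top tracks; each one that
    # holds valid data must be read back and rewritten afterwards.
    neighbors = (t for t in (track - 1, track + 1) if 0 <= t < num_tracks)
    return 1 + sum(1 for t in neighbors if t in valid)


def write_amplification(updates: Iterable[int], valid: Set[int],
                        num_tracks: int) -> float:
    """Ratio of tracks physically written to tracks the host asked to update."""
    updates = list(updates)
    total = sum(tracks_written(t, valid, num_tracks) for t in updates)
    return total / len(updates)


if __name__ == "__main__":
    num_tracks = 6
    bottom_only = {0, 2, 4}           # low usage: only bottom tracks allocated
    full = set(range(num_tracks))     # high usage: top tracks also hold data
    print(write_amplification([0, 2, 4], bottom_only, num_tracks))
    print(write_amplification([0, 2, 4], full, num_tracks))
```

In this simplified model, bottom-track updates cost a single write as long as no neighboring top track is valid, which matches the motivation for bottom-first allocation and for buffering bottom-track updates in unallocated top tracks.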
In the second part of the thesis, we zoom out and study the capacity and performance problem in modern storage systems. Caching is one of the main techniques for addressing the capacity and performance trade-off in a storage system that combines large, cheap storage devices with small, expensive ones. However, managing the space and data of the cache is increasingly difficult due to the complexity of modern storage systems. One typical example is the LSM-based key-value store, which has a leveled structure, heterogeneous cached items, and inter-correlated components. Designing an efficient caching scheme is challenging given such complexity. We propose AC-Key to address the challenges of caching in LSM-based key-value stores. AC-Key leverages a novel caching efficiency factor to capture the heterogeneity of the caching costs and benefits of cached entries. AC-Key manages three caching components, namely the key-value cache, the key-pointer cache, and the block cache, to hold different types of cached entries, and it adjusts their sizes according to the workload. We implement AC-Key by modifying RocksDB, a widely adopted key-value store. The evaluation results show that AC-Key outperforms RocksDB under various workloads and even outperforms the best offline fixed-size caching scheme under phase-changing workloads.


University of Minnesota Ph.D. dissertation. 2020. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); 179 pages.


Suggested citation

Wu, Fenggang. (2020). Efficient Data and Space Management in Modern Storage Devices and Systems. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/216400.
