An efficient data deduplication design with flash-memory based solid state drive.

Published Date

2012-01

Type

Thesis or Dissertation

Abstract

Today, a predominant portion of Internet services (e.g., content delivery networks, online backup storage, news broadcasting, blog sharing and social networks) are data-centric. These services generate a significant amount of new data every day, and a large portion of that data is redundant. Data deduplication is a prevailing technique for identifying and eliminating redundant data, so as to reduce the space requirement for both primary file systems and data backups. The variety of objectives in a deduplication system design is the primary interest of this dissertation. These objectives include maximizing the amount of redundant data removed and achieving high deduplication read/write throughput with minimum RAM overhead per chunk. To achieve the first objective, this dissertation proposes a novel chunking algorithm that breaks the input dataset into chunks with higher redundancy or larger sizes, so as to identify more duplicated data without producing a larger number of chunks than other chunking algorithms. To achieve high deduplication throughput while minimizing RAM overhead per chunk, this dissertation proposes a RAM-frugal chunk index design along with a chunk filter that screens out index lookups for nonexistent chunks. Both the index and the filter designs make efficient use of very limited RAM space, with flash memory as persistent storage. In particular, the proposed chunk filter design can dynamically scale up to adapt to the growth of the dataset. In addition, the proposed chunk index design can achieve high-throughput, low-latency chunk lookup/insert operations with extremely low RAM overhead, at the sub-byte-per-chunk level.
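
To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch of chunk-based deduplication: content-defined chunking, a fingerprint index, and a filter in front of the index. It is not the dissertation's algorithm. The names `BloomFilter`, `chunk`, and `deduplicate`, the shift-and-add boundary hash, and all size parameters are assumptions chosen for illustration; a plain Bloom filter stands in for the proposed scalable chunk filter, and an in-memory `dict` stands in for the flash-backed chunk index.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive several bit positions from one SHA-256 digest of the key.
        digest = hashlib.sha256(key).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def chunk(data, mask=0x3F, min_size=16, max_size=256):
    """Content-defined chunking: cut where a running hash matches a bit pattern."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # simple shift-and-add hash (assumption)
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def deduplicate(data, index, bloom):
    """Store only unseen chunks; return (recipe of fingerprints, new bytes stored)."""
    recipe, stored = [], 0
    for c in chunk(data):
        fp = hashlib.sha1(c).digest()
        # The filter rejects most lookups for chunks that were never stored,
        # so the (slower) index is consulted only when a hit is plausible.
        if not (bloom.might_contain(fp) and fp in index):
            index[fp] = c
            bloom.add(fp)
            stored += len(c)
        recipe.append(fp)
    return recipe, stored
```

Calling `deduplicate` twice on the same input with a shared `index` and `bloom` stores new bytes only on the first pass, and the original data can be rebuilt by concatenating `index[fp]` for each fingerprint in the returned recipe.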

Description

University of Minnesota Ph.D. dissertation. January 2012. Major: Computer science. Advisor: Prof. David Hung-Chang Du. 1 computer file (PDF); ix, 104 pages.

Suggested citation

Lu, Guanlin. (2012). An efficient data deduplication design with flash-memory based solid state drive. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/120894.