Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments

Published Date

2013-11

Publisher

Type

Thesis or Dissertation

Abstract

The underlying technologies for storing digital bits have become more diverse in last decade.There is no fundamental differences in their functionality yet their behaviors can be quite different and no single management technique seems to fit them all.The differences can be categorized based on the metric of interest such as the performance profile, the reliability profile and the power profile.These profiles are a function of the system and the workload assuming that the systems are exposed only to a pre-specified environment. Near infinite workload space makes it infeasible to obtain the complete profiles for any storage systems unless the system enforces a discrete and finite profile internally. The thesis of this work is that an acceptable approximation of the profiles may be achieved by proper characterization of the workloads.A set of statistical tools as well as understanding of system behavior were used to evaluate and design such characterizations.The correctness of the characterization cannot be fully proved except by showing that the resulting profile can correctly predict any workload and storage system interactions. While this is not possible, we show that we can provide a reasonable confidence in our characterization by statistical evaluation of results.The characterizations of this work were applied to compression ratio for backup data deduplication and load balancing of heterogeneous storage systems in a virtualized environments.The validation of our characterization is validated through hundreds of real world test cases as well as reasonable deductions based on our understanding of the storage systems. In both cases, the goodness of characterizations were rigorously evaluated using statistical techniques.The findings along the validations were both confirming and contradicting of many previous beliefs.

Description

University of Minnesota Ph.D. dissertation. November 2013. Major: Electrical Engineering. Advisor: David J. Lilja. 1 computer file (PDF); xi, 110 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Other identifiers

Suggested citation

Park, Nohhyun. (2013). Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/36767.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.