Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments
2013-11
Type
Thesis or Dissertation
Abstract
The underlying technologies for storing digital bits have become more diverse in the last decade. There are no fundamental differences in their functionality, yet their behaviors can be quite different, and no single management technique seems to fit them all. The differences can be categorized by the metric of interest, such as the performance profile, the reliability profile, and the power profile. These profiles are a function of the system and the workload, assuming that the systems are exposed only to a pre-specified environment. The near-infinite workload space makes it infeasible to obtain complete profiles for any storage system unless the system enforces a discrete and finite profile internally. The thesis of this work is that an acceptable approximation of the profiles may be achieved by proper characterization of the workloads. A set of statistical tools, together with an understanding of system behavior, was used to evaluate and design such characterizations. The correctness of a characterization cannot be fully proved except by showing that the resulting profile correctly predicts any interaction between workload and storage system. While this is not possible, we show that statistical evaluation of the results can provide reasonable confidence in our characterizations. The characterizations of this work were applied to compression ratio for backup data deduplication and to load balancing of heterogeneous storage systems in a virtualized environment. Our characterizations were validated through hundreds of real-world test cases as well as reasonable deductions based on our understanding of the storage systems. In both cases, the goodness of the characterizations was rigorously evaluated using statistical techniques. The findings from these validations both confirmed and contradicted many previous beliefs.
Description
University of Minnesota Ph.D. dissertation. November 2013. Major: Electrical Engineering. Advisor: David J. Lilja. 1 computer file (PDF); xi, 110 pages.
Suggested citation
Park, Nohhyun. (2013). Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/36767.