The underlying technologies for storing digital bits have become more diverse in the last decade. There are no fundamental differences in their functionality, yet their behaviors can be quite different, and no single management technique seems to fit them all. The differences can be categorized by the metric of interest, such as the performance profile, the reliability profile, and the power profile. These profiles are a function of the system and the workload, assuming that the systems are exposed only to a pre-specified environment. The near-infinite workload space makes it infeasible to obtain complete profiles for any storage system unless the system enforces a discrete and finite profile internally. The thesis of this work is that an acceptable approximation of the profiles may be achieved by proper characterization of the workloads. A set of statistical tools, together with an understanding of system behavior, was used to design and evaluate such characterizations. The correctness of a characterization cannot be fully proved except by showing that the resulting profile correctly predicts every interaction between workload and storage system. While this is not possible, we show that statistical evaluation of the results can provide reasonable confidence in our characterization. The characterizations of this work were applied to two problems: predicting the compression ratio for backup data deduplication, and load balancing of heterogeneous storage systems in a virtualized environment. Our characterization is validated through hundreds of real-world test cases as well as reasonable deductions based on our understanding of the storage systems. In both cases, the goodness of the characterizations was rigorously evaluated using statistical techniques. The findings from these validations both confirmed and contradicted many previous beliefs.
University of Minnesota Ph.D. dissertation. November 2013. Major: Electrical Engineering. Advisor: David J. Lilja. 1 computer file (PDF); xi, 110 pages.
Statistical characterization of storage system workloads for data deduplication and load placement in heterogeneous storage environments.
Retrieved from the University of Minnesota Digital Conservancy,
Content distributed via the University of Minnesota's Digital Conservancy may be subject to additional license and use restrictions applied by the depositor.