Modeling Outputs of Efficient Compressibility Estimators

Published Date

2018-06

Type

Thesis or Dissertation

Abstract

It is sometimes helpful to know whether data is compressible before expending computational resources to compress it. The standard deviation of the byte distribution of data is one example of a measure of compressibility that does not involve actually compressing the data. This work considered five such measures, referred to as efficient compressibility estimators (ECEs): byte standard deviation, Shannon entropy, “average meaning entropy”, “byte counting” and the “heuristic method”. We developed models that relate the output of these measures to the compression ratios of gzip, lz4 and xz, using data retrieved from browsing Facebook, Wikipedia and YouTube. The models for byte standard deviation, Shannon entropy and “average meaning entropy” were linear in both the parameters and the variables. The model for “byte counting” was non-linear in the predictor variable but linear in the parameters. The model for the “heuristic method” was a classification model. In general, there was a strong relationship between the measures and the compressibility of the data. In many cases, a model developed using one data set from a source (such as YouTube) was also able to estimate the compressibility of another data set from the same source to a useful extent. This suggests the potential for developing one model per ECE for a given source that can predict, to a useful degree, the compressibility of data from that source. At the same time, the models were less accurate on new data from a source than on the data they were developed from, which indicates that there are important differences in the nature of the data coming from even the same source.
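
As a rough illustration of the kind of measures the abstract describes, the sketch below computes two of them (byte standard deviation and Shannon entropy) and compares them with compression ratios obtained by actually compressing the data. It is not the thesis's implementation: the exact definitions are assumptions (standard deviation taken over the 256 relative byte frequencies, entropy in bits per byte, compression ratio as original size divided by compressed size), and the function names are hypothetical. Only Python's standard library is used, so gzip and xz (via `lzma`) are covered but lz4 is not.

```python
import gzip
import lzma
import math
from collections import Counter


def byte_std_dev(data: bytes) -> float:
    """Standard deviation of the 256 relative byte frequencies (assumed definition)."""
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    n = len(data)
    freqs = [counts.get(b, 0) / n for b in range(256)]
    mean = 1 / 256  # the frequencies always sum to 1
    return math.sqrt(sum((f - mean) ** 2 for f in freqs) / 256)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        raise ValueError("data must not be empty")
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def compression_ratio(data: bytes, compress) -> float:
    """Original size divided by compressed size (assumed definition)."""
    return len(data) / len(compress(data))


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog " * 200
    print("byte std dev:", byte_std_dev(sample))
    print("entropy     :", shannon_entropy(sample))
    print("gzip ratio  :", compression_ratio(sample, gzip.compress))
    print("xz ratio    :", compression_ratio(sample, lzma.compress))
```

Under these assumed definitions, a flat byte distribution gives a standard deviation near 0 and an entropy near 8 bits per byte, which is what one would expect from data that is hard to compress; a skewed distribution gives a larger standard deviation and lower entropy. Both estimators need only a single pass over the data, which is what makes them cheap relative to running a compressor.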

Description

University of Minnesota M.S. thesis. June 2018. Major: Computer Science. Advisor: Peter Peterson. 1 computer file (PDF); viii, 94 pages.

Suggested citation

Asamoah Owusu, Dennis. (2018). Modeling Outputs of Efficient Compressibility Estimators. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/200159.
