Modeling Outputs of Efficient Compressibility Estimators
2018-06
Title
Modeling Outputs of Efficient Compressibility Estimators
Authors
Asamoah Owusu, Dennis
Published Date
2018-06
Type
Thesis or Dissertation
Abstract
There are times when it is helpful to know whether data is compressible before expending computational resources to compress it. The standard deviation of the byte distribution of the data is one example of a compressibility measure that does not require actually compressing the data. This work considered five such measures: byte standard deviation, Shannon entropy, “average meaning entropy”, “byte counting” and the “heuristic method”. We developed models that relate the output of each measure to the compression ratios achieved by gzip, lz4 and xz, using data retrieved while browsing Facebook, Wikipedia and YouTube. The models for byte standard deviation, Shannon entropy and “average meaning entropy” were linear in both the parameters and the variables. The model for “byte counting” was non-linear in the predictor variable but linear in the parameters. The model for the “heuristic method” was a classification model. In general, there was a strong relationship between each measure and the compressibility of the data. In many cases, models developed using one data set from a source (such as YouTube) were also able to estimate the compressibility of another data set from the same source to a useful extent. This suggests that it may be possible to develop one model per efficient compressibility estimator (ECE) for a source that predicts, to a useful degree, the compressibility of data from that source. At the same time, the models were less accurate on new data from a source than on the data they were developed from, which indicates that data can differ in important ways even when it comes from the same source.
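To make the idea of a compressibility measure concrete, the following is a minimal Python sketch of two of the measures named in the abstract, byte standard deviation and Shannon entropy, alongside a compression ratio to compare them against. It is an illustration only, not the thesis's implementation. The abstract does not give exact definitions, so the sketch assumes that byte standard deviation is the population standard deviation of the relative frequencies of the 256 byte values, and that the compression ratio is original size divided by compressed size. The function names are hypothetical. Only gzip and xz (via Python's standard `gzip` and `lzma` modules) are shown, because lz4 requires a third-party package.

```python
import gzip
import lzma
import math
import os
from collections import Counter


def byte_frequencies(data: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values."""
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def byte_std_dev(data: bytes) -> float:
    """Population standard deviation of the byte frequencies (assumed definition).

    Uniformly distributed bytes give 0; skewed distributions give larger values.
    """
    freqs = byte_frequencies(data)
    mean = 1 / 256  # frequencies always sum to 1 over 256 values
    return math.sqrt(sum((f - mean) ** 2 for f in freqs) / 256)


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    return -sum(f * math.log2(f) for f in byte_frequencies(data) if f > 0)


def compression_ratio(data: bytes, compress) -> float:
    """Original size divided by compressed size (assumed definition)."""
    return len(data) / len(compress(data))


if __name__ == "__main__":
    samples = {
        "repetitive text": b"the quick brown fox " * 500,
        "random bytes": os.urandom(10_000),
    }
    for name, data in samples.items():
        print(
            name,
            f"std={byte_std_dev(data):.5f}",
            f"entropy={shannon_entropy(data):.3f}",
            f"gzip={compression_ratio(data, gzip.compress):.2f}",
            f"xz={compression_ratio(data, lzma.compress):.2f}",
        )
```

Both measures need only a single pass over the data to count bytes, which is what makes them cheap relative to running a compressor. A model of the kind the abstract describes would then map a measure's output to an expected compression ratio for a given compressor and data source.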
Description
University of Minnesota M.S. thesis. June 2018. Major: Computer Science. Advisor: Peter Peterson. 1 computer file (PDF); viii, 94 pages.
Suggested citation
Asamoah Owusu, Dennis. (2018). Modeling Outputs of Efficient Compressibility Estimators. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/200159.