Summarization - Compressing Data into an Informative Representation Report
2005-06-08
Loading...
View/Download File
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Summarization - Compressing Data into an Informative Representation Report
Authors
Published Date
2005-06-08
Publisher
Type
Report
Abstract
Summarization is an important problem in many domains involving large datasets. Summarization can be essentially viewed as transformation of data into a concise yet meaningful representation which could be used for efficient storage or manual inspection. In this paper, we formulate the problem of summarization of a large dataset of transactions as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We propose data mining techniques to obtain a summary for a given set of transactions while optimizing these two objective functions. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a meaningful representation. We first present a direct application of a standard clustering scheme to generate summaries. We then show how this could be significantly improved by using a multi-step approach which involves generating candidate summaries for a dataset using association analysis and then choosing a subset of these candidates as the summary with the desired compaction and information content. We present results of experiments conducted on real and artificial datasets to demonstrate the effectiveness of our techniques.
Keywords
Description
Related to
Replaces
License
Series/Report Number
Technical Report; 05-024
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Chandola, Varun; Kumar, Vipin. (2005). Summarization - Compressing Data into an Informative Representation Report. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215665.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.