A computational approach to detection of conceptual incongruity in text and its applications

Abstract

Given a text corpus, which pieces of text would human readers find most interesting? Is it possible to quantify a subjective idea like "interestingness" in text data and build algorithms to detect it? This thesis provides a computational investigation of these questions. The incongruity theory of curiosity postulates that humans deem an optimal level of conceptual incongruity in their observations "interesting". Based on this idea, we propose that the incongruity of a textual topic can be detected by measuring two things: the statistical rarity of the topic in the given corpus, and the contextual deviance of the words in the topic, measured against a universal distribution of word co-usage in society. Based on this concept, we present algorithms to quantify conceptual incongruity and detect different kinds of interestingness (at the sample level) in text data. We first present an algorithm to detect incongruous topics in large-scale text corpora. With it, we detected incongruous emails in the Enron corpus, deviant paper abstracts, and incongruous blog posts. We then extend this algorithm into a computational model of humor, which was used to detect funny videos on YouTube from a given video's tag set. We then provide variants of this algorithm to detect word choices that humans consider creative and the most popular sets of media objects in social networks. We then show the information-theoretic motivations behind our proposal and demonstrate that it maps directly to some basic principles. Finally, we investigate whether it is the mere presence of incongruity or its eventual resolution that actually stimulates interest. We present an algorithm to carry out this test and report the results.
The generalizability of our results in finding interestingness across these different domains, using algorithms derived from intrinsic human motivations, opens up exciting new avenues in the field of knowledge discovery.
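
The two-part measure described in the abstract (statistical rarity of a topic plus contextual deviance of its words) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names, the negative-log form of rarity, the pairwise-relatedness table `co_usage`, and the choice to combine the two scores by multiplication are all assumptions made here for illustration only.

```python
import math
from itertools import combinations


def rarity(topic_count, total_topics):
    """Rarity as negative log relative frequency (an assumed form)."""
    return -math.log(topic_count / total_topics)


def contextual_deviance(words, co_usage):
    """Mean of (1 - relatedness) over all word pairs in the topic.

    co_usage is a hypothetical table mapping frozenset({w1, w2}) to a
    relatedness score in [0, 1]; pairs missing from it count as 0.
    """
    pairs = list(combinations(sorted(set(words)), 2))
    if not pairs:
        return 0.0
    total = sum(1.0 - co_usage.get(frozenset(p), 0.0) for p in pairs)
    return total / len(pairs)


def incongruity(words, topic_count, total_topics, co_usage):
    """Combine both signals by multiplication (an assumption, not the source's rule)."""
    return rarity(topic_count, total_topics) * contextual_deviance(words, co_usage)


# Toy relatedness table standing in for a "universal" co-usage distribution.
co_usage = {
    frozenset(("bank", "loan")): 0.9,
    frozenset(("bank", "money")): 0.8,
    frozenset(("loan", "money")): 0.85,
}

common = incongruity(["bank", "loan", "money"], 50, 100, co_usage)
unusual = incongruity(["bank", "giraffe", "trumpet"], 1, 100, co_usage)
print(common, unusual)
```

In this toy setup, a frequent topic whose words commonly co-occur scores low, while a rare topic whose words seldom appear together scores high. A realistic version would estimate `co_usage` from a large external corpus rather than a hand-written table.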

Description

University of Minnesota Ph.D. dissertation. May 2013. Major: Computer Science. Advisor: Jaideep Srivastava. 1 computer file (PDF); 114 pages.

Suggested citation

Mahapatra, Amogh. (2013). A computational approach to detection of conceptual incongruity in text and its applications. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/173924.
