A computational approach to detection of conceptual incongruity in text and its applications
Abstract
Given a text corpus, which pieces of text would human readers find most interesting? Is it possible to quantify a subjective idea like "interestingness" in text data and build algorithms to detect it? This thesis provides a computational investigation of these questions. The incongruity theory of curiosity postulates that humans deem an optimal level of conceptual incongruity in their observations "interesting". Based on this idea, we propose that the incongruity of a textual topic can be detected by measuring two things: the statistical rarity of the topic in the given corpus, and the contextual deviance of the words in the topic, measured against a universal distribution of word co-usage in society. Building on this concept, we present algorithms to quantify conceptual incongruity and detect different kinds of interestingness (at a sample level) in text data. We first present an algorithm to detect incongruous topics in large-scale text corpora; with it, we detected incongruous emails in the Enron corpus, deviant paper abstracts, and incongruous blog posts. We then extend this algorithm into a computational model of humor, which was used to detect funny videos on YouTube from a given video's tag set. We then provide variants of this algorithm to detect word choices that humans consider creative and the most popular sets of media objects in social networks. We then show the information-theoretic motivations behind our proposal and demonstrate that it maps directly to some basic principles. Finally, we investigate whether it is the mere presence of incongruity or its eventual resolution that is the real cause of interest stimulation. We present an algorithm to carry out this test and report some interesting results.
The generalizability of our results in finding interestingness across these different domains, using algorithms derived from intrinsic human motivations, opens up exciting new avenues in the field of knowledge discovery.
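The two-part measure described in the abstract (rarity within the corpus, plus contextual deviance from a reference distribution of word co-usage) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the function names, the use of per-word document frequency as a stand-in for topic rarity, the Jaccard co-occurrence measure, and the choice to multiply the two terms are all assumptions made here for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reference_docs):
    """Count document frequencies of words and word pairs in a reference corpus."""
    word_df = Counter()
    pair_df = Counter()
    for doc in reference_docs:
        vocab = sorted(set(doc))  # sorted so pair keys are order-independent
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def contextual_deviance(words, word_df, pair_df):
    """1 minus the mean Jaccard co-usage of all word pairs (assumed measure).

    Higher values mean the words are rarely used together in the reference corpus.
    """
    vocab = sorted(set(words))
    pairs = list(combinations(vocab, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        both = pair_df[(a, b)]
        either = word_df[a] + word_df[b] - both
        total += both / either if either else 0.0
    return 1.0 - total / len(pairs)


def rarity(words, corpus_docs):
    """Mean smoothed negative log document frequency of the words (assumed proxy)."""
    n = len(corpus_docs)
    df = Counter()
    for doc in corpus_docs:
        df.update(set(doc))
    vocab = set(words)
    if not vocab:
        return 0.0
    return sum(-math.log((df[w] + 1) / (n + 1)) for w in vocab) / len(vocab)


def incongruity_score(words, corpus_docs, word_df, pair_df):
    """Combine the two signals by multiplication (an assumption of this sketch)."""
    return rarity(words, corpus_docs) * contextual_deviance(words, word_df, pair_df)
```

With a toy reference corpus in which "coffee" and "cup" always co-occur while "coffee" and "piano" never do, the word set containing "coffee" and "piano" receives the higher score. A real system would need a much larger reference corpus to approximate co-usage "in society".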
Description
University of Minnesota Ph.D. dissertation. May 2013. Major: Computer Science. Advisor: Jaideep Srivastava. 1 computer file (PDF); 114 pages.
Suggested citation
Mahapatra, Amogh. (2013). A computational approach to detection of conceptual incongruity in text and its applications. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/173924.