Given a text corpus, which pieces of text would human subjects find most interesting? Can a subjective notion like "interestingness" be quantified for text data, and can algorithms be built to detect it? This thesis investigates these questions computationally. The incongruity theory of curiosity postulates that humans deem an optimal level of conceptual incongruity in their observations to be "interesting". Building on this idea, we propose that the incongruity of a textual topic can be detected by measuring two things: the statistical rarity of the topic in the given corpus, and the contextual deviance of the words in the topic, measured against a universal distribution of word co-usage in society. Using this concept, we present algorithms that quantify conceptual incongruity and detect different kinds of interestingness (at the sample level) in text data. We first present an algorithm to detect incongruous topics in large-scale text corpora, and use it to detect incongruous emails in the Enron corpus, deviant paper abstracts, and incongruous blog posts. We then extend this algorithm into a computational model of humor, which was used to detect funny YouTube videos from a given video's tag set. We then provide variants of the algorithm to detect word choices that humans consider creative and the most popular sets of media objects in social networks. We then show the information-theoretic motivations behind our proposal and demonstrate that it maps directly to some basic principles. Finally, we investigate whether it is the mere presence of incongruity or its eventual resolution that actually stimulates interest. We present an algorithm to carry out this test and report the results.
The generalizability of our results in finding interestingness across these different domains, using algorithms derived from intrinsic human motivations, opens up exciting new avenues in the field of knowledge discovery.
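As a rough illustration of the two-part measure described in the abstract, the sketch below scores a topic (a set of words) by combining its rarity in a target corpus with the deviance of its word pairs from a reference co-usage distribution. This is not the thesis's algorithm: the function names, the use of document frequency for rarity, negative pointwise mutual information for deviance, the smoothing constant `eps`, and the mixing weight `alpha` are all assumptions made for the example.

```python
import math
from itertools import combinations

def doc_frequency(words, docs):
    """Fraction of documents (sets of words) that contain every word in `words`."""
    words = set(words)
    return sum(1 for d in docs if words <= d) / len(docs)

def rarity(topic, corpus, eps=1e-6):
    """Assumed rarity measure: higher when few corpus documents contain the whole topic."""
    return -math.log(doc_frequency(topic, corpus) + eps)

def deviance(topic, reference, eps=1e-6):
    """Assumed deviance measure: mean negative PMI of topic word pairs,
    estimated from a reference collection standing in for general co-usage."""
    pairs = list(combinations(sorted(set(topic)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        p_ab = doc_frequency({a, b}, reference) + eps
        p_a = doc_frequency({a}, reference) + eps
        p_b = doc_frequency({b}, reference) + eps
        total += -math.log(p_ab / (p_a * p_b))
    return total / len(pairs)

def incongruity(topic, corpus, reference, alpha=0.5):
    """Hypothetical combined score; `alpha` weights rarity against deviance."""
    return alpha * rarity(topic, corpus) + (1 - alpha) * deviance(topic, reference)

# Toy data, invented for illustration only.
reference = [{"coffee", "cup"}, {"coffee", "cup", "milk"},
             {"volcano", "lava"}, {"volcano", "ash"}]
corpus = [{"coffee", "cup"}, {"coffee", "cup"},
          {"coffee", "volcano"}, {"cup", "milk"}]

odd_score = incongruity({"coffee", "volcano"}, corpus, reference)
usual_score = incongruity({"coffee", "cup"}, corpus, reference)
print(odd_score > usual_score)
```

In this toy setup the pair "coffee" and "volcano" is both rarer in the corpus and never co-used in the reference collection, so it scores higher than "coffee" and "cup". A real system would need a much larger reference collection and a principled choice of weighting.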
University of Minnesota Ph.D. dissertation. May 2013. Major: Computer Science. Advisor: Jaideep Srivastava. 1 computer file (PDF); 114 pages.
A computational approach to detection of conceptual incongruity in text and its applications.
Retrieved from the University of Minnesota Digital Conservancy,