Coreference resolution (CR) and entity relation detection (ERD) aim to find predefined
relations between pairs of entities in text. CR focuses on resolving identity relations,
while ERD focuses on detecting non-identity relations. Both are important because they
can potentially improve other natural language processing (NLP) tasks, such as
information retrieval and extraction, web search, and question answering, and can also
enhance non-NLP tasks such as computer vision, database construction, and ontology building.
In this thesis, I propose models for both coreference resolution (CR) and entity
relation detection (ERD). Both systems are built on machine learning models. The CR system
is based on Factorial Hidden Markov Models (FHMMs); the ERD system is based on Maximum
Entropy Discriminant Latent Dirichlet Allocation (MedLDA). The CR work resolves only
pronouns. It is a supervised system trained on an annotated corpus. The basic idea is that
the hidden states of the FHMM act as an explicit short-term memory: an antecedent buffer
containing recently mentioned referents. An observed pronoun can therefore find its antecedent
in the hidden buffer, or, in terms of a generative model, the entries in the hidden buffer
generate the corresponding pronouns. In the hidden buffer, every referent is represented by
diverse features. In addition to the common features of gender, number, person, and animacy, I
converted the Givenness Hierarchy and Centering Theory into probabilistic features, which
substantially improves accuracy. A system implementing this model is evaluated on the ACE corpus
and the I2B2 medical corpus, with promising performance.
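The idea that buffer entries "generate" pronouns can be illustrated with a toy scorer. The sketch below is not the thesis's FHMM: it keeps a single ordered buffer, assumes the pronoun features are conditionally independent given the referent, and uses hypothetical emission values (`MATCH`, `MISMATCH`) and a recency decay as a crude stand-in for salience information such as Givenness Hierarchy status.

```python
"""Toy antecedent-buffer scorer: a simplified illustration, not the thesis's FHMM."""
from dataclasses import dataclass


@dataclass
class Referent:
    name: str
    gender: str    # "m", "f", or "n"
    number: str    # "sg" or "pl"
    person: int    # 1, 2, or 3
    animate: bool


# Hypothetical probabilities that a buffer entry emits a pronoun whose feature
# matches (MATCH) or mismatches (MISMATCH) the entry's own value.
MATCH, MISMATCH = 0.9, 0.05
FEATURES = ("gender", "number", "person", "animate")


def emission_prob(referent, pronoun):
    """P(pronoun features | buffer entry), assuming independent features."""
    p = 1.0
    for feat in FEATURES:
        p *= MATCH if getattr(referent, feat) == getattr(pronoun, feat) else MISMATCH
    return p


def resolve(buffer, pronoun, decay=0.5):
    """Return the buffer entry most likely to have generated `pronoun`.

    buffer[0] is the most recently mentioned referent; the prior decay**i is a
    hypothetical recency/salience prior, not the thesis's probabilistic features.
    """
    scores = [decay ** i * emission_prob(r, pronoun) for i, r in enumerate(buffer)]
    total = sum(scores)
    posterior = [s / total for s in scores]
    best = max(range(len(buffer)), key=lambda i: posterior[i])
    return buffer[best], posterior


if __name__ == "__main__":
    buffer = [
        Referent("the report", "n", "sg", 3, False),
        Referent("Mary", "f", "sg", 3, True),
        Referent("John", "m", "sg", 3, True),
    ]
    she = Referent("she", "f", "sg", 3, True)
    antecedent, post = resolve(buffer, she)
    print(antecedent.name)  # Mary
```

Here "Mary" wins even though "the report" is more recent, because its feature agreement outweighs the recency prior; the real model additionally updates the buffer over time through the FHMM's hidden-state transitions.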
For ERD, I propose a novel application of topic models. To exploit the latent semantics of text, relation detection is reformulated as a
topic modeling problem. The motivation is to find underlying topics that are indicative of
relations between named entities. The approach treats each pair of named entities, together with
the features associated with it, as a mini document. The system, called ERD-MedLDA, adapts
Maximum Entropy Discriminant Latent Dirichlet Allocation (MedLDA) with mixed membership
for relation detection. Through supervision, ERD-MedLDA learns topic
distributions indicative of relation types. Further, ERD-MedLDA is a topic model that combines
the benefits of both Maximum Likelihood Estimation (MLE) and Maximum Margin Estimation (MME), and the mixed membership formulation enables the system to incorporate
heterogeneous features. I incorporate diverse features into the system and perform
experiments on the ACE 2005 corpus. The approach achieves better overall precision, recall,
and F-measure than SVM-based and LDA-based models.
ERD-MedLDA also shows better overall performance than state-of-the-art kernels used
previously for relation detection.
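The mini-document representation and the MLE/MME combination can be sketched as follows. This is a minimal illustration under stated assumptions, not ERD-MedLDA itself: the feature names are hypothetical, the topic proportions `theta` are taken as given rather than inferred, and the objective simply adds a topic-mixture negative log-likelihood to a C-weighted multiclass hinge loss. MedLDA proper optimizes a variational bound under max-margin constraints defined on averaged topic assignments.

```python
"""Toy sketch: entity pairs as mini documents, scored by a MedLDA-style trade-off
between a likelihood term and a multiclass hinge loss. Not the thesis's ERD-MedLDA."""
import math
from collections import Counter


def mini_document(tokens, e1, e2, e1_type, e2_type):
    """Bag of features for one entity pair (hypothetical feature names).

    e1 and e2 are (start, end) token spans with `end` exclusive; e1 is assumed
    to precede e2. The last token of each span is used as its head word.
    """
    feats = [f"E1_TYPE={e1_type}", f"E2_TYPE={e2_type}",
             f"E1_HEAD={tokens[e1[1] - 1]}", f"E2_HEAD={tokens[e2[1] - 1]}"]
    feats += [f"BETWEEN={w.lower()}" for w in tokens[e1[1]:e2[0]]]
    return Counter(feats)


def neg_log_likelihood(doc, theta, beta, eps=1e-12):
    """-log p(doc | theta, beta) under a topic mixture; beta[k] maps feature -> prob."""
    return -sum(n * math.log(sum(t * b.get(w, 0.0) for t, b in zip(theta, beta)) + eps)
                for w, n in doc.items())


def multiclass_hinge(theta, y, eta):
    """max over y' of [cost(y, y') + eta[y'].theta - eta[y].theta], with 0/1 cost.

    eta maps each relation label to a weight vector over topics; the y' = y term
    contributes 0, so the loss is always >= 0.
    """
    def score(c):
        return sum(e * t for e, t in zip(eta[c], theta))
    return max((0.0 if c == y else 1.0) + score(c) - score(y) for c in eta)


def hybrid_objective(docs, thetas, labels, beta, eta, C=1.0):
    """Likelihood (MLE-style) term plus C-weighted max-margin term, summed over docs."""
    return sum(neg_log_likelihood(d, th, beta) + C * multiclass_hinge(th, y, eta)
               for d, th, y in zip(docs, thetas, labels))


if __name__ == "__main__":
    tokens = "John Smith works for Acme Corp .".split()
    doc = mini_document(tokens, (0, 2), (4, 6), "PER", "ORG")
    beta = [{w: 1.0 / len(doc) for w in doc}, {}]  # topic 0: uniform over doc features
    eta = {"EMPLOY": [2.0, 0.0], "NONE": [0.0, 2.0]}
    print(hybrid_objective([doc], [[1.0, 0.0]], ["EMPLOY"], beta, eta))
```

The constant `C` plays the role of the trade-off between fitting the features (likelihood) and separating relation types by a margin; with `C = 0` the sketch reduces to an unsupervised topic-mixture likelihood.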
University of Minnesota Ph.D. dissertation. January 2012. Major: Linguistics. Advisors: Jeanette Gundel, William Schuler. 1 computer file (PDF); xi, 124 pages.
Entity relation detection with Factorial Hidden Markov Models and Maximum Entropy Discriminant Latent Dirichlet Allocations.
Retrieved from the University of Minnesota Digital Conservancy,