Browsing by Subject "Syntax"
Now showing 1 - 5 of 5
Item An incremental syntactic language model for statistical phrase-based translation. (2012-02) Schwartz, Lane Oscar Bingaman

Modern machine translation techniques typically incorporate both a translation model, which guides how individual words and phrases can be translated, and a language model (LM), which promotes fluency as translated words and phrases are combined into a translated sentence. Most attempts to inform the translation process with linguistic knowledge have focused on infusing syntax into translation models. We present a novel technique for incorporating syntactic knowledge as a language model in the context of statistical phrase-based machine translation (Koehn et al., 2003), one of the most widely used modern translation paradigms. The major contributions of this work are as follows:

- We present a formal definition of an incremental syntactic language model as a Hierarchical Hidden Markov Model (HHMM), and detail how this model is estimated from a treebank corpus of labelled data.
- The HHMM syntactic language model has been used in prior work involving parsing, speech recognition, and semantic role labelling. We present the first complete algorithmic definition of the HHMM as a language model.
- We develop a novel and general method for incorporating any generative incremental language model into phrase-based machine translation. We integrate our HHMM incremental syntactic language model into Moses, the prevailing phrase-based decoder.
- We present empirical results that demonstrate substantial improvements in perplexity for our syntactic language model over traditional n-gram language models; we also present empirical results on a constrained Urdu-English translation task that demonstrate the use of our syntactic LM.

A standard measure of language model quality is average per-word perplexity.
We present empirical results evaluating the perplexity of various n-gram language models and our syntactic language model on both in-domain and out-of-domain test sets. On an in-domain test set, a traditional 5-gram language model trained on the same data as our syntactic language model outperforms the syntactic language model in terms of perplexity. We find that interpolating the 5-gram LM with the syntactic LM improves perplexity, yielding a 10% absolute reduction in perplexity compared to the 5-gram LM alone. On an out-of-domain test set, we find that our syntactic LM substantially outperforms all other LMs trained on the same training data. The syntactic LM demonstrates a 58% absolute reduction in perplexity over a 5-gram language model trained on the same training data. On this same out-of-domain test set, we further show that interpolating our syntactic language model with a large Gigaword-scale 5-gram language model yields the best overall perplexity: a 61% absolute reduction compared to the Gigaword-scale 5-gram language model alone, a 76% absolute reduction compared to the syntactic LM alone, and a 90% absolute reduction compared to the original smaller 5-gram language model. A language model with low perplexity is a theoretically good model of the language; it is expected that using a low-perplexity LM as a component of a machine translation system should result in more fluent translations.
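The abstract reports perplexity reductions from interpolating the syntactic LM with n-gram LMs but does not spell out the formulas. As a minimal sketch of the standard definitions, average per-word perplexity is the exponential of the negative mean log-probability, and linear interpolation mixes two models' per-token probabilities with a weight. The function names and the weight `lam` below are illustrative assumptions, not taken from the thesis:

```python
import math

def perplexity(log_probs):
    # Average per-word perplexity: exp of the negative mean
    # natural log-probability over the test tokens.
    return math.exp(-sum(log_probs) / len(log_probs))

def interpolate(p_ngram, p_syntactic, lam=0.5):
    # Linear interpolation of two models' per-token probabilities.
    # The weight lam is illustrative; in practice it would be tuned
    # on held-out data, e.g. by EM.
    return lam * p_ngram + (1 - lam) * p_syntactic
```

Lower perplexity means the model assigns higher probability to the test text, which is why the interpolation gains above are read as improvements in language model quality.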
We present empirical results on a constrained Urdu-English translation task and perform an informal manual evaluation of the translation results, which suggests that the use of our incremental syntactic language model is indeed serving to guide the translation algorithm towards more fluent target language translations.

Item Iron Range English long-distance reflexives. (2011-07) Loss, Sara Schmelzer

This dissertation investigates the distribution of Iron Range English (IRE) reflexives, using judgments collected in a Magnitude Estimation task (Bard et al. 1996), and presents a phase-based analysis for their distribution. IRE reflexives (e.g., himself) can corefer with nominal expressions outside their minimal clause in subject or object position. Coreference with an expression outside the minimal clause is not acceptable in two environments: (i) if there is an intervening subject that does not match the reflexive for person (cf. Blocking Effects in Mandarin) or (ii) if the reflexive is in an island. The distribution of IRE reflexives is unexpected because generally only monomorphemic reflexives behave this way (Pica 1987). Complex reflexives that behave this way, such as Malay diri-nya 'himself/herself' (Cole & Hermon 2003) and Turkish kendi-sin 'himself/herself' (Kornfilt 2001), are shown to have pronominal qualities. IRE reflexives do not have pronominal qualities, since they exhibit Blocking Effects and island effects. Therefore, they are true long-distance reflexives. Blocking and island effects provide evidence that the reflexive undergoes raising to [Spec, CP], as is suggested for long-distance reflexives in other languages (e.g., Katada 1991). From the [Spec, CP] position, the reflexive is able to corefer with a nominal expression in a higher clause, in accordance with the Phase Impenetrability Condition (Chomsky 2001).
Two processes are needed to account for the distribution of IRE long-distance reflexives (cf. Cole & Wang 1996), since the set of expressions that are potential antecedents and the set of expressions that trigger Blocking Effects are not the same: a reflexive can corefer with a subject or an object, but only subjects trigger Blocking Effects. I posit that reflexives have a [VAR] feature that must be valued by a c-commanding nominal expression within the same phase via Agree, extending Hicks' (2009) analysis of English anaphors. Agree accounts for coreference and offers an inherent c-command relationship between the antecedent and reflexive. I account for Blocking Effects by considerably modifying Hasegawa's (2005) analysis of English anaphors. I suggest that a [+multi] feature on T licenses the reflexive and requires that the reflexive and the subject Agree for person.

Item Number in classifier languages (2013-03) Nomoto, Hiroki

Classifier languages are often described as lacking genuine number morphology and treating all common nouns, including those conceptually count, as an unindividuated mass. This study argues that neither of these popular assumptions is true, and presents new generalizations and analyses gained by abandoning them. I claim that no difference exists between classifier and non-classifier languages regarding the semantics of either nouns or numerals. Common nouns universally denote properties and are individuated, contra Chierchia (1998). I argue that classifier languages in fact make the most fine-grained basic number distinction, i.e. a three-way distinction of 'singular (SG) : plural (PL) : general (GN)'. Classifiers are analyzed as a sophisticated kind of singular number morphology. Classifier languages have genuine plural markers (Chung 2000). Importantly, I consider general number, which is associated with number-neutral properties, as a universally available basic number category, along with singular and plural.
Optional number marking follows from the three-way number system, in which the general is morphologically unmarked. While classifier languages distinguish all basic number categories, non-classifier languages conflate one or more of them morphologically. Languages can be classified into five types according to this criterion: (i) SG : GN : PL, (ii) SG/GN : GN/PL, (iii) SG/GN : PL, (iv) SG : GN/PL, and (v) SG/GN/PL. The difference between classifier and non-classifier languages reduces not to semantics (Krifka 1995; Chierchia 1998; Wilhelm 2008) or syntax (Li 1999), but to a difference in number morphology. The proposed number system and typology make it possible to account for bare "singular" kind terms in type (ii) languages (e.g. Brazilian Portuguese), a problem for Dayal's (2004) theory of number and definiteness marking in kind terms.

Item On Bipartite Negation (2019-07) Tilleson, Paul

Bipartite negation is the phenomenon in which two negators yield a single instance of semantic negation. In this thesis I present an analysis of bipartite negation in Sgaw Karen, Ojibwe, and French, using original data from the former two languages and data from existing sources for French. I show that the negators in these languages differ with respect to clausal position, internal structure, meaning, and how the negators relate to each other. I argue that bipartite negation derives from either syntactic agreement or what I term NegP splitting, whereby two constituents in an extended projection of negation are merged in separate locations in the clause, similar to Poletto (2008) and de Clercq (2013). Sgaw Karen and French exhibit distinct variants of syntactic agreement. In Sgaw Karen, one negator is semantically uninterpretable and undergoes AGREE with the structurally lower interpretable negator, while in French both negators are interpretable goals for a structurally higher silent head responsible for imparting sentential negation.
Ojibwe exhibits NegP splitting such that the sentential negator and a structurally higher negator are derived from a single extended projection of negation and are merged in two clausal positions. Both negators are interpretable for negation and cannot be in a syntactic agreement relation, as I assume that only uninterpretable constituents initiate the AGREE operation. I present a framework of negation to explicate the functions of the negators in each language and to motivate why AGREE and NegP splitting are necessary to account for the range of facts on bipartite negation in these languages. Building on the work of de Clercq (2013), I argue that there are three classes of negators, imparting contrary, contradictory, and focus negation respectively, each class having a different internal structure. Each class of negator may merge in up to two distinct locations in the clausal spine, sentential negation being imparted by a contradictory negator merged in the TP domain. I show that dividing negators into classes based on meaning, internal structure, and clausal position has implications for the syntax of negative polarity emphasis, negative replies, and syntactic doubling outside of the domain of negation.

Item Relativization in Ojibwe (2016-06) Sullivan, Michael

In this dissertation, I compare varieties of Ojibwe and establish sub-dialect groupings within the larger grouping known as Southwestern Ojibwe, often referred to as Chippewa, an indigenous North American Indian language of the Algonquian family. Drawing on a vast corpus of both primary and archived sources, I present an overview of two strategies of relative clause formation and show that relativization appears to be an exemplary parameter in the grouping of Ojibwe dialect and sub-dialect relationships. Specifically, I target the morphological composition of participial verbs, known as participles in Algonquian parlance, and show the variation in their form across a number of communities.
In addition to the discussion of participles and their role in relative clauses, I present additional findings from my research, some of which appear to correlate with the geographical distribution of participles, most likely a result of historic movements of the Ojibwe people to their present location in the northern Midwestern region of North America. Following up on previous dialect studies of Ojibwe, which were primarily concerned with varieties spoken in Canada (Nichols 1976; Rhodes and Todd 1981; Valentine 1994, to name a few), I present the first study of dialect variation for varieties spoken in the United States and along the border region of Ontario and Minnesota. After describing the data in the classic Algonquian linguistic tradition, I recast it in a modern theoretical framework, making use of previous theories for Algonquian languages (Bruening 2001; Brittain 2001) and familiar approaches such as feature checking (Chomsky 1993) and the Split CP Hypothesis (Rizzi 1997).