File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/w94-0101_intro.xml
Size: 17,502 bytes
Last Modified: 2025-10-06 14:05:45
<?xml version="1.0" standalone="yes"?> <Paper uid="W94-0101"> <Title>Qualitative and Quantitative Models of Speech Translation</Title> <Section position="3" start_page="0" end_page="3" type="intro"> <SectionTitle> 2. Qualitative and Quantitative Models </SectionTitle>
<Paragraph position="0"> One contrast often taken for granted is the identification of a 'statistical-symbolic' distinction in language processing as an instance of the empirical vs. rational debate. I believe this contrast has been exaggerated, though historically it has had some validity in terms of accepted practice. Rule based approaches have become more empirical in a number of ways. First, a more empirical approach is being adopted to grammar development whereby the rule set is modified according to its performance against corpora of natural text (e.g. Taylor, Grover and Briscoe 1989). Second, there is a class of techniques for learning rules from text, a recent example being Brill 1993. Conversely, it is possible to imagine building a language model in which all probabilities are estimated according to intuition without reference to any real data, giving a probabilistic model that is not empirical.</Paragraph>
<Paragraph position="2"> Most language processing labeled as statistical involves associating real-number valued parameters to configurations of symbols. This is not surprising given that natural language, at least in written form, is explicitly symbolic. Presumably, classifying a system as symbolic must refer to a different set of (internal) symbols, but even this does not rule out many statistical systems modeling events involving nonterminal categories and word senses. Given that the notion of a symbol, let alone an 'internal symbol', is itself a slippery one, it may be unwise to build our theories of language, or even the way we classify different theories, on this notion.</Paragraph>
<Paragraph position="3"> Instead, it would seem that the real contrast driving the shift towards statistics in language processing is a contrast between qualitative systems dealing exclusively with combinatoric constraints, and quantitative systems that involve computing numerical functions. This bears directly on the problems of brittleness and complexity that discrete approaches to language processing share with, for example, reasoning systems based on traditional logical inference. It relates to the inadequacy of the dominant theories in linguistics to capture 'shades' of meaning or degrees of acceptability which are often recognized by people outside the field as important inherent properties of natural language. The qualitative-quantitative distinction can also be seen as underlying the difference between classification systems based on feature specifications, as used in unification formalisms (Shieber 1986), and clustering based on a variable degree of granularity (e.g. Pereira, Tishby and Lee 1993).</Paragraph>
<Paragraph position="4"> It seems unlikely that these continuously variable aspects of fluent natural language can be captured by a purely combinatoric model. This naturally leads to the question of how best to introduce quantitative modeling into language processing. It is not, of course, necessary for the quantities of a quantitative model to be probabilities. For example, we may wish to define real-valued functions on parse trees that reflect the extent to which the trees conform to, say, minimal attachment and parallelism between conjuncts. Such functions have been used in tandem with statistical functions in experiments on disambiguation (for instance Alshawi and Carter 1994). Another example is connection strengths in neural network approaches to language processing, though it has been shown that certain networks are effectively computing probabilities (Richard and Lippmann 1991).</Paragraph>
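<Paragraph> To make such preference functions concrete, the following sketch (our own illustration, not taken from the paper) computes two real-valued scores over parse trees: a minimal-attachment score that penalizes structurally larger analyses of the same string, and a parallelism score that rewards conjunctions whose daughters share the same skeleton. The tree encoding, the CONJ label, and the weights are invented assumptions.</Paragraph>
<Paragraph>
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)

def node_count(t: Tree) -> int:
    # Size of the analysis; smaller trees correspond to more
    # "minimal" attachment decisions.
    return 1 + sum(node_count(c) for c in t.children)

def minimal_attachment_score(t: Tree) -> float:
    return -float(node_count(t))

def shape(t: Tree):
    # Label-free skeleton, used to compare conjunct structure.
    return tuple(shape(c) for c in t.children)

def parallelism_score(t: Tree) -> float:
    # Reward binary CONJ nodes whose daughters have identical skeletons.
    score = 0.0
    if t.label == "CONJ" and len(t.children) == 2:
        if shape(t.children[0]) == shape(t.children[1]):
            score += 1.0
    return score + sum(parallelism_score(c) for c in t.children)

def preference(t: Tree, w1: float = 0.1, w2: float = 1.0) -> float:
    # Weighted combination; in practice the weights would be tuned
    # against disambiguation data, as in the experiments cited above.
    return w1 * minimal_attachment_score(t) + w2 * parallelism_score(t)
</Paragraph>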
<Paragraph position="5"> Nevertheless, probability theory does offer a coherent and relatively well understood framework for selecting between uncertain alternatives, making it a natural choice for quantitative language processing. The case for probability theory is strengthened by a well developed empirical methodology in the form of statistical parameter estimation. There is also the strong connection between probability theory and the formal theory of information and communication, a connection that has been exploited in speech recognition, for example using the concept of entropy to provide a motivated way of measuring the complexity of a recognition problem (Jelinek et al. 1992).</Paragraph>
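<Paragraph> As a toy illustration of the entropy measure mentioned above (the distribution and numbers are invented; this is not Jelinek et al.'s formulation), the entropy of a word-prediction distribution, and the corresponding perplexity, quantify how hard the recognition problem is at a given point:</Paragraph>
<Paragraph>
import math

def entropy(dist):
    # H(p) = -sum of p(x) log2 p(x), in bits.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Invented next-word distribution at some point in an utterance.
next_word = {"the": 0.5, "a": 0.25, "this": 0.125, "that": 0.125}

h = entropy(next_word)   # 1.75 bits
perplexity = 2 ** h      # about 3.36 equally likely alternatives
print(h, perplexity)
</Paragraph>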
<Paragraph position="6"> Even if probability theory remains, as it currently is, the method of choice in making language processing quantitative, this still leaves the field wide open in terms of carving up language processing into an appropriate set of events for probability theory to work with. For translation, a very direct approach using parameters based on surface positions of words in source and target sentences was adopted in the Candide system (Brown et al. 1990). However, this does not capture important structural properties of natural language. Nor does it take into account generalizations about translation that are independent of the exact word order in source and target sentences. Such generalizations are, of course, central to qualitative structural approaches to translation (e.g. Isabelle and Macklovitch 1986, Alshawi et al. 1992).</Paragraph>
<Paragraph position="7"> The aim of the quantitative language and translation models presented in sections 5 and 6 is to employ probabilistic parameters that reflect linguistic structure without discarding rich lexical information or making the models too complex to train automatically. In terms of a traditional classification, this would be seen as a 'hybrid symbolic-statistical' system because it deals with linguistic structure. From our perspective, it can be seen as a quantitative version of the logic-based model because both models attempt to capture similar information (about the organization of words into phrases and relations holding between these phrases or their referents), though the tools of modeling are substantially different.</Paragraph>
<Paragraph position="8"> 3. Dissecting a Logic-Based System We now consider a hypothetical speech translation system in which the language processing components follow a conventional qualitative transfer design. Although hypothetical, this design and its components are similar to those used in existing database query (Rayner and Alshawi 1992) and translation systems (Alshawi et al. 1992). More recent versions of these systems have been gradually taking on a more quantitative flavor, particularly with respect to choosing between alternative analyses, but our hypothetical system will be more purist in its qualitative approach.</Paragraph>
<Paragraph position="9"> The overall design is as follows. We assume that a speech recognition subsystem delivers a list of text strings corresponding to transcriptions of an input utterance. These recognition hypotheses are passed to a parser which applies a logic-based grammar and lexicon to produce a set of logical forms, specifically formulas in first order logic corresponding to possible interpretations of the utterance. The logical forms are filtered by contextual and word-sense constraints, and one of them is passed to the translation component. The translation relation is expressed by a set of first order axioms which are used by a theorem prover to derive a target language logical form that is equivalent (in some context) to the source logical form. A grammar for the target language is then applied to the target form, generating a syntax tree whose fringe is passed to a speech synthesizer.</Paragraph>
<Paragraph position="10"> Taking the various components in turn, we make a note of undesirable properties that might be improved by quantitative modeling.</Paragraph>
<Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> Analysis and Generation </SectionTitle>
<Paragraph position="0"> A grammar, expressed as a set of syntactic rules (axioms) Gsyn, and a set of semantic rules (axioms) Gsem, is used to support a relation form holding between strings s and logical forms φ expressed in first order logic: Gsyn ∪ Gsem ⊨ form(s, φ). The relation form is many-to-many, associating a string with linguistically possible logical form interpretations. In the analysis direction, we are given s and search for logical forms φ, while in generation we search for strings s given φ.</Paragraph>
<Paragraph position="1"> For analysis and generation, we are treating strings s and logical forms φ as object level entities. In interpretation and translation, we will move down from this meta-level reasoning to reasoning with the logical forms as propositions.</Paragraph>
<Paragraph position="2"> The list of text strings handed by the recognizer to the parser can be assumed to be ordered in accordance with some acoustic scoring scheme internal to the recognizer. The magnitude of the scores is ignored by our qualitative language processor; it simply processes the hypotheses one at a time until it finds one for which it can produce a complete logical form interpretation that passes grammatical and interpretation constraints, at which point it discards the remaining hypotheses.</Paragraph>
<Paragraph position="3"> Clearly, discarding the acoustic score and taking the first hypothesis that satisfies the constraints may lead to an interpretation that is less plausible than one derivable from a hypothesis further down in the recognition list. But there is no point in processing these later hypotheses since we will be forced to select one interpretation essentially at random. Syntax. The syntactic rules in Gsyn relate 'category' predicates c0, c1, c2 holding of a string and two spanning substrings (we limit the rules here to two daughters for simplicity):</Paragraph>
<Paragraph position="4"> c0(s0) ← daughters(s0, s1, s2) ∧ c1(s1) ∧ c2(s2) </Paragraph>
<Paragraph position="5"> (Here, and subsequently, variables like s0 and s1 are implicitly universally quantified.)</Paragraph>
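<Paragraph> Read operationally, rules of this shape yield a simple bottom-up recognizer: a category predicate holds of a span if some rule's daughter categories hold of two adjacent sub-spans. The sketch below (our own illustration; the rule set, lexicon, and chart-style search are invented) makes this concrete:</Paragraph>
<Paragraph>
# Rules of the form  c0(s0) ← daughters(s0, s1, s2) ∧ c1(s1) ∧ c2(s2),
# here as (c0, c1, c2) triples over an invented toy grammar.
RULES = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def categories(words):
    # cats[(i, j)] = set of category predicates holding of words[i:j].
    n = len(words)
    cats = {(i, i + 1): set(LEXICON.get(words[i], set())) for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = set()
            for k in range(i + 1, j):   # the daughters(s0, s1, s2) split
                for c0, c1, c2 in RULES:
                    if c1 in cats.get((i, k), set()) and c2 in cats.get((k, j), set()):
                        cell.add(c0)
            cats[(i, j)] = cell
    return cats

words = "the dog saw the cat".split()
print("S" in categories(words)[(0, len(words))])   # True
</Paragraph>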
<Paragraph position="6"> Gsyn also includes lexical axioms for particular strings w consisting of single words: c1(w), ...</Paragraph>
<Paragraph position="7"> For a feature-based grammar, these rules can include conjuncts constraining the values a1, a2, ... of discrete-valued functions f on the strings: f(s1) = a1 ∧ f(s2) = a2 ∧ ...</Paragraph>
<Paragraph position="8"> The main problem here is that such grammars have no notion of a degree of grammatical acceptability - a sentence is either grammatical or ungrammatical. For small grammars this means that perfectly acceptable strings are often rejected; for large grammars we get a vast number of alternative trees, so the chance of selecting the correct tree for simple sentences can get worse as the grammar coverage increases. There is also the problem of requiring increasingly complex feature sets to describe idiosyncrasies in the lexicon.</Paragraph>
<Paragraph position="10"> Semantics. Semantic grammar axioms belonging to Gsem specify a 'composition' function g for deriving a logical form for a phrase from those for its subphrases: form(s0, g(φ1, φ2)) ← daughters(s0, s1, s2) ∧ c1(s1) ∧ c2(s2) ∧ c0(s0) ∧ form(s1, φ1) ∧ form(s2, φ2). The interpretation rules for strings bottom out in a set of lexical semantic rules associating words with predicates (p1, p2, ...) corresponding to 'word senses'. For a particular word and syntactic category, there will be a (small, possibly empty) finite set of such word sense predicates: form(w, p1), form(w, p2), ...</Paragraph>
<Paragraph position="12"> First order logic was assumed as the semantic representation language because it comes with well understood, if not very practical, inferential machinery for constraint solving. However, applying this machinery requires making logical forms fine grained to a degree often not warranted by the information the speaker of an utterance intended to convey. An example of this is explicit scoping, which leads (again) to large numbers of alternatives which the qualitative model has difficulty choosing between. Also, many natural language sentences cannot be expressed in first order logic without resort to elaborate formulas requiring complex semantic composition rules. These rules can be simplified by using a higher order logic but at the expense of even less practical inferential machinery.</Paragraph>
<Paragraph position="13"> In applying the grammar in generation we are faced with the problem of balancing over and undergeneration by tweaking grammatical constraints, there being no way to prefer fully grammatical target sentences over more marginal ones. Qualitative approaches to grammar tend to emphasize the ability to capture generalizations as the main measure of success in linguistic modeling. This might explain why producing appropriate lexical collocations is rarely addressed seriously in these models, even though lexical collocations are important for fluent generation. The study of collocations for generation fits in more naturally with statistical techniques, as illustrated by Smadja and McKeown (1990).</Paragraph>
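<Paragraph> As a hint of what such statistical treatment of collocations looks like (a toy stand-in with invented counts, not Smadja and McKeown's actual method), candidate word pairs can be ranked by pointwise mutual information over corpus co-occurrence counts, so that a generator prefers the fluent collocation:</Paragraph>
<Paragraph>
import math

N = 100_000   # assumed corpus size in tokens
unigram = {"strong": 300, "powerful": 250, "tea": 120}
bigram = {("strong", "tea"): 40, ("powerful", "tea"): 2}

def pmi(w1, w2):
    # Pointwise mutual information: log2 of p(w1,w2) / (p(w1) p(w2)).
    p_joint = bigram.get((w1, w2), 0) / N
    p1, p2 = unigram[w1] / N, unigram[w2] / N
    return math.log2(p_joint / (p1 * p2)) if p_joint > 0 else float("-inf")

# "strong tea" scores far higher than "powerful tea", so a generator
# choosing by association strength produces the natural collocation.
print(pmi("strong", "tea"), pmi("powerful", "tea"))
</Paragraph>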
<Paragraph position="14"> Interpretation. In the logic-based model, interpretation is the process of identifying, from the possible interpretations φ of s for which form(s, φ) holds, ones that are consistent with the context of interpretation. We can state this as follows: R ∪ S ∪ A ⊨ φ.</Paragraph>
<Paragraph position="15"> Here, we have separated the context into a contingent set of contextual propositions S and a set R of (monolingual) 'meaning postulates', or selectional restrictions, that constrain the word sense predicates in all contexts. A is a set of assumptions sufficient to support the interpretation φ given S and R. In other words, this is 'interpretation as abduction' (Hobbs et al. 1988), since abduction, not deduction, is needed to arrive at the assumptions A.</Paragraph>
<Paragraph position="16"> The most common types of meaning postulates in R are those for restriction, hyponymy, and disjointness, that is, axioms of the forms p1(x, y) → p2(y), p1(x) → p2(x), and ¬(p1(x) ∧ p2(x)) respectively.</Paragraph>
<Paragraph position="18"> Although there are compilation techniques (e.g. Mellish 1988) which allow selectional constraints stated in this fashion to be implemented efficiently, the scheme is problematic in other respects. To start with, the assumption of a small set of senses for a word is at best awkward because it is difficult to arrive at an optimal granularity for sense distinctions. Disambiguation with selectional restrictions expressed as meaning postulates is also problematic because it is virtually impossible to devise a set of postulates that will always filter all but one alternative. We are thus forced to under-filter and make an arbitrary choice between remaining alternatives.</Paragraph>
<Paragraph position="19"> Logic based translation. In both the quantitative and qualitative models we take a transfer approach to translation. We do not depend on interlingual symbols, but instead map a representation with constants associated with the source language into a corresponding expression with constants from the target language. For the qualitative model, the operable notion of correspondence is based on logical equivalence and the constants are source word sense predicates p1, p2, ... and target sense predicates q1, q2, .... More specifically, we will say the translation relation between a source logical form φs and a target logical form φt holds if we have B ∪ S ∪ A' ⊨ (φs ↔ φt), where B is a set of monolingual and bilingual meaning postulates, and S is a set of formulas characterizing the current context. A' is a set of assumptions that includes the assumptions A which supported φs. Here bilingual meaning postulates are first order axioms relating source and target sense predicates. A typical bilingual postulate for translating between p1 and q1 might be of the form: p3(x1) → (p1(x1) ↔ q1(x1)).</Paragraph>
<Paragraph position="20"> The need for the assumptions A' arises when a source language word is vaguer than its possible translations in the target language, so different choices of target words will correspond to translations under different assumptions. For example, the condition p3(x1) above might be proved from the input logical form, or it might need to be assumed.</Paragraph>
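<Paragraph> The following sketch (our own simplification, with invented postulates and senses) shows the control structure this suggests: to translate a source sense via a conditional bilingual postulate, the condition is proved from the input logical form where possible and otherwise added to the assumption set A'. Real abductive theorem proving is, of course, far harder than this table lookup.</Paragraph>
<Paragraph>
# Bilingual postulates of the form  p3(x) → (p1(x) ↔ q1(x)),
# encoded as (source_sense, target_sense, condition) triples;
# condition None means the postulate is unconditional.
POSTULATES = [
    ("bank1", "banque1", "institution1"),   # financial sense
    ("bank1", "rive1", "riverside1"),       # riverbank sense
    ("river1", "riviere1", None),
]

def transfer(facts):
    # facts: ground source-language atoms, e.g. {("bank1", "x1")}.
    # Returns (target atom, assumption set A') pairs.
    solutions = []
    for pred, arg in facts:
        for p, q, cond in POSTULATES:
            if p != pred:
                continue
            assumptions = set()
            if cond is not None and (cond, arg) not in facts:
                assumptions.add((cond, arg))   # assume what we cannot prove
            solutions.append(((q, arg), assumptions))
    return solutions

# With no contextual evidence about x1, both translations of bank1
# survive, each supported by a different assumption set A'.
for target, a in transfer({("bank1", "x1")}):
    print(target, "assuming", a)
</Paragraph>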
<Paragraph position="21"> In the general case, finding solutions (i.e. A', φt pairs) for the abductive schema is an undecidable theorem proving problem. This can be alleviated by placing restrictions on the form of meaning postulates and input formulas and using heuristic search methods. Although such an approach was applied with some success in a limited-domain system translating logical forms into database queries (Rayner and Alshawi 1992), it is likely to be impractical for language translation with tens of thousands of sense predicates and related axioms.</Paragraph>
<Paragraph position="22"> Setting aside the intractability issue, this approach does not offer a principled way of choosing between alternative solutions proposed by the prover. One would like to prefer solutions with 'minimal' sets of assumptions, but it is difficult to find motivated definitions for this minimization in a purely qualitative framework.</Paragraph> </Section> </Section> </Paper>