<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1061"> <Title>Using Discourse Predictions for Ambiguity Resolution</Title>
<Section position="3" start_page="358" end_page="358" type="metho"> <SectionTitle> 2 System Description </SectionTitle>
<Paragraph position="0"> The main modules of our system include speech recognition, parsing, discourse processing, and generation. Processing begins with the speech input in the source language. The best hypothesis of the speaker's utterance is then passed to the parser. The GLR* parser (Lavie, 1995) produces a set of interlingua texts, or ILTs, for a given sentence. For robustness, the GLR* parser can skip words in the input sentence in order to find a partial parse for a sentence which otherwise would not be parsable. An ILT is a frame-based, language-independent meaning representation of a sentence. The main components of an ILT are the speech act (e.g., suggest, accept, reject), the sentence type (e.g., state, query-if, fragment), and the main semantic frame (e.g., free, busy).</Paragraph>
<Paragraph position="1"> An example of an ILT is shown in Figure 1. The parser may produce many ILTs for a single sentence, sometimes as many as one hundred or more.</Paragraph>
<Paragraph position="3"> Figure 1. Example ILT for the sentence: I could do it Wednesday morning too.</Paragraph>
<Paragraph position="4"> The resulting set of ILTs is then sent to the discourse processor. The discourse processor, based on Lambert's work (Lambert and Carberry, 1992; Lambert, 1993), disambiguates the speech act of each sentence, normalizes temporal expressions from context, and incorporates the sentence into the discourse context represented by a plan tree. The discourse processor also updates a calendar which keeps track of what the speakers have said about their schedules. We will discuss the discourse processor and how we extended it for the disambiguation task in Section 4.</Paragraph> </Section>
<Section position="4" start_page="358" end_page="359" type="metho"> <SectionTitle> 3 Ambiguity in Enthusiast </SectionTitle>
<Paragraph position="0"> Because the spontaneous scheduling dialogues are unrestricted, ambiguity is a major problem in Enthusiast. We gauge ambiguities in terms of differences between members of the set of ILTs produced by the parser for the same source sentence.</Paragraph>
<Paragraph position="1"> As we mentioned earlier, the disambiguation task benefits from both non context-based and context-based methods. We observed that some classes of ambiguities can be more perspicuously dealt with in one way or the other.</Paragraph>
<Section position="1" start_page="358" end_page="358" type="sub_section"> <SectionTitle> 3.1 Non Context-Based Disambiguation </SectionTitle>
<Paragraph position="0"> When the parser produces more than one ILT for a single sentence, it scores these ambiguities according to three different non context-based disambiguation methods. The first method, based on (Carroll and Briscoe, 1993), assigns probabilities to actions in the GLR* parser's parse table. The probabilities of the parse actions induce statistical scores on alternative parse trees, which are then used for parse disambiguation. The resulting score is called the statistical score. The second method the parser uses to score the ILTs makes use of penalties manually assigned to different rules in the parsing grammar. The resulting score from this method is called the grammar preference score. The third score, called the parser score, is a heuristic combination of the previous two scores plus other information such as the number of words skipped. These three non context-based scores will be referred to later when we discuss combining non context-based predictions with context-based ones.</Paragraph>
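To make the relationship between these three scores concrete, here is a minimal sketch of a heuristic combination of this kind. The linear form, the weights, and the treatment of the statistical score as a penalty (e.g., a negative log probability, so that lower is better) are all assumptions made for illustration; the paper does not give the GLR* parser's actual formula.

```python
# Illustrative sketch only: the GLR* parser's actual heuristic and weights are
# not given in the paper; the values below are invented for illustration.
from dataclasses import dataclass

@dataclass
class ParseHypothesis:
    statistical_score: float    # penalty derived from parse-action probabilities
                                # (assumed to be a negative log probability; lower is better)
    grammar_preference: float   # sum of manually assigned grammar-rule penalties
    words_skipped: int          # words dropped by the parser's skipping mechanism

def parser_score(h: ParseHypothesis,
                 w_stat: float = 1.0, w_gram: float = 1.0, w_skip: float = 0.5) -> float:
    """Combine the non context-based scores into one parser score (lower is better)."""
    return (w_stat * h.statistical_score
            + w_gram * h.grammar_preference
            + w_skip * h.words_skipped)

if __name__ == "__main__":
    hypotheses = [ParseHypothesis(0.8, 1.0, 0), ParseHypothesis(0.5, 3.0, 2)]
    best = min(hypotheses, key=parser_score)   # keep the lowest-scoring parse
    print(best, parser_score(best))
```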
<Paragraph position="1"> Error analysis of parser disambiguation output shows that the GLR* parser handles well ambiguities which are not strongly dependent upon the context for a reasonable interpretation. For example, the Spanish word una can mean either one or a, as an indefinite reference. The parser always chooses the indefinite reference meaning, since the vast majority of training examples use this sense of the word. Moreover, since in this case incorrect disambiguation does not adversely affect translation quality, it makes sense to handle this ambiguity in a purely non context-based manner.</Paragraph> </Section>
<Section position="2" start_page="358" end_page="359" type="sub_section"> <SectionTitle> 3.2 Context-Based Disambiguation </SectionTitle>
<Paragraph position="0"> While a broad range of ambiguities can be handled well in a non context-based manner, some ambiguities must be treated in a context-sensitive manner in order to be translated correctly.</Paragraph>
<Paragraph position="1"> Table 1 lists some examples of these types of ambiguities. Each type of ambiguity is categorized by comparing either different slots in alternative ILTs or different values in ambiguous ILT slots given the same input utterance.</Paragraph>
<Paragraph position="2"> Table 1: Types of context-based ambiguity, with a description and an example of each.
day vs hour: a temporal expression can be recognized as a day or an hour. Example: dos a cuatro (second at four, second to fourth, or two to four).
state vs query-if: ambiguity between sentence type state or query-if. Example: está bien (It's OK or Is it OK?).
speaker reference: ambiguity between pro-drop pronouns. Example: también podría ese día (also I could that day or also you could that day).
tense: ambiguity between past tense and present tense. Example: dónde nos encontramos (where are we meeting or where were we meeting).
how vs greet: ambiguity between frame how and greet. Example: qué tal (How are you? or How is that?).
when vs where: ambiguity between when slot and where slot. Example: sábado quince (Saturday the fifteenth or Saturday building 15).</Paragraph>
<Paragraph position="3"> For example, one type of ambiguity best handled with a context-based approach is the day vs hour ambiguity, exemplified by the phrase dos a cuatro. It can mean either the second at four, the second to the fourth, or two to four. Out of context, it is impossible to tell which is the best interpretation. Contextual information makes it possible to choose the correct interpretation. For example, if the speakers are trying to establish a date when they can meet, then the second to the fourth is the most likely interpretation. However, if the speakers have already chosen a date and are negotiating the exact time of the meeting, then only the meaning two to four makes sense.</Paragraph>
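As a toy illustration of this contextual preference (the actual mechanism, the graded constraints of Section 4.2, is introduced later), the sketch below ranks the readings of dos a cuatro by a coarse negotiation stage. The stage labels, the fixed preference orders, and the function itself are hypothetical and are not part of Enthusiast.

```python
# Hypothetical illustration of the contextual preference described above;
# the stage labels and preference orders are invented, not Enthusiast's.
READINGS = {
    "day_range":   "the second to the fourth",
    "day_at_hour": "the second at four",
    "hour_range":  "two to four",
}

def rank_dos_a_cuatro(negotiation_stage: str) -> list[str]:
    """Order the readings of 'dos a cuatro' from most to least preferred,
    given whether the speakers are still choosing a day or already fixing a time."""
    if negotiation_stage == "choosing_day":
        order = ["day_range", "day_at_hour", "hour_range"]
    elif negotiation_stage == "choosing_time":   # a date has already been agreed on
        order = ["hour_range", "day_at_hour", "day_range"]
    else:
        order = list(READINGS)                    # no contextual preference available
    return [READINGS[key] for key in order]

print(rank_dos_a_cuatro("choosing_time"))   # 'two to four' ranked first
```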
<Paragraph position="4"> Some sentence type ambiguities are also context-based. For example, está bien can be either the statement It is good or the question Is it good? This is an example of what we call the state vs query-if ambiguity: in Spanish, it is impossible to tell out of context, and without information about intonation, whether a sentence is a statement or a yes/no question. However, if the same speaker has just made a suggestion, then it is more likely that the speaker is requesting a response from the other speaker by posing a question. In contrast, if the previous speaker has just made a suggestion, then it is more likely that the current speaker is responding with an accepting statement than posing a question. In general, we base our context-based predictions for disambiguation on turn-taking information, the stage of negotiation, and the speakers' calendar information. This information is encoded in a set of context-based scores produced by the discourse processor for each ILT.</Paragraph> </Section> </Section>
<Section position="5" start_page="359" end_page="361" type="metho"> <SectionTitle> 4 Discourse Processing and Disambiguation </SectionTitle>
<Paragraph position="0"> Context-based ranking of ambiguities is performed by the plan-based discourse processor described in (Rosé et al., 1995), which is based on (Lambert and Carberry, 1992; Lambert, 1993).</Paragraph>
<Paragraph position="1"> Originally, our discourse processor took as its input the single best parse returned by the parser. The main task of the discourse processor was to relate that representation to the context, i.e., to the plan tree. In general, plan inference starts from the surface forms of sentences. Then speech acts are inferred. Multiple speech acts can be inferred for one ILT. A separate inference chain is created for each potential speech act performed by the associated ILT. Preferences for picking one inference chain over another were determined by the focusing heuristics, which provide ordered expectations of discourse actions given the existing plan tree. Our focusing heuristics, described in detail in (Rosé et al., 1995), are an extension of those described in (Lambert, 1993). In determining how the inference chain attaches to the plan tree, the speech act is recognized, since each inference chain is associated with a single speech act. As mentioned in the introduction, for a plan-based discourse processor to deal with ambiguities, three issues need to be addressed: 1. The discourse processor must be able to deal with more than one semantic representation as input at a time. Note that simply extending the discourse processor to accept multiple ILTs is not the whole solution to the disambiguation problem: finer distinctions must be made in terms of coherence with the context in order to produce predictions detailed enough to distinguish between alternative ILTs.</Paragraph>
<Paragraph position="2"> 2. Before context-based predictions can be combined with quantitative non context-based predictions, they must be quantified. It was necessary to add a mechanism to produce more detailed, quantifiable predictions than those produced by the original focusing heuristics described in (Rosé et al., 1995).</Paragraph>
<Paragraph position="3"> 3. Finally, context-based predictions must be combined successfully with non context-based ones. The discourse processor must be able to weigh these various predictions in order to determine which ones to believe in specific circumstances.</Paragraph>
<Paragraph position="4"> Thus, we extended our original discourse processor as follows.
It takes multiple ambiguous ILTs from the parser and computes three quantified discourse scores for each ambiguity. The discourse scores are derived by taking into account attachment preferences to the discourse tree, as reflected by two kinds of focusing scores, and the score returned by the graded constraints, a new type of constraint we introduced. Then, for each ambiguity, the discourse processor combines these three kinds of context-based scores with the non context-based scores produced by other modules of the system to make the final choice, and returns the chosen ILT. As in the first version of the discourse processor, the chosen ILT is attached to the plan tree and a speech act is assigned to it. We discuss now how the discourse scores are derived.</Paragraph>
<Paragraph position="5"> Note that lower values for all scores are preferred.</Paragraph>
<Section position="1" start_page="360" end_page="360" type="sub_section"> <SectionTitle> 4.1 Focusing scores </SectionTitle>
<Paragraph position="0"> The focusing scores are derived from focusing heuristics based on (Sidner, 1981; Lambert, 1993; Rosé et al., 1995). The focusing heuristics identify the most coherent relationship between a new inference chain and the discourse plan tree. Attachment preferences by the focusing heuristics are translated into numerical preference scores based on attachment positions and the length of the inference chains. The assignment of focusing scores reflects the assumption that the most coherent move in a dialogue is to continue the most salient focused actions, namely, the ones on the rightmost frontier of the plan tree. The first focusing score is a boolean focusing flag. It returns 0 if the inference chain for the associated ILT attaches to the rightmost frontier of the plan tree, and 1 if it either attaches to the tree but not to the right frontier or does not attach to the tree at all. The second focusing score, the focusing score proper, assigns a score between 0 and 1 indicating how far up the rightmost frontier the inference chain attaches. The maximal score is assigned in the case that the inference chain does not attach.</Paragraph> </Section>
<Section position="2" start_page="360" end_page="360" type="sub_section"> <SectionTitle> 4.2 Graded constraints </SectionTitle>
<Paragraph position="0"> Once the discourse processor was extended to accept multiple ILTs as input, it became clear that for most ambiguous parses the original focusing heuristics did not provide enough information to distinguish among the alternatives. Our solution was to modify the discourse processor's constraint processing mechanism, making it possible to bring more domain knowledge to bear on the disambiguation task. In the original discourse processor, all of the constraints on plan operators, which we call elimination constraints, were used solely for the purpose of binding variables and eliminating certain inference possibilities. Their purpose was to eliminate provably wrong inferences, and in this way to give the focusing heuristics a higher likelihood of selecting the correct inference chain from the remaining set.</Paragraph>
<Paragraph position="1"> We introduced a different type of constraint, graded constraints, inspired by the concept of graded unification discussed in (Kim, 1994). Unlike elimination constraints, they neither bind variables nor eliminate any inferences. Graded constraints always return true, so they cannot eliminate inferences. However, they assign numerical penalties or preferences to inference chains based on domain-specific information. This information is then used to rank the set of possible inferences left after the elimination constraints are processed.</Paragraph>
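Putting Sections 4.1 and 4.2 together, the sketch below shows one way the three discourse scores might be computed for an inference chain. The representation of attachment as a depth along the right frontier, the linear normalization of the focusing score, and the possible_time toy constraint (a simplified stand-in for the possible-time constraint described below) are assumptions for illustration, not the system's actual code.

```python
# Minimal sketch of the three discourse scores; the attachment representation,
# normalization, and penalty values are assumptions, not Enthusiast's actual code.
from typing import Callable, Optional, Sequence

def focusing_flag(attaches_to_tree: bool, on_right_frontier: bool) -> int:
    """0 if the inference chain attaches on the rightmost frontier of the plan
    tree, 1 otherwise (attaches elsewhere, or does not attach at all)."""
    return 0 if (attaches_to_tree and on_right_frontier) else 1

def focusing_score(attachment_depth: Optional[int], frontier_length: int) -> float:
    """Score in [0, 1]: small when the chain attaches low (most salient) on the
    right frontier, larger the further up it attaches, and maximal (1.0) when
    the chain does not attach at all. The linear normalization is an assumption."""
    if attachment_depth is None or frontier_length <= 0:
        return 1.0
    return min(attachment_depth / frontier_length, 1.0)

def possible_time(conflicts_with_context: bool, penalty: float = 1.0) -> float:
    # Toy graded constraint in the spirit of the possible-time check below:
    # penalize temporal values that clash with the calendar or the dialogue date.
    return penalty if conflicts_with_context else 0.0

def graded_constraint_score(fired_constraints: Sequence[Callable[[], float]]) -> float:
    """Sum the penalties/preferences of all graded constraints fired on one chain."""
    return sum(constraint() for constraint in fired_constraints)

# Example: a chain that attaches off the right frontier and proposes a rejected time.
scores = (focusing_flag(True, False),
          focusing_score(2, 5),
          graded_constraint_score([lambda: possible_time(True)]))
print(scores)   # lower values are preferred for all three scores
```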
<Paragraph position="2"> For example, consider the day versus hour ambiguity we discussed earlier. In most cases, inference chains for ILTs with this ambiguity have the same focusing scores. We introduce the possible-time constraint to check whether the temporal constraints conflict with the dynamic calendar or the recorded dialogue date when the inference chains are built. If the temporal information represented in an ILT is in conflict with the dialogue record date (e.g., scheduling a time before the record date) or with the temporal constraints already in the calendar (e.g., proposing a time that has already been rejected), a penalty score is assigned to that inference chain; otherwise, a default value (i.e., no penalty) is returned. Several graded constraints may be fired in one inference chain. Penalties or preferences for all graded constraints in the inference chain are summed together. The result is the graded constraint score for that ambiguity.</Paragraph>
<Paragraph position="3"> Introducing graded constraints has two advantages over adding more elimination constraints. As far as the system in general is concerned, graded constraints only give preferences; they do not rule out inferencing and attachment possibilities. Thus, introducing new constraints will not damage the broad coverage of the system. As far as the discourse processor is concerned, it would be possible to achieve the same effect by adding more elimination constraints, but this would make it necessary to introduce more fine-tuned plan operators geared towards specific cases. By introducing graded constraints we avoid expanding the search space among the plan operators.</Paragraph> </Section>
<Section position="3" start_page="360" end_page="361" type="sub_section"> <SectionTitle> 4.3 Combining Predictions </SectionTitle>
<Paragraph position="0"> Once the information from the graded constraints and the focusing scores is available, the challenging problem of combining these context-based predictions with the non context-based ones arises. We experimented with two methods of automatically learning functions for combining our six scores into one composite score, namely a genetic programming approach and a neural net approach. The basic assumption of our disambiguation approach is that the context-based and non context-based scores provide different perspectives on the disambiguation task. They act together, each specializing in different types of cases, to constrain the final result. Thus, we want our learning approach to learn not only which factors are important, but also to what extent they are important, and under what circumstances. The genetic programming and neural net approaches are ideal in this respect.</Paragraph>
<Paragraph position="1"> Genetic programming (Koza, 1992; Koza, 1994) is a method for "evolving" a program to accomplish a particular task, in this case a function for computing a composite score. This technique can learn functions which are efficient and humanly understandable and editable. Moreover, because this technique samples different parts of the search space in parallel, it avoids to some extent the problem of selecting locally optimal solutions which are not globally optimal.</Paragraph>
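The learned combination functions themselves are not reproduced here. The sketch below only shows the shape of the problem: map the six scores (three from the parser, three from the discourse processor) to a single composite score and keep the ILT with the lowest value. The linear form and the weights are invented stand-ins for whatever function the genetic programming or neural net learner would actually produce.

```python
# Hypothetical combination function: the weights and the linear form are made up;
# a genetic programming learner would search over expressions of this general
# kind, and a neural net would realize the mapping as learned weights instead.
from dataclasses import dataclass

@dataclass
class ScoredILT:
    ilt_id: str
    statistical: float          # non context-based scores from the parser
    grammar_preference: float
    parser: float
    focusing_flag: float        # context-based scores from the discourse processor
    focusing: float
    graded_constraints: float

def composite_score(s: ScoredILT) -> float:
    """Map the six scores to one composite score (lower is preferred)."""
    return (0.4 * s.parser
            + 0.2 * s.statistical
            + 0.1 * s.grammar_preference
            + 0.1 * s.focusing_flag
            + 0.1 * s.focusing
            + 0.1 * s.graded_constraints)

def choose_ilt(candidates: list[ScoredILT]) -> ScoredILT:
    """Return the ILT with the lowest composite score."""
    return min(candidates, key=composite_score)
```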
<Paragraph position="3"> Connectionist approaches have been widely used for spoken language processing and other areas of computational linguistics, e.g., (Wermter, 1994; Miikkulainen, 1993), to name only a few.</Paragraph>
<Paragraph position="4"> Connectionist approaches are able to learn the structure inherent in the input data, to make fine distinctions between input patterns in the presence of noise, and to integrate different information sources.</Paragraph>
<Paragraph position="5"> We refer the reader to (Rosé and Qu, 1995) for full details about the motivations underlying the choice of these two methods as well as the advantages and disadvantages of each.</Paragraph>
<Paragraph position="6"> Both kinds of testing are the same because cumulative error is only an issue for context-based approaches.</Paragraph>
<Paragraph position="7"> Our results show that the discourse processor is indeed making useful predictions for disambiguation: when we abstract away the problem of cumulative error, we can achieve an improvement of 13% with the genetic programming approach and of 2.5% with the neural net approach over the parser's non context-based statistical disambiguation technique. For example, we were able to achieve almost perfect performance on the state vs query-if ambiguity, missing only one case with the genetic programming approach; thus, for this ambiguity, we can trust the discourse processor's prediction.</Paragraph>
<Paragraph position="8"> However, our results also indicate that we have not solved the whole problem of combining non context-based and context-based predictions for disambiguation. In the face of cumulative error, both discourse combination approaches suffer from performance degradation, though to a different extent. Our current direction is to seek a solution to the cumulative error problem. Some preliminary results in this regard are discussed in (Qu et al., 1996).</Paragraph> </Section> </Section> </Paper>