File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-1022_metho.xml
Size: 28,677 bytes
Last Modified: 2025-10-06 14:14:04
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1022"> <Title>Theory and practice of ambiguity labelling with a view to interactive disambiguation in text and speech MT</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> Mutsuko Tomokiyo ATR Interpreting Telecommunications Research Labs </SectionTitle> <Paragraph position="0"> tomokiyo@itl.atr.co.jp analyzers were available, in others not. It is interesting to compare the intuition of the human labeller with the results actually produced: most of the time, differences may be attributed to the fact that available analyzers don't yet match our expectations for &quot;state of the art&quot; analyzers, because they produce spurious, &quot;parasite&quot; ambiguities, and don't yet implement all types of sure linguistic constraints.</Paragraph> </Section> <Section position="4" start_page="0" end_page="119" type="metho"> <SectionTitle> 1 Motivations and Goals </SectionTitle> <Paragraph position="0"> Interactive disambiguation technology must be developed in the context of research towards practical Interpreting Telecommunications systems as well as high-quality multitarget text translation systems. In the case of speech translation, this is because the state of the art is such that a black box approach to spoken language analysis (speech recognition plus linguistic parsing) is likely to give a correct output for no more than 50 to 60% of the utterances (&quot;Viterbi consistency&quot; [2]; see footnote 1), while users would presumably require an overall success rate of at least 90% to be able to use such systems at all. However, the same spoken language analyzers may be able to produce sets of outputs containing the correct analysis in about 90% of the cases (&quot;structural consistency&quot; [2]; see footnote 2).
In the remaining cases, the system would be unable to analyze the input, or no output would be correct.</Paragraph> <Paragraph position="1"> Further extralinguistic and sure disambiguation may be performed (1) by an expert system, if the task is constrained enough; (2) by the users (author or speakers), through interactive disambiguation; and (3) by a (human) expert translator or interpreter, accessible through the network. For example, an expert interpreter &quot;monitoring&quot; several bilingual conversations could solve some ambiguities from his workstation, either because the system decides to ask him first, or because he sees it on his screen and steps in.</Paragraph> <Paragraph position="2"> Footnote 1: According to a study by Cohen &amp; Oviatt, the combined success rate (SR) is bigger than the product of the individual success rates by about 10% in the middle range. 50~60% overall Viterbi consistency then corresponds to a 65~75% individual success rate, which is optimistic.</Paragraph> <Paragraph position="3"> Footnote 2: According to the preceding table, this corresponds to a structural consistency of 95% for each component, which seems impossible to achieve by strictly automatic means in practical applications involving general users.</Paragraph> <Paragraph position="4"> In cases where users could not achieve satisfactory results by using (and helping) the system, the human expert would take charge of (part of) the translation.</Paragraph> <Paragraph position="5"> We suppose an architecture flexible enough to allow the above three extralinguistic processes to be optional, and, in the case of interactive disambiguation, to allow users to control the number of questions asked by the system. Hence, some ambiguities may remain after extralinguistic disambiguation. They should be solved by the system heuristically and &quot;unsurely&quot;, by using preferences, scores or defaults.
In that case, it is important that the questions asked of the users are the most crucial ones, so that failure of the last step to select the correct interpretation does not result in too damaging translation errors.</Paragraph> <Paragraph position="6"> The questions we want to study on &quot;ambiguity labelled&quot; dialogues and texts are the following:
* what kinds of ambiguities (unsolvable by state-of-the-art speech and text analyzers) are there in real data to be handled by the envisaged systems?
* what are the possible methods of interactive disambiguation, for each ambiguity type?
* how can a system determine whether it is important or not for the overall communication goal to disambiguate a given ambiguity?
* what kind of knowledge is necessary to solve a given ambiguity, or, in other words, whom should the system ask: the user, the interpreter, or the expert system, if any?
* in a given dialogue or document, how far do solutions to ambiguities carry over: to the end of the piece, to a limited distance, or not at all?
Ambiguity labelling should not be performed with reference to any particular analyzer, even if a good one is available. It should be done at a less specific level, suitable for generating disambiguation dialogues understandable by non-specialists. For example, attachment ambiguities are represented differently in the outputs of various analyzers, but it is always possible to recognize such an ambiguity, and to explain it by using a &quot;skeleton&quot; flat bracketing.
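As a rough illustration of what an analyzer-independent skeleton bracketing can look like, the following Python sketch enumerates the binary bracketings of a short word sequence, here a three-word noun compound. The function name `skeleton_bracketings` and the square-bracket output notation are assumptions made for this illustration only; they are not part of the labelling scheme.

```python
def skeleton_bracketings(words):
    """Return all binary bracketings of a word sequence as strings.

    A single word is returned unchanged; a longer sequence is split at
    every position and the bracketings of both halves are combined.
    """
    if len(words) == 1:
        return [words[0]]
    results = []
    for split in range(1, len(words)):
        for left in skeleton_bracketings(words[:split]):
            for right in skeleton_bracketings(words[split:]):
                results.append("[" + left + " " + right + "]")
    return results


# A three-word compound has exactly two skeleton bracketings.
for bracketing in skeleton_bracketings(["international", "telephone", "services"]):
    print(bracketing)
```

The two strings printed correspond to the two attachment readings of the compound; which of them a real analyzer would actually produce depends on its grammar, which this sketch deliberately ignores.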
Ambiguity labelling may also be considered as part of the specification of present and future state of the art analyzers, which means that: * it should be compatible with the representation systems used by the actual or intended analyzers.</Paragraph> <Paragraph position="7"> * it should be clear and simple enough for linguists to do the labelling in a reliable way and in a reasonable amount of time.</Paragraph> <Paragraph position="8"> Finally, our labelling should only be concerned with the final result of analysis, not with any intermediate stage, because we want to retain only ambiguities which would remain unsolved after the complete automatic analysis process has been performed.</Paragraph> </Section> <Section position="5" start_page="119" end_page="122" type="metho"> <SectionTitle> 2 Representations, Ambiguities and Associated Notions </SectionTitle> <Paragraph position="0"> Even if we want to label ambiguities independently of any specific analyzer, we must have in mind a certain class of possible representation systems for analysis results, and be clear about what an &quot;ambiguous representation&quot; is, about what counts as an ambiguity, etc.</Paragraph> <Paragraph position="1"> What is an &quot;ambiguous representation&quot;? This question is not as trivial as it seems, because it is often not clear what exactly we mean by &quot;the&quot; representation of an utterance. In the case of a classical context-free grammar G, shall we say that a representation of U is any tree T associated to U via G, or that it is the set of all such trees? Usually, linguists say that U has several representations with reference to G.</Paragraph> <Paragraph position="2"> But if we use f-structures with disjunctions, U will always have one (or zero!) associated structure S.</Paragraph> <Paragraph position="3"> Then, we would like to say that S is ambiguous if it contains at least one disjunction.
Returning to G, we might then say that &quot;the&quot; representation of U is the disjunction of all trees T associated to U via G.</Paragraph> <Paragraph position="4"> In practice, however, developers prefer to use hybrid data structures to represent utterances. Trees decorated with various types of structures are very popular. For speech and language processing, lattices bearing such trees are also used, which means at least 3 levels at which a representation may be ambiguous.</Paragraph> <Paragraph position="5"> Which class of representation systems do we consider in our labelling? First, they must be fine-grained enough to allow the intended operations. For instance, text-to-speech requires less detail than translation. On the other hand, it is counter-productive to make too many distinctions. For example, what is the use of defining a system of 1000 semantic features if no system and no lexicographers may assign them to terms in an efficient and reliable way? Second, there is a matter of taste and consensus: although different representation systems may be formally equivalent, researchers and developers have their preferences. Third, the representations should be amenable to efficient computer processing. 
Let us make this point more precise.</Paragraph> <Paragraph position="6"> A &quot;computable&quot; representation system is a representation system for which a &quot;reasonable&quot; parser can be developed.</Paragraph> <Paragraph position="7"> A &quot;reasonable&quot; parser is a parser such that: * its size and time complexity are tractable over the class of intended utterances; * assumptions about its ultimate capabilities, especially about its disambiguation capabilities, are realistic given the state of the art.</Paragraph> <Paragraph position="8"> A representation will be said to be ambiguous if it is multiple or underspecified.</Paragraph> <Paragraph position="9"> In all known representation systems, it is possible to define &quot;proper representations&quot;, extracted from the usual representations, and ambiguity-free. For example, if we represent &quot;we read books&quot; by the unique decorated dependency tree:</Paragraph> <Paragraph position="11"> (tense {pres past})...) [&quot;books&quot; ((lex &quot;book-N&quot;) (cat noun)...)]] there would be 2 proper representations, one with (tense pres), and the other with (tense past).</Paragraph> <Paragraph position="12"> For defining the proper representations of a representation system, it is necessary to specify which disjunctions are exclusive, and which are inclusive. A representation in a formal representation system is proper if it contains no exclusive disjunction.</Paragraph> <Paragraph position="13"> The set of proper representations associated to a representation R is obtained by expanding all exclusive disjunctions of R (and eliminating duplicates). It is denoted here by Proper(R).</Paragraph> <Paragraph position="14"> R is multiple if |Proper(R)| > 1. R is multiple if (and only if) it is not proper. A proper representation P is underspecified if it is undefined with respect to some necessary information.
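The expansion into proper representations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' formalism: a decoration is modelled as a flat dictionary, the wrapper class `Alt` is a hypothetical marker for an exclusive disjunction, and the lexical unit "read-V" is assumed by analogy with "book-N".

```python
from itertools import product


class Alt(tuple):
    """Hypothetical marker for an exclusive disjunction of values."""


def proper(rep):
    """Expand every exclusive disjunction of a flat decoration.

    `rep` maps attribute names to values; a value wrapped in Alt stands
    for an exclusive choice. Returns the disjunction-free decorations,
    with duplicates eliminated.
    """
    keys = list(rep)
    choices = [rep[k] if isinstance(rep[k], Alt) else (rep[k],) for k in keys]
    expanded = []
    for combo in product(*choices):
        candidate = dict(zip(keys, combo))
        if candidate not in expanded:
            expanded.append(candidate)
    return expanded


def is_multiple(rep):
    """A representation is multiple when it expands to more than one proper one."""
    return len(proper(rep)) > 1


# Decoration of "read" with an exclusive disjunction on tense (assumed values).
read = {"lex": "read-V", "cat": "verb", "tense": Alt(("pres", "past"))}
for p in proper(read):
    print(p)
print(is_multiple(read))
```

A real representation system would need the same expansion applied recursively over trees or lattices; the flat dictionary is used here only to keep the idea visible.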
There are two cases: the information is specified, but its value is unknown, or it is missing altogether.</Paragraph> <Paragraph position="15"> The first case often happens in the case of anaphoras: (ref ?), or in the case where some information has not been exactly computed, e.g. (taskdomain ?), (decade of month ?), but is necessary for translating in at least one of the target languages. It is quite natural to consider this as ambiguous. For example, an anaphoric reference should be said to be ambiguous * if several possible referents appear in the representation (several proper representations), * and also if the referent is simply marked as unknown, which causes no disjunction.</Paragraph> <Paragraph position="16"> The second case may never occur in representations where all attributes are present in each decoration. But, in a standard f-structure, one cannot force the presence of an attribute, so that a necessary attribute may be missing: (ref .9) means the absence of attribute ref.</Paragraph> <Paragraph position="17"> For any formal representation system, then, we must specify what the &quot;necessary information&quot; is. Contrary to what is needed for defining Proper(R), this may vary with the intended application.</Paragraph> <Paragraph position="18"> Our final definition is now simple to state.</Paragraph> <Paragraph position="19"> A representation R is ambiguous if it is multiple or if Proper(R) contains an underspecified P.</Paragraph> <Paragraph position="20"> We distinguish three levels of granularity.</Paragraph> <Paragraph position="21"> a dialogue (resp. a text) can be segmented in at least two different ways into turns (resp.</Paragraph> <Paragraph position="22"> paragraphs), or a turn (resp.
a paragraph) can be segmented in at least two different ways into utterances, or an utterance can be analyzed in at least two different ways, whereby the analysis is performed in view of translation into one or several languages in the context of a certain generic task. Ambiguities of segmentation into paragraphs may occur in written texts, if, for example, there is a separation by a &lt;new line&gt; character only, without &lt;line feed&gt; or &lt;paragraph&gt;. They are much more frequent and problematic in dialogues. We found many examples of such ambiguities in ATR's transcriptions of Wizard of Oz interpretation dialogues [10].</Paragraph> <Paragraph position="23"> Ambiguities of segmentation into utterances are frequent, and most annoying, as analyzers generally work utterance by utterance, even if they can access analysis results of the preceding context. For example: &quot;right |? now |? turn left...&quot; or ([10], p. 50): &quot;OK |? so go back and is this number three |? right there |? shall I wait here for the bus?&quot;.</Paragraph> <Paragraph position="24"> As far as utterance-level ambiguities are concerned, let us stress again that we consider only those which should be produced by a state-of-the-art analyzer constrained by the task. For instance, &quot;Please state your phone number&quot; should not be deemed ambiguous, as no complete analysis should allow &quot;state&quot; to be a noun, or &quot;phone&quot; to be a verb. That could be different in a context where &quot;state&quot; could be construed as a proper noun (&quot;State&quot;), for example in a dialogue involving the State Department.</Paragraph> <Paragraph position="25"> There is a further point. Consider the utterance: (1) Do you know where the international telephone services are located?
The underlined fragment (&quot;international telephone services&quot;) has an ambiguity of attachment, because it has two different &quot;skeleton&quot; [2] representations: [international telephone] services / international [telephone services]. As a title, this sequence presents the same ambiguity. However, it is not enough to consider it in isolation. Take for example: (2) The international telephone services many countries.</Paragraph> <Paragraph position="26"> The ambiguity has disappeared! It is indeed frequent that an ambiguity relative to a fragment appears, disappears and reappears as one broadens its context. For example, in (3) The international telephone services many countries have established are very reliable.</Paragraph> <Paragraph position="27"> the ambiguity has reappeared. Hence, in order to define properly what an ambiguity is, we must consider the fragment within an utterance, and clarify the idea that the fragment is the smallest (within the utterance) where the ambiguity can be observed.</Paragraph> <Paragraph position="28"> Although utterance-level ambiguities must be considered in the context of whole utterances, a sequence like &quot;international telephone services&quot; is ambiguous in the same way in utterances (1) and (3) above. We call this an &quot;ambiguity kernel&quot;, as opposed to &quot;ambiguity occurrence&quot;, or &quot;ambiguity&quot; for short. It is also clear that another sequence, such as &quot;important business addresses&quot;, presents the same sort of ambiguity, or &quot;ambiguity type&quot;, in analogous contexts (here, &quot;ambiguity of attachment&quot;, or &quot;structural ambiguity&quot;). Other types concern the acceptions (word senses), the functions (syntactic or semantic), etc.
&quot;Ambiguity patterns&quot; are more specific kinds of ambiguity types, usable to trigger actions, such as the production of disambiguating dialogues.</Paragraph> <Paragraph position="29"> We take it for granted that, for each considered representation system, we know how to define, for each fragment V of an utterance U having a proper representation P, the part of P which represents V.</Paragraph> <Paragraph position="30"> For example, given a context-free grammar and an associated tree structure P for U, the part of P representing a substring V of U is the smallest sub-tree Q containing all leaves corresponding to V. Q is not necessarily the whole subtree of P rooted at the root of Q. Conversely, for each part Q of P, we suppose that we know how to define the fragment V of U represented by Q.</Paragraph> <Paragraph position="31"> Let P be a proper representation of U. Q is a minimal underspecified part of P if it does not contain any strictly smaller underspecified part Q'.</Paragraph> <Paragraph position="32"> Let P be a proper representation of U and Q be a minimal underspecified part of P. The scope of the ambiguity of underspecification exhibited by Q is the fragment V represented by Q.</Paragraph> <Paragraph position="33"> In the case of an anaphoric element, Q will presumably correspond to one word or term V. In the case of an indeterminacy of semantic relation (deep case), e.g. on some argument of a predicate, Q would correspond to a whole phrase V. A fragment V presents an ambiguity of multiplicity n (n ≥ 2) in an utterance U if it has n different proper representations which are part of n or more proper representations of U.</Paragraph> <Paragraph position="34"> V is an ambiguity scope of an ambiguity if it is minimal relative to that ambiguity.
This means that any strictly smaller fragment W of U has strictly fewer than n associated sub-representations or, equivalently, that at least two of the representations of V are equal with respect to W.</Paragraph> <Paragraph position="35"> In example (1) above, then, the fragment &quot;the international telephone services&quot;, together with the two skeleton representations the [international telephone] services / the international [telephone services], is not minimal, because it and its two representations can be reduced to the subfragment &quot;international telephone services&quot; and its two representations (which are minimal).</Paragraph> <Paragraph position="36"> This leads us to consider that, in syntactic trees, the representation of a fragment is not necessarily a &quot;horizontally complete&quot; subtree. In the case above, for example, we might have the configurations given in the figure below.</Paragraph> <Paragraph position="37"> In the first pair (constituent structures), &quot;international telephone services&quot; is represented by a complete subtree. In the second pair (dependency structures), the representing subtrees are not complete subtrees of the whole tree.
An ambiguity occurrence, or simply ambiguity, A, of multiplicity n (n ≥ 2) relative to a representation system R, may be formally defined as: A = (U, V, &lt;P1, P2...Pm&gt;, &lt;p1, p2...pn&gt;), where m ≥ n and: U is a complete utterance, called the context of the ambiguity.</Paragraph> <Paragraph position="38"> V is a fragment of U, usually, but not necessarily, connected, called the scope of the ambiguity.</Paragraph> <Paragraph position="39"> P1, P2...Pm are all proper representations of U in R, and p1, p2...pn are the parts of them which represent V.</Paragraph> <Paragraph position="40"> For any fragment W of U strictly contained in V, if q1, q2...qn are the parts of p1, p2...pn corresponding to W, there is at least one pair qi, qj (i ≠ j) such that qi = qj.</Paragraph> <Paragraph position="41"> This may be illustrated by the following diagram [diagram not reproduced], where we take the representations to be tree structures represented by triangles. Here, P2 and P3 have the same part p2 representing V, so that m > n.</Paragraph> <Paragraph position="42"> The ambiguity kernel of A = (U, V, &lt;P1, P2...Pm&gt;, &lt;p1, p2...pn&gt;) is the scope of A and its (proper) representations: (V, &lt;p1, p2...pn&gt;).</Paragraph> <Paragraph position="44"> In a data base, it suffices to store only the kernels, and references to the kernels from the utterances.</Paragraph> <Paragraph position="45"> The type of A is the way in which the pi differ, and must be defined relative to each particular R.</Paragraph> <Paragraph position="46"> If the representations are complex, the difference between two representations is defined recursively. For example, two decorated trees may differ in their geometry or not. If not, at least two corresponding nodes must differ in their decorations.</Paragraph> <Paragraph position="47"> Further refinements can be made only with respect to the intended interpretation of the representations.
For example, anaphoric references and syntactic functions may be coded by the same kind of attribute-value pairs, but are usually considered as different ambiguity types.</Paragraph> <Paragraph position="48"> When we define ambiguity types, linguistic intuition should be the main factor to consider, because it is the basis for any disambiguation method.</Paragraph> <Paragraph position="49"> For example, syntactic dependencies may be coded geometrically in one representation system, and with features in another, but disambiguating questions should be the same. Finally, an ambiguity pattern is a schema with variables which can be instantiated to a (usually unbounded) set of ambiguity kernels.</Paragraph> <Paragraph position="50"> Here is an ambiguity pattern of multiplicity 2 corresponding to the example above (constituent structures): NP[x1 NP[x2 x3]], NP[NP[x1 x2] x3].</Paragraph> <Paragraph position="51"> We don't elaborate, as ambiguity patterns are specific to particular representation systems and analyzers, so that they should not appear in our labelling.</Paragraph> </Section> <Section position="6" start_page="122" end_page="122" type="metho"> <SectionTitle> 3 Principles of Ambiguity Labelling </SectionTitle> <Paragraph position="0"> For lack of space, we cannot give here the context-free grammar which defines our labelling formally, and instead illustrate the underlying principles by way of examples from a dialogue transcription taken from [1].</Paragraph> <Paragraph position="1"> The labelling begins by listing the text or the transcription of the dialogue, thereby indicating segmentation problems with the mark &quot;||?&quot;. Bracketed numbers are optional and correspond to the turns or paragraphs as presented in the original.</Paragraph> <Paragraph position="2"> LABELLED DIALOGUE: &quot;EMMI 10a&quot; [1] A: Good morning conference office ||?
how can I help you [2] AA: [ah] yes good morning could you tell me please how to get from Kyoto station to your conference center [7] A: /Is/ OK, you're at Kyoto station right now ||? [8] AA: {yes} [9] A: {/breath/} and to get to the</Paragraph> <Section position="1" start_page="122" end_page="122" type="sub_section"> <SectionTitle> International Conference Center you can </SectionTitle> <Paragraph position="0"> either travel by taxi bus or subway how would you like to go [10] AA: I think subway sounds like the best way to me
The labelling continues with the next level of granularity, paragraphs or turns. The difference is that a turn begins with a speaker's code. For each paragraph or turn, we then label the ambiguities of each possible utterance. If there is an ambiguity of segmentation in paragraphs or turns, there may be more labelled paragraphs or turns than in the source. For example, A ||? B ||? C may give rise to A-B||C and A||B-C, and not to A-B-C and A||B||C. Which combinations are possible should be determined by the person doing the labelling. An interruption such as [8] may also create a discontinuous turn ([7, 9] here).</Paragraph> <Paragraph position="1"> In the case of utterances, the same remarks apply.</Paragraph> <Paragraph position="2"> However, discontinuities should not appear. There are often fewer possible utterances than all possible combinations. Take the example given in II.3 above: OK |? so go back and is this number three |? right there |? shall I wait here for the bus? This is an A |? B |? C |? D pattern, giving rise to 10 possible combinations. If the labeller considers only the 4 possibilities A|B|C-D, A|B|C|D, A|B-C|D, and A-B-C|D, the following 7 utterances will be labelled:
A: OK
A-B-C: OK so go back and is this number three right there
B: so go back and is this number three
B-C: so go back and is this number three right there
C: right there
C-D: right there shall I wait here for the bus?
D: shall I wait here for the bus?
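The counting in this example can be checked with a short Python sketch. It assumes, for illustration only, that each uncertain boundary is independently either a cut or a join; the function names and the "-" join notation are choices made here, not part of the labelling format. Under that assumption, three uncertain boundaries allow 8 segmentations built from 10 distinct candidate utterances, which appears to be the figure of 10 mentioned above, and the 4 segmentations retained by the labeller yield the 7 utterances listed.

```python
from itertools import product


def segmentations(units):
    """Enumerate every way to cut or join adjacent units.

    Each of the len(units) - 1 uncertain boundaries is either a cut ("|")
    or a join ("-"), so n units give 2 ** (n - 1) segmentations.
    """
    result = []
    for marks in product("|-", repeat=len(units) - 1):
        utterances, current = [], [units[0]]
        for mark, unit in zip(marks, units[1:]):
            if mark == "|":
                utterances.append("-".join(current))
                current = [unit]
            else:
                current.append(unit)
        utterances.append("-".join(current))
        result.append(utterances)
    return result


def utterances_to_label(chosen):
    """Collect the distinct utterances occurring in the given segmentations."""
    return sorted({u for seg in chosen for u in seg})


all_segs = segmentations(["A", "B", "C", "D"])
# The four segmentations retained by the labeller in the example above.
retained = [
    ["A", "B", "C-D"],
    ["A", "B", "C", "D"],
    ["A", "B-C", "D"],
    ["A-B-C", "D"],
]
print(len(all_segs))                        # 8
print(len(utterances_to_label(all_segs)))   # 10
print(utterances_to_label(retained))
```

Storing each distinct utterance once, as `utterances_to_label` does, mirrors the idea that an utterance shared by several segmentations only needs to be labelled once.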
The mark TURN (or PARAG for a text) must be used if there is more than one utterance. /TURN is optional and should be inserted to close the list of utterances, that is, if the next paragraph contains only one utterance and does not begin with PARAG. A format still closer to the TEI guidelines may be proposed in the future.</Paragraph> </Section> </Section> <Section position="7" start_page="122" end_page="123" type="metho"> <SectionTitle> LABELLED TURNS OF DIALOGUE &quot;EMMI 10a&quot; </SectionTitle> <Paragraph position="0"> get from Kyoto station to your conference center(3)? The idea is to label all ambiguity occurrences, but only the ambiguity kernels not already labelled. The end of the scope of each ambiguity occurrence is indicated in the text by a bracketed number which identifies its ambiguity kernel.</Paragraph> <Paragraph position="1"> Each ambiguity kernel begins with its header. Then come its obligatory labels (scope, then status, importance, and type, in any order), and its other labels. For example, the kernel header &quot;ambiguity EMMI10a-2' 5.1&quot; identifies kernel #2' in dialogue EMMI 10a, noted here EMMI10a. &quot;5.1&quot; is the coding of [11]. The status (expert_system, interpreter, user) expresses the kind of supplementary knowledge needed to reliably solve the considered ambiguity. If &quot;expert_system&quot; is given, and if a disambiguation strategy decides to solve this ambiguity interactively, it may ask: the expert system, if any; the interpreter, if any; or the user (speaker). If &quot;interpreter&quot; is given, it means that an expert system of the generic task at hand could not be expected to solve the ambiguity.</Paragraph> <Paragraph position="2"> The importance (crucial, important, not-important, negligible) expresses the impact of solving the ambiguity in the context of the intended task.
Then comes the ambiguity type (structure, comm_act, class, meaning, target language, reference, address, situation, mode) and its value(s). The linguists may define more types and complete the list of values if necessary.</Paragraph> <Paragraph position="3"> Other labels are optional. Their list will be completed in the future as more ambiguity labelling is performed. As for now, they comprise the disambiguation scope (how far does the solution of the ambiguity kernel carry over in the subsequent utterances), and the multi-modality (what kind of cues could be used to help solve the ambiguity in a multimodal setting).</Paragraph> <Paragraph position="4"> For lack of space, we can present only a few of the interesting examples from the same dialogue.</Paragraph> <Paragraph position="5"> The interpretation of &quot;I am to&quot; (obligation or future) is solvable reliably only by the speaker.</Paragraph> <Paragraph position="6"> The following example is like the famous one: &quot;Time flies like an arrow&quot;. &quot;Linguist's examples&quot; are often derided, but they really appear in texts and dialogues. However, as soon as they are taken out of context, they look again as artificial as &quot;linguist's examples&quot;. Although many studies on ambiguities have been published, the specific goal of studying ambiguities in the context of interactive disambiguation in text and speech translation has led us to explore new ground and to propose the concept of &quot;ambiguity labelling&quot;. About 80 pages of dialogues gathered at ATR have been labelled: monolingual dialogues in Japanese and English, and bilingual WOZ dialogues [10]. Attempts have also been made on French texts and dialogues, and on monolingual telephone dialogues for which analysis results produced by automatic analyzers were available. Part of these collected ambiguities have been used for experiments on interactive disambiguation.</Paragraph> </Section> </Paper>