File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-1075_metho.xml
Size: 13,507 bytes
Last Modified: 2025-10-06 14:14:14
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1075"> <Title>Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain</Title> <Section position="3" start_page="0" end_page="442" type="metho"> <SectionTitle> 2 System Overview </SectionTitle>
<Paragraph position="0"> The JANUS system is a large-scale multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. A diagram of the architecture of the system is shown in Figure 1. The system is composed of three main components: a speech recognizer, a machine translation (MT) module and a speech synthesis module. The speech recognition component of the system is described elsewhere (Woszczyna et al. 1994). For speech synthesis, we use a commercially available speech synthesizer.</Paragraph>
<Paragraph position="1"> The MT module is composed of two separate translation sub-modules which operate independently. The first is the GLR module, designed to be more accurate. The second is the Phoenix module, designed to be more robust. Both modules follow an interlingua-based approach. The source language input string is first analyzed by a parser, which produces a language-independent interlingua content representation. The interlingua is then passed to a generation component, which produces an output string in the target language.</Paragraph>
<Paragraph position="2"> The discourse processor is a component of the GLR translation module. It disambiguates the speech act of each sentence, normalizes temporal expressions, and incorporates the sentence into a discourse plan tree. The discourse processor also updates a calendar which keeps track of what the speakers have said about their schedules. The discourse processor is described in greater detail elsewhere (Rosé et al. 1995).</Paragraph> </Section>
<Section position="4" start_page="442" end_page="443" type="metho"> <SectionTitle> 3 The GLR* Translation Module </SectionTitle>
<Paragraph position="0"> The GLR* parser (Lavie and Tomita 1993; Lavie 1994) is a parsing system based on Tomita's Generalized LR parsing algorithm (Tomita 1987). The parser skips parts of the utterance that it cannot incorporate into a well-formed sentence structure. Thus it is well-suited to domains in which non-grammaticality is common. The parser conducts a search for the maximal subset of the original input that is covered by the grammar. This is done using a beam search heuristic that limits the combinations of skipped words considered by the parser, and ensures that it operates within feasible time and space bounds.</Paragraph>
<Paragraph position="1"> The GLR* parser was implemented as an extension to the GLR parsing system, a unification-based practical natural language system (Tomita 1990). The grammars we develop for the JANUS system are designed to produce feature structures that correspond to a frame-based language-independent representation of the meaning of the input utterance. For a given input utterance, the parser produces a set of interlingua texts, or ILTs.</Paragraph>
<Paragraph position="2"> The main components of an ILT are the speech act (e.g., suggest, accept, reject), the sentence type (e.g., state, query-if, fragment), and the main semantic frame (e.g., free, busy). An example of an ILT is shown in Figure 2. A detailed ILT Specification was designed as a formal description of the allowable ILTs. All parser output must conform to this ILT Specification. The GLR unification-based formalism allows the grammars to construct precise and very detailed ILTs. This in turn allows the GLR translation module to produce highly accurate translations for well-formed input.</Paragraph>
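To make this concrete, the following Python-style sketch shows roughly what an ILT feature structure for a scheduling utterance such as "are you free Tuesday afternoon?" might contain. Only the three components named above (speech act, sentence type, main semantic frame) come from the paper; the remaining slot names are hypothetical, since the actual ILT Specification and Figure 2 are not reproduced here.

# Illustrative sketch of an ILT as a nested feature structure for an
# utterance like "are you free Tuesday afternoon?".  The speech act,
# sentence type and main semantic frame are the components described
# above; the other slots ("who", "when", ...) are hypothetical.
example_ilt = {
    "speech-act": "suggest",       # e.g., suggest, accept, reject
    "sentence-type": "query-if",   # e.g., state, query-if, fragment
    "frame": "free",               # main semantic frame, e.g., free, busy
    "who": {"frame": "you"},
    "when": {
        "frame": "simple-time",
        "day-of-week": "tuesday",
        "time-of-day": "afternoon",
    },
}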
<Paragraph position="3"> The GLR* parser also includes several tools designed to address the difficulties of parsing spontaneous speech. To cope with high levels of ambiguity, the parser includes a statistical disambiguation module, in which probabilities are attached directly to the actions in the LR parsing table. The parser can identify sentence boundaries within each hypothesis with the help of a statistical method that determines the probability of a boundary at each point in the utterance. The parser must also determine the &quot;best&quot; parse from among the different parsable subsets of an input. This is done using a collection of parse evaluation measures which are combined into an integrated heuristic for evaluating and ranking the parses produced by the parser. Additionally, a parse quality heuristic allows the parser to self-judge the quality of the parse chosen as best, and to detect cases in which important information is likely to have been skipped.</Paragraph>
<Paragraph position="4"> Target language generation in the GLR module is done using GenKit (Tomita and Nyberg 1988), a unification-based generation system. With well-developed generation grammars, GenKit results in very accurate translation for well-specified ILTs.</Paragraph> </Section>
<Section position="5" start_page="443" end_page="443" type="metho"> <SectionTitle> 4 The Phoenix Translation Module </SectionTitle>
<Paragraph position="0"> The JANUS Phoenix translation module (Mayfield et al. 1995) is an extension of the Phoenix Spoken Language System (Ward 1991; Ward 1994). The translation component consists of a parsing module and a generation module. Translation between any of the four source languages (English, German, Spanish, Korean) and five target languages (English, German, Spanish, Korean, Japanese) is possible, although we currently focus only on a few of these language pairs.</Paragraph>
<Paragraph position="1"> Unlike the GLR method, which attempts to construct a detailed ILT for a given input utterance, the Phoenix approach attempts only to identify the key semantic concepts represented in the utterance and their underlying structure. Whereas GLR* is general enough to support both semantic and syntactic grammars (or some combination of both types), the Phoenix approach was specifically designed for semantic grammars. Grammatical constraints are introduced at the phrase level (as opposed to the sentence level) and regulate semantic categories. This allows the ungrammaticalities that often occur between phrases to be ignored, and reflects the fact that syntactically incorrect spontaneous speech is often semantically well-formed.</Paragraph>
<Paragraph position="2"> The parsing grammar specifies patterns which represent concepts in the domain. The patterns are composed of words of the input string as well as other tokens for constituent concepts. Elements (words or tokens) in a pattern may be specified as optional or repeating (as in a Kleene star mechanism). Each concept, irrespective of its level in the hierarchy, is represented by a separate grammar file. These grammars are compiled into Recursive Transition Networks (RTNs).</Paragraph>
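As a rough illustration of such a semantic grammar (the notation below is our own and is not the actual Phoenix grammar-file syntax), each concept could be written as a set of patterns over literal words and lower-level concepts, with optional and repeatable elements:

# Illustrative only -- not the actual Phoenix grammar-file syntax.
# One concept per "grammar file"; patterns mix literal words with
# references to lower-level concepts.  A leading '?' marks an optional
# element and '*' marks a repeatable (Kleene-star) element.
concept_grammar = {
    "[suggest_time]": [
        ["how", "about", "[point]"],
        ["?we", "could", "meet", "[point]"],
    ],
    "[point]": [
        ["?on", "[day_of_week]", "*[time_of_day]"],
    ],
    "[day_of_week]": [["monday"], ["tuesday"], ["wednesday"]],
    "[time_of_day]": [["morning"], ["afternoon"], ["evening"]],
}
# Grammars of this form would then be compiled into Recursive Transition
# Networks, one network per concept.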
<Paragraph position="3"> The interlingua meaning representation of an input utterance is derived directly from the parse tree constructed by the parser, by extracting the represented structure of concepts. This representation is usually less detailed than the corresponding GLR ILT representation, and thus often results in a somewhat less accurate translation. The set of semantic concept tokens for the Scheduling domain was initially developed from a set of 45 example English dialogues. Top-level tokens, also called slots, represent speech acts, such as suggestion or agreement. Intermediate-level tokens distinguish between points and intervals in time, for example; lower-level tokens capture the specifics of the utterance, such as days of the week, and represent the only words that are translated directly via lookup tables.</Paragraph>
<Paragraph position="4"> The parser matches as much of the input utterance as it can to the patterns specified by the RTNs. Out-of-lexicon words are ignored, unless they occur in specific locations where open concepts are permitted. A word that is already known to the system, however, can cause a concept pattern not to match if it occurs in a position unspecified in the grammar. A failed concept does not cause the entire parse to fail. The parser can ignore any number of words in between top-level concepts, handling out-of-domain or otherwise unexpected input. The parser has no restrictions on the order in which slots can occur. This can cause added ambiguity in the segmentation of the utterance into concepts. The parser uses a disambiguation algorithm that attempts to cover the largest number of words using the smallest number of concepts.</Paragraph>
<Paragraph position="5"> Figure 3 shows an example of a speaker utterance and the parse that was produced using the Phoenix parser. The parsed speech recognizer output is shown with unknown (-) and unexpected (*) words marked. These segments of the input were ignored by the parser. The relevant concepts, however, are extracted, and strung together they provide a general meaning representation of what the speaker actually said.</Paragraph>
<Paragraph position="6"> Generation in the Phoenix module is accomplished using a simple strategy that sequentially generates target language text for each of the top-level concepts in the parse analysis. Each concept has one or more fixed phrasings in the target language. Variables such as times and dates are extracted from the parse analysis and translated directly. The result is a meaningful translation, but it can have a telegraphic feel.</Paragraph>
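The following Python sketch illustrates this generation strategy under assumed concept names, phrasings, and lookup entries (none of which are taken from the actual JANUS tables): each top-level concept in the parse is mapped to a fixed target-language phrasing, and variables such as days and times are filled in by direct lookup.

# Hypothetical fixed phrasings per top-level concept (German targets here);
# the real Phoenix generation tables are not reproduced in the paper.
PHRASINGS = {
    "[suggest_time]": "Wie waere es {when}?",
    "[accept]": "Das passt mir.",
}

# Hypothetical lookup table for lower-level tokens that are translated
# directly (e.g., days of the week).
LOOKUP = {"tuesday": "Dienstag", "afternoon": "Nachmittag"}

def generate(parsed_concepts):
    """Sequentially generate target text for each top-level concept."""
    out = []
    for concept, variables in parsed_concepts:
        template = PHRASINGS.get(concept)
        if template is None:
            continue  # no phrasing for this concept: skip it
        # Translate each variable word-by-word via the lookup table.
        filled = {k: " ".join(LOOKUP.get(w, w) for w in v.split())
                  for k, v in variables.items()}
        out.append(template.format(**filled))
    return " ".join(out)

# e.g. generate([("[suggest_time]", {"when": "tuesday afternoon"}),
#                ("[accept]", {})])
# -> "Wie waere es Dienstag Nachmittag? Das passt mir."
# (deliberately telegraphic, in the sense described above)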
</Section> <Section position="6" start_page="443" end_page="444" type="metho"> <SectionTitle> 5 Combining the GLR and Phoenix Translation Modules </SectionTitle>
<Section position="1" start_page="443" end_page="444" type="sub_section"> <SectionTitle> 5.1 Strengths and Weaknesses of the Approaches </SectionTitle>
<Paragraph position="0"> As already described, both the GLR* parser and the Phoenix parser were specifically designed to handle the problems associated with analyzing spontaneous speech. However, each of the approaches has some clear strengths and weaknesses.</Paragraph>
<Paragraph position="1"> Although designed to cope with speech disfluencies, GLR* can gracefully tolerate only moderate levels of deviation from the grammar. When the input is only slightly ungrammatical, and contains relatively minor disfluencies, GLR* produces precise and detailed ILTs that result in high quality translations. The GLR* parser has difficulties in parsing long utterances that are highly disfluent, or that significantly deviate from the grammar.</Paragraph>
<Paragraph position="2"> In many such cases, GLR* succeeds in parsing only a small fragment of the entire utterance, and important input segments end up being skipped.[1] Phoenix is significantly better suited to analyzing such utterances. Because Phoenix is capable of skipping over input segments that do not correspond to any top-level semantic concept, it can far better recover from out-of-domain segments in the input, and &quot;restart&quot; itself on an in-domain segment that follows. However, this sometimes results in the parser picking up and mis-translating a small parsable phrase within an out-of-domain segment. To handle this problem, we are attempting to develop methods for automatically detecting out-of-domain segments in an utterance (see Section 7). [1] Recent work on a method for pre-breaking the utterance at sentence boundaries prior to parsing has significantly reduced this problem.</Paragraph>
<Paragraph position="3"> Because the Phoenix approach ignores small function words in the input, its translation results are by design bound to be less accurate. However, the ability to ignore function words is of great benefit when working with speech recognition output, in which such words are often mistaken. By keying on high-confidence words, Phoenix takes advantage of the strengths of the speech decoder. At the current time, Phoenix uses only very simple disambiguation heuristics, does not employ any discourse knowledge, and does not have a mechanism similar to the parse quality heuristic of GLR*, which allows the parser to self-assess the quality of the produced result.</Paragraph> </Section>
<Section position="2" start_page="444" end_page="444" type="sub_section"> <SectionTitle> 5.2 Combining the Two Approaches </SectionTitle>
<Paragraph position="0"> Because each of the two translation methods appears to perform better on different types of utterances, they may hopefully be combined in a way that takes advantage of the strengths of each of them. One strategy that we have investigated is to use the Phoenix module as a back-up to the GLR module. The parse result of GLR* is translated whenever it is judged by the parse quality heuristic to be &quot;Good&quot;. Whenever the parse result from GLR* is judged as &quot;Bad&quot;, the translation is generated from the corresponding output of the Phoenix parser. Results of using this combination scheme are presented in the next section. We are in the process of investigating some more sophisticated methods for combining the two translation approaches.</Paragraph>
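As a minimal sketch of this back-up scheme (module interfaces and return values are assumed for illustration, not the actual JANUS code), the selection logic amounts to the following:

def translate(utterance, glr, phoenix):
    """Hypothetical combination scheme: GLR* first, Phoenix as back-up.

    glr.parse is assumed to return an ILT plus the self-assessed parse
    quality ("Good" or "Bad"); phoenix.parse is assumed to return its
    concept-sequence analysis.  Both modules are assumed to expose a
    generate method producing target-language text.
    """
    ilt, quality = glr.parse(utterance)
    if quality == "Good":
        return glr.generate(ilt)
    # GLR* judged its own best parse as "Bad": fall back to Phoenix.
    return phoenix.generate(phoenix.parse(utterance))

</Section> </Section> </Paper>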