<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0408">
  <Title>English-to-Mandarin Speech Translation with Head Transducers</Title>
  <Section position="4" start_page="0" end_page="54" type="metho">
    <SectionTitle>
2 Bilingual Head Transduction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="54" type="sub_section">
      <SectionTitle>
2.1 Bilingual Head Transducers
</SectionTitle>
      <Paragraph position="0"> A head transducer M is a finite state machine associated with a pair of words, a source word w and a target word v. In fact, w is taken from the set V1 consisting of the source language vocabulary augmented by the &amp;quot;empty word&amp;quot; e, and v is taken from V2, the target language vocabulary augmented with e. A head transducer reads from a pair of source sequences, a left source sequence L1 and a right source sequence R1; it writes to a pair of target sequences, a left target sequence L2 and a right target sequence R2 (Figure 1).</Paragraph>
      <Paragraph position="1"> Head transducers were introduced in Alshawi 1996b, where the symbols in the source and target sequences are source and target words respectively. In the model described in this paper, the symbols written are dependency relation symbols, or the empty symbol e. The use of relation symbols here is a result of the historical development of the system from an earlier transfer model. A conceptually simpler translator can be built using head transducer models with only lexical items, in which case the distinction between different dependents is implicit in the state of a transducer. In head transducer models, the use of relations corresponds to a type of class-based model.</Paragraph>
      <Paragraph position="2"> We can think of the transducer as simultaneously deriving the source and target sequences through a series of transitions followed by a stop action. From a state qi these actions are as follows: * Left transition: write a symbol r1 onto the right end of L1, write a symbol r2 to position a in the target sequences, and enter state qi+1. * Right transition: write a symbol r1 onto the left end of R1, write a symbol r2 to position a in the target sequences, and enter state qi+1. * Stop: stop in state qi, at which point the sequences L1, R1, L2 and R2 are considered complete.</Paragraph>
      <Paragraph position="3"> In simple head transducers, the target positions can be restricted in a similar way to the source positions, i.e., the right end of L2 or the left end of R2. The version we used for English-to-Chinese translation allows additional target positions, as explained in Section 3.</Paragraph>
    </Section>
    <Section position="2" start_page="54" end_page="54" type="sub_section">
      <SectionTitle>
2.2 Recursive Head Transduction
</SectionTitle>
      <Paragraph position="0"> We can apply a set of head transducers recursively to derive a pair of source-target ordered dependency trees. This is a recursive process in which the dependency relations for corresponding nodes in the two trees are derived by a head transducer.</Paragraph>
      <Paragraph position="1"> In addition to the actions performed by the head transducers, this derivation process involves the actions: * Selection of a pair of words w0 in V1 and v0 in V2, and a head transducer M0 to start the entire derivation. * Selection of a pair of dependent words w' and v' and a transducer M' given head words w and v and source and target dependency relations r1 and r2. (w, w' in V1; v, v' in V2.)</Paragraph>
      <Paragraph position="2"> The recursion takes place by running a head transducer (M' in the second action above) to derive local dependency trees for corresponding pairs of dependent words (w', v'). In practice, we restrict the selection of such pairs to those provided by a bilingual lexicon for the two languages. This process of recursive transduction of local trees is shown graphically in Figure 2, in which the pair of words starting the entire derivation is (w4, v4).</Paragraph>
    </Section>
    <Section position="3" start_page="54" end_page="54" type="sub_section">
      <SectionTitle>
2.3 Translator
</SectionTitle>
      <Paragraph position="0"> A translator based on head transducers consists of the following components: * A bilingual lexicon in which entries are 5-tuples (w, v, M, q, c), associating a pair of source-target words with a head transducer M, an initial state q, and a cost c. * A parameter table giving the costs of actions for head transducers and the recursive transduction process. * A transduction search engine for finding the minimum cost target string for an input source string (or recognizer speech lattice).</Paragraph>
      <Paragraph position="1"> The search algorithm used in our implementation is a head-outwards dynamic programming algorithm similar to the parsing algorithm for monolingual head acceptors described in Alshawi 1996a. Head-outwards processing techniques were developed originally for lexically-driven parsing (Sata and Stock 1989, Kay 1989).</Paragraph>
    </Section>
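The transition and stop actions described in Section 2.1, together with the dependent-pair selection of Section 2.2, can be sketched as a small state machine. The following is a minimal illustration, not the authors' implementation: the states, relation names, and the verb entry are invented, and for readability symbols are simply appended to the four sequences rather than tracking the paper's exact end-of-sequence conventions.

```python
EPS = None  # the empty target symbol

class HeadTransducer:
    """Finite state machine associated with a source/target head word pair.

    Each arc reads a source relation r1 (written to the left or right
    source sequence) and writes a target relation r2 (or EPS) to the left
    or right target sequence, then moves to the next state.
    """

    def __init__(self, initial, arcs, finals):
        # arcs: (state, r1) -> (src_side, r2, tgt_side, next_state)
        self.initial, self.arcs, self.finals = initial, arcs, finals

    def derive(self, relations):
        """Simultaneously derive source and target dependent sequences."""
        L1, R1, L2, R2 = [], [], [], []
        state = self.initial
        for r1 in relations:
            src_side, r2, tgt_side, state = self.arcs[(state, r1)]
            (L1 if src_side == "L" else R1).append(r1)
            if r2 is not EPS:
                (L2 if tgt_side == "L" else R2).append(r2)
        if state not in self.finals:
            raise ValueError("stop action not permitted in state " + state)
        return L1, R1, L2, R2

# Invented entry for an English/Chinese transitive-verb pair: the actor
# stays left of the head, the temporal moves before the head on the
# target side, and the object stays after the head.
verb = HeadTransducer(
    "q0",
    {("q0", "actor"):    ("L", "actor",    "L", "q1"),
     ("q1", "object"):   ("R", "object",   "R", "q2"),
     ("q2", "temporal"): ("R", "temporal", "L", "q3")},
    finals={"q3"},
)

L1, R1, L2, R2 = verb.derive(["actor", "object", "temporal"])
print(L1, R1)  # ['actor'] ['object', 'temporal']
print(L2, R2)  # ['actor', 'temporal'] ['object']
```

Reading the sequences outward from the head, the source order is actor HEAD object temporal while the target order is actor temporal HEAD object, the reordering discussed for Figure 3.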
  </Section>
  <Section position="5" start_page="54" end_page="58" type="metho">
    <SectionTitle>
3 English-Chinese Head Transducers
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="54" end_page="56" type="sub_section">
      <SectionTitle>
3.1 Source and Target Positions
</SectionTitle>
      <Paragraph position="0"> In deciding the set of allowable positions for source and target transitions, there are tradeoffs involving model size, flexibility for modeling word-order changes in translation, and computational efficiency of the search for lowest cost transductions.</Paragraph>
      <Paragraph position="1"> These tradeoffs led us to constrain the source positions of transitions to just two, specifically the simple left and right source positions mentioned in the description of transitions in Section 2.1. This restriction means that the transduction search can be carried out with the type of algorithm used for head-outwards context free parsing. In particular, we use a dynamic programming tabular algorithm to find the minimal cost transduction of a word string or word-lattice from a speech recognizer.</Paragraph>
      <Paragraph position="2"> The algorithm maintains optimal &amp;quot;active-edges&amp;quot; spanning a segment of the input string (or two states in the recognition word-lattice). This use of context free algorithms is not possible if the number of possible source positions for transductions is increased so that incomplete transducer source sequences are no longer simple segments.</Paragraph>
      <Paragraph position="7"> However, the number of target positions for transductions is not constrained by these efficiency considerations. For English-to-Chinese translation, we can decrease the complexity of the transducers (i.e. reduce the number of states and transitions they have) by allowing multiple target positions to the left and right of the head. The motivation for this is that the required reordering of dependents can be achieved with fewer transducer states by accumulating the dependents into subsequences to the left and right of the head. The actual left and right target sequences are formed by concatenating these subsequences. We can use the following notation to number these additional positions. The head is notionally at position 0, and the &amp;quot;standard&amp;quot; positions immediately to the left and right of the head are numbered as -1 and +1 respectively. The position that extends the kth subsequence to the left of the head outwards from the head is numbered -2k + 1, while the position that extends this same subsequence inwards towards the head is labeled -2k. The positions to the right of the head are numbered analogously with positive integers.</Paragraph>
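As a concrete illustration of this numbering, the following sketch (helper names invented, not from the paper) writes symbols at numbered target positions, with odd offsets extending a subsequence outward from the head and even offsets extending it inward, and then concatenates the subsequences into the final target order.

```python
from collections import defaultdict

def place(subseqs, position, symbol):
    """Write `symbol` at a numbered target position around the head.

    subseqs maps a signed subsequence index k (negative = left of head)
    to a list of symbols. Position -(2k - 1) extends the k-th left
    subsequence outward, -2k extends it inward; positive positions
    behave symmetrically on the right.
    """
    if position == 0:
        raise ValueError("position 0 is the head itself")
    p = abs(position)
    outward = (p % 2 == 1)        # -1, -3, ... and +1, +3, ... are outward
    k = (p + 1) // 2              # subsequence index for both parities
    seq = subseqs[-k if position < 0 else k]
    if position < 0:
        if outward:
            seq.insert(0, symbol)  # further leftward, away from the head
        else:
            seq.append(symbol)     # inward, towards the head
    else:
        if outward:
            seq.append(symbol)     # further rightward, away from the head
        else:
            seq.insert(0, symbol)  # inward, towards the head

def assemble(subseqs, head):
    """Concatenate the subsequences into the full target sequence."""
    left = [s for k in sorted(subseqs) if k < 0 for s in subseqs[k]]
    right = [s for k in sorted(subseqs) if k > 0 for s in subseqs[k]]
    return left + [head] + right

subseqs = defaultdict(list)
place(subseqs, -1, "a")   # first left subsequence, outward (standard slot)
place(subseqs, -2, "b")   # same subsequence, inward (lands next to the head)
place(subseqs, -3, "c")   # second left subsequence, further from the head
place(subseqs, +1, "d")   # first right subsequence
print(assemble(subseqs, "HEAD"))  # ['c', 'a', 'b', 'HEAD', 'd']
```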
    </Section>
    <Section position="2" start_page="56" end_page="56" type="sub_section">
      <SectionTitle>
3.2 Examples of Dependency Relation
Head Transducers
</SectionTitle>
      <Paragraph position="0"> An example of the structure of a simplified head transducer for converting the dependents of a typical English transitive verb into those for a corresponding Chinese verb is shown in Figure 3. The nodes in the figure correspond to states; a bilingual lexical entry would specify q0 as the initial state in this case. Transitions are shown as arcs between states; the label on an arc specifies the relation symbol, source position, and target position, respectively. Stop actions are not shown, though states allowing stop actions are shown as double circles, the usual convention for final states. A typical path through the state diagram is shown in bold: this converts the English dependency sequence for statement sentences with the pattern actor head object temporal into the corresponding Chinese sequence actor temporal head object.</Paragraph>
      <Paragraph position="1"> Similarly, an English dependency sequence for yes-no questions modal actor head object temporal is converted into the Chinese sequence actor temporal modal head object MA, the transducer stopping in state q6, MA being the relation between the head verb and the Chinese particle for yes-no questions. The final states for this transducer network are kept distinct so that different costs can be assigned by training to the stop actions and modifier transitions at these states.</Paragraph>
      <Paragraph position="2"> Another example is the English-to-Chinese head transducer for noun phrase dependency relations shown in Figure 4. Typical target positions for transitions corresponding to noun phrase modification (noun phrases are head-final in Chinese) are as follows:  The position for transitions emitting the Chinese particle pronounced DE may be either -2, -4, or -6, depending on the transducer states for the transition. The different states effectively code the presence of different modifier types. It should also be noted that the above positions do not completely define the order of modifiers in the transduction. For example, the relative order of target specifiers, cardinals, and ordinals will depend on the order of these modifiers in the source.</Paragraph>
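The two verb reorderings discussed for Figure 3 can be replayed with a small table-driven sketch. The actions below are invented to be consistent with the description above, not copied from the figure; the two pre-head target slots stand in for the left subsequences of Section 3.1, and an epsilon source relation is used to emit the yes-no particle MA.

```python
def transduce(actions, head="HEAD"):
    """Replay a hand-listed derivation path.

    Each action is (r1, src_side, r2, tgt_slot). r1 is the English
    relation (None = epsilon, as when emitting MA); src_side places it
    left ("L") or right ("R") of the head; r2 goes to an outer pre-head
    slot ("pre2"), an inner pre-head slot ("pre1"), or after the head
    ("post") on the Chinese side.
    """
    src_l, src_r = [], []
    tgt = {"pre2": [], "pre1": [], "post": []}
    for r1, src_side, r2, tgt_slot in actions:
        if r1 is not None:
            (src_l if src_side == "L" else src_r).append(r1)
        if r2 is not None:
            tgt[tgt_slot].append(r2)
    source = src_l + [head] + src_r
    target = tgt["pre2"] + tgt["pre1"] + [head] + tgt["post"]
    return source, target

statement = [("actor", "L", "actor", "pre2"),
             ("object", "R", "object", "post"),
             ("temporal", "R", "temporal", "pre2")]

question = [("modal", "L", "modal", "pre1"),   # Chinese modal stays by the head
            ("actor", "L", "actor", "pre2"),
            ("object", "R", "object", "post"),
            ("temporal", "R", "temporal", "pre2"),
            (None, None, "MA", "post")]        # epsilon source, MA emitted

print(transduce(statement))
# (['actor', 'HEAD', 'object', 'temporal'],
#  ['actor', 'temporal', 'HEAD', 'object'])
print(transduce(question))
# (['modal', 'actor', 'HEAD', 'object', 'temporal'],
#  ['actor', 'temporal', 'modal', 'HEAD', 'object', 'MA'])
```

Accumulating the modal in an inner pre-head slot while actor and temporal go to an outer one is exactly the subsequence trick that keeps the transducer small: the dependents need not be written in their final target order.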
    </Section>
    <Section position="3" start_page="56" end_page="58" type="sub_section">
      <SectionTitle>
3.3 Model Construction
</SectionTitle>
      <Paragraph position="0"> The head transducer model was trained and evaluated on English-to-Mandarin Chinese translation of transcribed utterances from the ATIS corpus (Hirschman et al. 1993). By training here we simply mean assignment of the cost functions for fixed model structures. These model structures were coded by hand as a head transducer lexicon.</Paragraph>
      <Paragraph position="3"> The head transducers were built by modifying the English head acceptors defined for an earlier transfer-based system (Alshawi 1996a). This involved the addition of target relations, including some epsilon relations, to automaton transitions.</Paragraph>
      <Paragraph position="4"> In some cases, the automata needed to be modified to include additional states, and also some transitions with epsilon relations on the English (source) side. Typically, such cases arise when an additional particle needs to be generated on the target side, for example the yes-no question particle in Chinese. The inclusion of such particles often depended on additional distinctions not present in the original English automata, hence the requirement for additional states in the bilingual transducer versions.</Paragraph>
      <Paragraph position="5"> In fact, many of the automata in these entries had the same structure, and are independent of the ATIS domain. Domain dependence and the differences in word behavior (for example the differences in behavior between two verbs with the same subcategorization) were due to the costs applied when running the automata. The method used to assign the cost parameters for the model can be characterized as &amp;quot;supervised discriminative training&amp;quot;. In this method, costs are computed by tracing the events involved in producing translations of sentences from a source training corpus; a bilingual speaker classifies the output translations as positive or negative examples of acceptable translations. Details of this cost assignment method are presented in Alshawi and Buchsbaum 1997.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="58" end_page="59" type="metho">
    <SectionTitle>
4 Head Transducers in Speech
Translation
</SectionTitle>
    <Paragraph position="0"> Speech translation has special requirements for efficiency and robustness. We believe that head transduction models have certain advantages that help satisfy these requirements.</Paragraph>
    <Section position="1" start_page="58" end_page="59" type="sub_section">
      <SectionTitle>
Ranking
</SectionTitle>
      <Paragraph position="0"> Head transduction models are weighted, so the costs for translation derivations can be combined with those from acoustic processing. Weighted models can also contribute to efficiency because dynamic programming can be used to eliminate suboptimal derivations. This is particularly important when the input is in the form of word lattices. Since the contributions of the source, target, and bilingual components of the models are applied simultaneously when computing the costs of partial derivations, there is no need to pass multiple alternatives forwards from source analysis to transfer to generation; the translation ranked globally optimal is computed with a single admissible search.</Paragraph>
      <Paragraph position="1"> Efficiency In addition to the points made in the preceding paragraph on ranking, we noted earlier that transduction with appropriately restricted source positions for transitions can be carried out with search techniques similar to context free parsing (e.g. Younger 1967). Head-outward processing with a lexicalized model also has the obvious efficiency advantage that only the part of the model related to the source words in the input needs to be active during the search process. In an experiment comparing the efficiency of head transduction to our earlier transfer approach, the average time for translating transcribed utterances from the ATIS corpus was 1.09 seconds for transfer and 0.17 seconds for head transduction. This speed improvement was achieved while also improving memory usage and translation accuracy. Details of the experiment are presented in Alshawi, Buchsbaum, and Xia, 1997. The efficiency of head transduction has allowed us to start experimenting with (pruned) word lattices from speech recognition with the aim of producing translations from such word lattices in real time.</Paragraph>
      <Paragraph position="2"> Robustness Bottom-up lexicalized translation is inherently more robust than top-down processing since it allows maximal incomplete partial derivations to be identified when complete derivations are not possible. This is particularly important in the case of speech translation because the input string or word lattice often represents fragmentary, ill-formed, or &amp;quot;afterthought&amp;quot; phrases. When complete derivations are not possible, our experimental system searches for a span of the input string or lattice with the fewest fragments (or the lowest cost such span if there are several). Lowest-cost translations of such fragments will already have been produced by the transduction algorithm, so an approximate translation of the utterance can be formed by concatenating the fragments in temporal order. In the limit, this approach degrades gracefully into word-for-word translation with the most likely translation of each input word being selected.</Paragraph>
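The fragment fallback can be sketched as a small dynamic program that covers the input span with the fewest translated fragments and concatenates their translations in temporal order; the fragment table and translations below are invented placeholders, not system output.

```python
def fewest_fragment_cover(n, fragments):
    """Cover input positions 0..n with the fewest translated fragments.

    fragments maps a span (i, j), covering input words i..j-1, to its
    lowest-cost translation (already produced by the transduction
    algorithm). Returns the concatenated translation of a minimal cover,
    or None if no cover exists.
    """
    INF = float("inf")
    best = [(INF, [])] * (n + 1)   # best[j] = (fragment count, translations)
    best[0] = (0, [])
    for j in range(1, n + 1):
        for (i, k), translation in fragments.items():
            if k == j and best[i][0] + 1 < best[j][0]:
                best[j] = (best[i][0] + 1, best[i][1] + [translation])
    count, parts = best[n]
    return " ".join(parts) if count != INF else None

# Placeholder fragment translations over a 5-word input: two large
# fragments and several single-word ones.
fragments = {(0, 2): "T1", (2, 5): "T2",
             (0, 1): "T3", (1, 2): "T4",
             (2, 3): "T5", (3, 5): "T6"}

print(fewest_fragment_cover(5, fragments))  # T1 T2
```

When only single-word fragments are available, the minimal cover uses one fragment per word, which is the graceful degradation into word-for-word translation mentioned above.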
    </Section>
  </Section>
</Paper>