<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0508">
  <Title>Stochastic Finite-State Models for Spoken Language Machine Translation</Title>
  <Section position="4" start_page="52" end_page="53" type="metho">
    <SectionTitle>
3 Acquiring Lexical Translations
</SectionTitle>
    <Paragraph position="0"> In the problem of speech recognition the alignment between the words and their acoustics is relatively straightforward since the words appear in the same order as their corresponding acoustic events. In contrast, in machine&amp;quot; translation, the linear order of words in the source language, in general is not maintained in the target language.</Paragraph>
    <Paragraph position="1"> The first stage in the process of bilingual phrase acquisition is obtaining an alignment function that given a pair of source and target language sentences, maps source language word subsequences into target language word subsequences. For this purpose, we use the alignment algorithm described in (Alshawi et 2 Note that computing the exact set of all possible reorderings is computationally expensive. In Section 5 we discuss an approximation for the set of all possible reorderings that serves for our application.</Paragraph>
    <Paragraph position="2">  al., 1998a). The result of the alignment procedure is shown in Table 1.3 Although the search for bilingual phrases of length more than two words can be incorporated in a straight-forward manner in the alignment module, we find that doing so is computationally prohibitive. We first transform the output of the alignment into a representation conducive for further manipulation. We call this a bilanguage TB. A string</Paragraph>
    <Paragraph position="4"> an example alignment and the source-word-ordered bilanguage strings corresponding to the alignment shown in Table 1.</Paragraph>
    <Paragraph position="5"> Having transformed the alignment for each sentence pair into a bilanguage string (source word-ordered or target word-ordered), we proceed to segment the corpus into bilingual phrases which can be acquired from the corpus TB by minimizing the joint entropy H(Ls, LT) ~ -1/M log P(TB). The probability P(Ws, WT) = P(R) is computed in the same way as n-gram model: where wl E LsUe, zi E LTUe, e is the empty string and wi_zi is the symbol pair (colons are the delimiters) drawn from the source and target language. null A string in a bilanguage corpus consists of sequences of tokens where each token (wi-xi) is represented with two components: a source word (\]possibly an empty word) as the first component and the target word (possibly an empty word) that is the translation of the source word as the second component. Note that the tokens of a bilanguage could be either ordered according to the word order of the source language or ordered according to the word  order of the target language. Thus an alignment of a pair of source and target language sentences will result in two bilanguage strings. Table 2 shows 3The Japanese string was translated and segmented so that a token boundary in Japanese corresponds to some token boundary in English.</Paragraph>
    <Paragraph position="7"> Using the phrase segmented corpus, we construct a phrase-based variable n-gram translation model as discussed in the following section.</Paragraph>
  </Section>
  <Section position="5" start_page="53" end_page="54" type="metho">
    <SectionTitle>
4 Learning Phrase-based Variable
N-gram Translation Models
</SectionTitle>
    <Paragraph position="0"> Our approach to stochastic language modeling is based on the Variable Ngram Stochastic Automaton (VNSA) representation and learning algorithms introduced in (Riccardi et al., 1995; Pdccardi et al., 1996). A VNSA is a non-deterministic Stochastic Finite-State Machine (SFSM) that allows for parsing any possible sequence of words drawn from a given vocabulary 12. In its simplest implementation the state q in the VNSA encapsulates the lexical (word sequence) history of a word sequence. Each  (%EPS% represents the null symbol c).</Paragraph>
    <Paragraph position="1"> state recognizes a symbol wi E lZU {e}, where e is the empty string. The probability of going from state qi to qj (and recognizing the symbol associated to qj) is given by the state transition probability, P(qj \[qi). Stochastic finite-state machines represent m a compact way the probability distribution over all possible word sequences. The probability of a word sequence W can be associated to a state sequence ~Jw = ql,..., qj and to the probability P(~Jw)&amp;quot; For a non-deterministic finite-state machine the probability of W is then given by P(W) = ~j P((Jw).</Paragraph>
    <Paragraph position="2"> Moreover, by appropriately defining the state space to incorporate lexical and extra-lexical information, the VNSA formalism can generate a wide class of probability distribution (i.e., standard word n-gram, class-based, phrase-based, etc.) (Riccardi et al., 1996; Riccardi et al., 1997; Riccardi and Bangalore, 1998). In Fig. 2, we plot a fragment of a VNSA trained with word classes and phrases. State 0 is the initial state and final states are double circled. The e transition from state 0 to state 1 carries the membership probability P(C), where the class C contains the two elements {collect, calling card}. The c transition from state 4 to state 6 is a back-off transition to a lower order n-gram probability. State 2 carries the information about the phrase calling card. The state transition function, the transition probabilities and state space are learned via the self-organizing algorithms presented in (Riccardi et al., 1996).</Paragraph>
    <Section position="1" start_page="54" end_page="54" type="sub_section">
      <SectionTitle>
4.1 Extending VNSAs to Stochastic
Transducers
</SectionTitle>
      <Paragraph position="0"> Given the monolingual corpus T, the VNSA learning algorithm provides an automaton that recognizes an input string W (W E yY) and computes P(W) C/ 0 for each W. Learning VNSAs from the bilingual corpus TB leads to the notion of stochastic transducers rST. Stochastic transducers rST : Ls x LT ~ \[0, 1\] map the string Ws E Ls into WT E LT and assign a probability to the transduction Ws ~--~ WT. In our case, the VNSA's model will estimate P(Ws ~-~.~&amp;quot; WT) : P(Ws, WT) and the symbol pair wi : xi will be associated to each transducer state q with input label wi and output label xl. The model rST provides a sentence-level transduction from Ws into WT. The integrated sentence and phrase-level transduction is then trained directly on the phrasesegmented corpus 7~ described in section 3.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="54" end_page="54" type="metho">
    <SectionTitle>
5 Reordering the output
</SectionTitle>
    <Paragraph position="0"> The stochastic transducers TST takes as input a sentence Ws and outputs a set of candidate strings in the target language with source language word order. Recall that the one-to-many mapping comes from the non-determinism of VST. The maximization step in equation 2 is carried out with Viterbi algorithm over the hypothesized strings in LT and I~VT is selected. The last step to complete the translation process is to apply the monolingual target language model A T to re-order the sentence I?VT to produce ^ W~. The re-order operation is crucial especially in the case the bilanguage phrases in 7~ are not sorted in the target language. For the re-ordering operation, the exact approach would be to search through all possible permutations of the words in ITVT and select the most likely. However, that operation is computationally very expensive. To overcome this problem, we approximate the set of the permutations with the word lattice AWT representing (xl I x2 I ... XN) N, where xi are the words in ITVT. The most likely string ~V~ in the word lattice is then decoded as follows:</Paragraph>
    <Paragraph position="2"> Where o is the composition operation defined for weighted finite-state machines (Pereira and Riley, 1997). The complete operation cascade for the machine translation process is shown in Figure 3.</Paragraph>
  </Section>
  <Section position="7" start_page="54" end_page="56" type="metho">
    <SectionTitle>
6 Embedding Translation in an
Application
</SectionTitle>
    <Paragraph position="0"> In this section, we describe an application in which we have embedded our translation model and present some of the motivations for doing so. The application that we are interested in is a call type classification task called How May I Help You (Gorin et al., 1997). The goal is to sufficiently understand  caller's responses to the open-ended prompt How May I Help You? and route such a call based on the meaning of the response. Thus we aim at extracting a relatively small number of semantic actions from the utterances of a very large set of users who are not trained to the system's capabilities and limitations. The first utterance of each transaction has been transcribed and marked with a call-type by labelers. There are 14 call-types plus a class other for the complement class. In particular, we focused our study on the classification of the caller's first utterance in these dialogs. The spoken sentences vary widely in duration, with a distribution distinctively skewed around a mean value of 5.3 seconds corresponding to 19 words per utterance. Some examples of the first utterances are given below:  We trained a classifer on the training Set of English sentences each of which was annotated with a call type. The classifier searches for phrases that are strongly associated with one of the call types (Gorin et al., 1997) and in the test phase the classifier extracts these phrases from the output of the speech recognizer and classifies the user utterance. '\]?his is how the system works when the user speaks English.</Paragraph>
    <Paragraph position="1"> However, if the user does not speak the language that the classifier is trained on, English, in our case, the system is unusable. We propose to solve this problem by translating the user's utterance, Japanese, in our case, to English. This extends the usability of the system to new user groups.</Paragraph>
    <Paragraph position="2"> An alternate approach could be to retrain the classifier on Japanese text. However, this approach would result in replicating the system for each possible input language, a very expensive proposition considering, in general, that the system could have sophisticated natural language understanding and dialog components which would have to be replicated also.</Paragraph>
    <Section position="1" start_page="55" end_page="56" type="sub_section">
      <SectionTitle>
6.1 Experiments and Evaluation
</SectionTitle>
      <Paragraph position="0"> In this section, we discuss issues concerning evaluation of the translation system. The data for the experiments reported in this section were obtained from the customer side of operator-customer conversations, with the customer-caxe application described above and detailed in (Riccardi and Gorin, January 2000; Gorin et al., 1997). Each of the customer's utterance transcriptions were then manually translated into Japanese. A total of 15,457 English-Japanese sentence pairs was split into 12,204 training sentence pairs and 3,253 test sentence pairs.</Paragraph>
      <Paragraph position="1"> The objective of this experiment is to measure the performance of a translation system in the context of an application. In an automated call router there axe two important performance measures. The first is the probability of false rejection, where a call is falsely rejected. Since such calls would be transferred to a human agent, this corresponds to a missed opportunity for automation. The second  measure is the probability of correct classification.</Paragraph>
      <Paragraph position="2"> Errors in this dimension lead to misinterpretations that must be resolved by a dialog manager (Abella and Gorin, 1997).</Paragraph>
      <Paragraph position="3"> Using our approach described in the previous sections, we have trained a unigram, bigram and trigram VNSA based translation models with and without phrases. Table 3 shows lexical choice (bagof-tokens) accuracy for these different translation models measured in terms of recall, precision and F-measure.</Paragraph>
      <Paragraph position="4"> In order to measure the effectiveness of our translation models for this task we classify Japanese utterances based on their English translations. Figure 4 plots the false rejection rate against the correct classification rate of the classifier on the English generated by three different Japanese to English translation models for the set of Japanese test sentences. The figure also shows the performance of the classifier using the correct English text as input.</Paragraph>
      <Paragraph position="5"> There are a few interesting observations to be made from the Figure 4. Firstly, the task performance on the text data is asymptotically similar to the task performance on the translation output. In other words, the system performance is not significantly affected by the translation process; a Japanese transcription would most often be associated with the same call type after translation as if the original were English. This result is particularly interesting inspite of the impoverished reordering phase of the target language words. We believe that this result is due to the nature of the application where the classifier is mostly relying on the existence of certain key words and phrases, not necessarily in any particular order.</Paragraph>
      <Paragraph position="6"> The task performance improved from the unigram-based translation model to phrase unigram-based translation model corresponding to the improvement in the lexical choice accuracy in Table 3.</Paragraph>
      <Paragraph position="7"> Also, at higher false rejection rates, the task performance is better for trigram-based translation model than the phrase trigram-based translation model since the precision of lexical choice is better than that of the phrase trigram-based model as shown in Table 3. This difference narrows at lower false rejection rate.</Paragraph>
      <Paragraph position="8"> We are currently working on evaluating the translation system in an application independent method and developing improved models of reordering needed for better translation system.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>