<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2112">
  <Title>Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs</Title>
  <Section position="5" start_page="874" end_page="877" type="metho">
    <SectionTitle>
3 Statistical Word Alignment
</SectionTitle>
    <Paragraph position="0"> According to the IBM models (Brown et al., 1993), the statistical word alignment model can be generally represented as in equation (1).</Paragraph>
    <Paragraph position="1">  Where, and represent the source sentence and the target sentence, respectively</Paragraph>
    <Paragraph position="3"> (2) ml, are the lengths of the source sentence and the target sentence respectively. j is the position index of the target word. j a is the position of the source word aligned to the j th target word.</Paragraph>
    <Paragraph position="5"> jd[?] is the distortion probability for the head word of the cept.</Paragraph>
    <Paragraph position="7"> is the distortion probability for the non-head words of the cept.</Paragraph>
    <Paragraph position="8">  This paper uses c and f to represent a Chinese sentence and a Japanese sentence, respectively. And e represents an English sentence.</Paragraph>
    <Paragraph position="10"> [?] is the center of cept i.</Paragraph>
    <Paragraph position="11"> During the training process, IBM model 3 is first trained, and then the parameters in model 3 are employed to train model 4. For convenience, we describe model 3 in equation (3). The main difference between model 3 and model 4 lies in the calculation of distortion probability.</Paragraph>
    <Paragraph position="13"> Corpora of Other Language Pairs As shown in section 3, the word alignment model mainly has three kinds of parameters that must be specified, including the translation probability, the fertility probability, and the distortion probability. The parameters are usually estimated by using bilingual sentence pairs in the desired languages, namely Chinese and Japanese here. In this section, we describe how to estimate the parameters without using the Chinese-Japanese bilingual corpus. We introduce English as the pivot language, and use the Chinese-English and English-Japanese bilingual corpora to estimate the parameters of the Chinese-Japanese word alignment model. With these two corpora, we first build Chinese-English and English-Japanese word alignment models as described in section 3.</Paragraph>
    <Paragraph position="14"> Then, based on these two models, we estimate the parameters of Chinese-Japanese word alignment model. The estimated model is named induced model.</Paragraph>
    <Paragraph position="15"> The following subsections describe the method to estimate the parameters of Chinese-Japanese alignment model. For reversed Japanese-Chinese word alignment, the parameters can be estimated with the same method.</Paragraph>
    <Section position="1" start_page="875" end_page="876" type="sub_section">
      <SectionTitle>
4.1 Translation Probability
Basic Translation Probability
</SectionTitle>
      <Paragraph position="0"> We use the translation probabilities trained with Chinese-English and English-Japanese corpora to estimate the Chinese-Japanese probability as shown in equation (4). In (4), we assume that the translation probability is independent of the Chinese word .</Paragraph>
      <Paragraph position="1">  Where is the translation probability for Chinese-Japanese word alignment.</Paragraph>
      <Paragraph position="2"> is the translation probability trained using the English-Japanese corpus. is the translation probability trained using the Chinese-English corpus.</Paragraph>
      <Paragraph position="3">  In any language, there are ambiguous words with more than one sense. Thus, some noise may be introduced by the ambiguous English word when we estimate the Chinese-Japanese translation probability using English as the pivot language. For example, the English word &amp;quot;bank&amp;quot; has at least two senses, namely: bank1 - a financial organization bank2 - the border of a river Let us consider the Chinese word: He An - bank2 (the border of a river) And the Japanese word: Yin Xing - bank1 (a financial organization) In the Chinese-English corpus, there is high probability that the Chinese word &amp;quot;He An (bank2)&amp;quot; would be translated into the English word &amp;quot;bank&amp;quot;. And in the English-Japanese corpus, there is also high probability that the English word &amp;quot;bank&amp;quot; would be translated into the Japanese word &amp;quot;Yin Xing ( bank1)&amp;quot;.</Paragraph>
      <Paragraph position="4"> As a result, when we estimate the translation probability using equation (4), the translation probability of &amp;quot; Yin Xing (bank1)&amp;quot; given &amp;quot; He An (bank2)&amp;quot; is high. Such a result is not what we expect.</Paragraph>
      <Paragraph position="5"> In order to alleviate this problem, we introduce cross-language word similarity to improve translation probability estimation in equation (4). The cross-language word similarity describes how likely a Chinese word is to be translated into a Japanese word with an English word as the pivot. We make use of both the Chinese-English corpus and the English-Japanese corpus to calculate the cross language word similarity between a Chinese word c and a Japanese word f given an  Input: An English word e , a Chinese word , and a Japanese word ; c f The Chinese-English corpus; The English-Japanese corpus.  (1) Construct Set 1: identify those Chinese-English sentence pairs that include the given Chinese word and English word , and put the English sentences in the pairs into Set 1. c e (2) Construct Set 2: identify those English-Japanese sentence pairs that include the given English word and Japanese word , and put the English sentences in the pairs into Set 2. e f (3) Construct the feature vectors and of the given English word using all other words as context in Set 1 and Set 2, respectively.</Paragraph>
      <Paragraph position="7"> Where is the frequency of the context word .</Paragraph>
      <Paragraph position="9"> (4) Given the English word e , calculate the cross-language word similarity between the Chinese word and the Japanese word as in equation (5) c f  English word e. For the ambiguous English word e, both the Chinese word c and the Japanese word f can be translated into e. The sense of an instance of the ambiguous English word e can be determined by the context in which the instance appears. Thus, the cross-language word similarity between the Chinese word c and the Japanese word f can be calculated according to the contexts of their English translation e. We use the feature vector constructed using the context words in the English sentence to represent the context. So we can calculate the cross-language word similarity using the feature vectors. The detailed algorithm is shown in figure 1. This idea is similar to translation lexicon extraction via a bridge language (Schafer and Yarowsky, 2002).</Paragraph>
      <Paragraph position="10"> For example, the Chinese word &amp;quot;He An &amp;quot; and its English translation &amp;quot;bank&amp;quot; (the border of a river) appears in the following Chinese-English sen- null tence pair: (a) Ta Men Yan Zhao He An Zou Hui Jia .</Paragraph>
      <Paragraph position="11"> (b) They walked home along the river bank.</Paragraph>
      <Paragraph position="12"> The Japanese word &amp;quot;Yin Xing &amp;quot; and its English translation &amp;quot;bank&amp;quot; (a financial organization) appears in the following English-Japanese sentence pair: (c) He has plenty of money in the bank.</Paragraph>
      <Paragraph position="13"> (d) Bi haYin Xing Yu Jin gaXiang Dang aru.</Paragraph>
      <Paragraph position="14">  The context words of the English word &amp;quot;bank&amp;quot; in sentences (b) and (c) are quite different. The difference indicates the cross language word similarity of the Chinese word &amp;quot;He An &amp;quot; and the Japanese word &amp;quot;Yin Xing &amp;quot; is low. So they tend to have different senses.</Paragraph>
      <Paragraph position="15"> Translation Probability Embedded with Cross</Paragraph>
    </Section>
    <Section position="2" start_page="876" end_page="876" type="sub_section">
      <SectionTitle>
Translation Probability Embedded with Cross-Language Word Similarity
</SectionTitle>
      <Paragraph position="0"> Based on the cross language word similarity calculation in equation (5), we re-estimate the translation probability as shown in (6). Then we normalize it in equation (7).</Paragraph>
      <Paragraph position="1"> The word similarity of the Chinese word &amp;quot;He An (bank2)&amp;quot; and the Japanese word &amp;quot; Yin Xing (bank1)&amp;quot; given the word English word &amp;quot;bank&amp;quot; is low. Thus, using the updated estimation method, the translation probability of &amp;quot; Yin Xing (bank1)&amp;quot; given &amp;quot;He An (bank2)&amp;quot; becomes low.</Paragraph>
      <Paragraph position="3"/>
    </Section>
    <Section position="3" start_page="876" end_page="877" type="sub_section">
      <SectionTitle>
4.2 Fertility Probability
</SectionTitle>
      <Paragraph position="0"> The induced fertility probability is calculated as shown in (8). Here, we assume that the probabil-</Paragraph>
    </Section>
    <Section position="4" start_page="877" end_page="877" type="sub_section">
      <SectionTitle>
4.3 Distortion Probability in Model 3
</SectionTitle>
      <Paragraph position="0"> With the English language as a pivot language, we calculate the distortion probability of model 3.</Paragraph>
      <Paragraph position="1"> For this probability, we introduce two additional parameters: one is the position of English word and the other is the length of English sentence.</Paragraph>
      <Paragraph position="2"> The distortion probability is estimated as shown in (9).</Paragraph>
      <Paragraph position="3">  Where, is the estimated distortion probability. is the introduced position of an English word. n is the introduced length of an English sentence.</Paragraph>
      <Paragraph position="5"> In the above equation, we assume that the position probability is independent of the position of the Chinese word and the length of the Chinese sentence. And we assume that the position probability is independent of the length of Japanese sentence. Thus, we rewrite these two probabilities as follows. null</Paragraph>
      <Paragraph position="7"> For the length probability, the English sentence length n is independent of the word positions i . And we assume that it is uniformly distributed. Thus, we take it as a constant, and re-write it as follows.</Paragraph>
      <Paragraph position="8"> constant),|Pr(),,|Pr( == mlnmlin According to the above three assumptions, we ignore the length probability . Equa-</Paragraph>
    </Section>
    <Section position="5" start_page="877" end_page="877" type="sub_section">
      <SectionTitle>
4.4 Distortion Probability in Model 4
</SectionTitle>
      <Paragraph position="0"> In model 4, there are two parameters for the distortion probability: one for head words and the other for non-head words.</Paragraph>
      <Paragraph position="1"> Distortion Probability for Head Words The distortion probability for head words represents the relative position of the head word of the i</Paragraph>
      <Paragraph position="3"> cept and the center of the (i-1)</Paragraph>
      <Paragraph position="5"> jj[?], then is independent of the absolute position. Thus, we estimate the distortion probability by introducing another rela-</Paragraph>
      <Paragraph position="7"> jjd[?]is the estimated distortion probability for head words in Chinese-Japanese alignment. is the distortion probability for head word in Chinese-English</Paragraph>
      <Paragraph position="9"> ability of relative Japanese position given relative English position.</Paragraph>
      <Paragraph position="10"> In order to simplify , we introduce and and let  The English word in position is aligned to the Japanese word in position , and the English word in position is aligned to the Japanese word in position .</Paragraph>
      <Paragraph position="11">  We assume that and are independent, only depends on , and only depends on . Then can be estimated as shown in (13).</Paragraph>
      <Paragraph position="12">  Both of the two parameters in (13) represent the position translation probabilities. Thus, we can estimate them from the distortion probability in model 3. is estimated as shown in (14). And can be estimated in the same way. In (14), we also assume that the sentence length distribution is independent of the word position and that it is uniformly distributed.</Paragraph>
      <Paragraph position="13">  The distortion probability describes the distribution of the relative position of non-head words. In the same way, we introduce relative position of English words, and model the probability in (15).</Paragraph>
      <Paragraph position="14">  is the estimated distortion probability for the non-head words in Chinese-Japanese alignment. is the distortion probability for non-head words in Chinese-English alignment.</Paragraph>
      <Paragraph position="15">  jj DD is the translation probability of the relative Japanese position given the relative English position. In fact, has the same interpretation as in (12). Thus, we introduce two parameters and and let , where and are positions of English words. The final distortion probability for non-head words can be estimated as shown in (16).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="877" end_page="877" type="metho">
    <SectionTitle>
5 Interpolation Model
</SectionTitle>
    <Paragraph position="0"> With the Chinese-English and English-Japanese corpora, we can build the induced model for Chinese-Japanese word alignment as described in section 4. If we have small amounts of Chinese-Japanese corpora, we can build another word alignment model using the method described in section 3, which is called the original model here.</Paragraph>
    <Paragraph position="1"> In order to further improve the performance of Chinese-Japanese word alignment, we build an interpolated model by interpolating the induced model and the original model.</Paragraph>
    <Paragraph position="2"> Generally, we can interpolate the induced model and the original model as shown in equation (17).</Paragraph>
    <Paragraph position="3">  l is an interpolation weight. It can be a constant or a function of f and . c In both model 3 and model 4, there are mainly three kinds of parameters: translation probability, fertility probability and distortion probability. These three kinds of parameters have their own interpretation in these two models. In order to obtain fine-grained interpolation models, we interpolate the three kinds of parameters using different weights, which are obtained in the same way as described in Wu et al. (2005).</Paragraph>
    <Paragraph position="5"> sents the weights for translation probability.</Paragraph>
    <Paragraph position="6"> n l represents the weights for fertility probability.</Paragraph>
    <Paragraph position="8"> l represent the weights for distortion probability in model 3 and in model 4, respectively. null  l is set as the interpolation weight for both the head words and the non-head words. The above four weights are obtained using a manually annotated held-out set.</Paragraph>
  </Section>
class="xml-element"></Paper>