<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1090">
  <Title>A Path-based Transfer Model for Machine Translation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Acquisition of Transfer Rules
</SectionTitle>
    <Paragraph position="0"> A transfer rule specifies how a path in the source language dependency tree is translated. We extract transfer rules automatically from a word-aligned corpus. For example, Fig. 2(b-g) are some of the rules extracted from the word-aligned sentence in Fig. 2(a). The left hand side of a rule is a path in the source dependency tree. The right hand side of a rule is a fragment of a dependency tree in the target language. It encodes not only the dependency relations, but also the relative linear order among the nodes in the fragment. For example, the rule in Fig. 2(e) specifies that when the path Connect-to-controller is translated into French, Branchez precedes (but is not necessarily adjacent to) sur, and sur precedes (but is not necessarily adjacent to) controleur.</Paragraph>
    <Paragraph position="1"> Note that the transfer rules also contain word-to-word mapping between the nodes in the source and the target (obtained from word alignments). These mappings are not shown in order not to clutter the diagrams.</Paragraph>
    <Paragraph position="2"> [Figure 2: the word-aligned sentence pair "Connect both power cables to the controller" / "Branchez les deux cables d' alimentation sur le controleur", with extracted rules such as Connect-to-controller -> Branchez sur controleur; Connect-cables -> Branchez les cables; power-cables -> cables d' alimentation; both-cables -> deux cables.]</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Spans
</SectionTitle>
      <Paragraph position="0"> The rule extraction algorithm makes use of the notion of spans (Fox 2002, Lin&amp;Cherry 2003).</Paragraph>
      <Paragraph position="1"> Given a word alignment and a node n in the source dependency tree, the spans of n induced by the word alignment are consecutive sequences of words in the target sentence. We define two types of spans: Head span: the word sequence aligned with the node n.</Paragraph>
      <Paragraph position="2"> Phrase span: the word sequence from the lower bound of the head spans of all nodes in the subtree rooted at n to the upper bound of the same set of spans.</Paragraph>
      <Paragraph position="3"> For example, the spans of the nodes in Fig. 2(a) are listed in Table 1. We used the word-alignment algorithm in (Lin&amp;Cherry 2003a), which enforces a cohesion constraint that guarantees that if two spans overlap one must be fully contained in the other.</Paragraph>
      <Paragraph position="4"> Table 1: Spans of the nodes in Fig. 2(a)
Node        Head Span    Phrase Span
Connect     [1,1]        [1,9]
both        [3,3]        [3,3]
power       [6,6]        [6,6]
cables      [4,4]        [3,6]
to          (unaligned)  [8,9]
the         [8,8]        [8,8]
controller  [9,9]        [8,9]</Paragraph>
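As an illustrative aside (not the authors' code), the two span definitions can be sketched in a few lines of Python. The tree shape and alignment below are read off Fig. 2(a) and Table 1, and the function name is our own:

```python
# Dependency tree of Fig. 2(a): head word mapped to its modifiers.
children = {
    "Connect": ["cables", "to"],
    "cables": ["both", "power"],
    "to": ["controller"],
    "controller": ["the"],
    "both": [], "power": [], "the": [],
}

# Head spans: target-word positions aligned with each source node.
# The preposition "to" is unaligned, so it has no head span.
head_span = {
    "Connect": (1, 1), "both": (3, 3), "power": (6, 6),
    "cables": (4, 4), "the": (8, 8), "controller": (9, 9),
}

def phrase_span(node):
    """Lower/upper bound of the head spans in the subtree rooted at node."""
    spans = [phrase_span(c) for c in children[node]]
    if node in head_span:
        spans.append(head_span[node])
    spans = [s for s in spans if s is not None]
    if not spans:
        return None
    return (min(lo for lo, _ in spans), max(hi for _, hi in spans))

print(phrase_span("Connect"))  # (1, 9)
print(phrase_span("to"))       # (8, 9)
```

Running this reproduces the phrase-span column of Table 1, including the [8,9] phrase span of the unaligned "to".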
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Rule-Extraction Algorithm
</SectionTitle>
      <Paragraph position="0"> For each word-aligned dependency tree in the training corpus, we extract all the paths in which every node is aligned with words in the target language sentence, except that a preposition in the middle of a path is allowed to be unaligned. From the dependency tree in Fig. 2(a), we can extract 21 such paths, 6 of which are single nodes (degenerate paths).</Paragraph>
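The count of 21 can be checked with a short sketch of our own: in a tree there is exactly one path between any two nodes, so the six aligned nodes of Fig. 2(a) yield 6 single-node paths plus C(6,2) = 15 paths with two distinct endpoints (the unaligned preposition "to" may only occur path-internally, never as an endpoint):

```python
from itertools import combinations

# The six aligned nodes of Fig. 2(a); "to" is unaligned and excluded
# as a path endpoint.
aligned = ["Connect", "both", "power", "cables", "the", "controller"]

# Candidate paths: the single nodes plus all unordered endpoint pairs,
# since a tree has a unique path between any pair of nodes.
paths = [(n,) for n in aligned] + list(combinations(aligned, 2))
print(len(paths))  # 21
```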
      <Paragraph position="1"> We first consider the translation of simple paths, which are either a single link or a chain of two links with the middle node being an unaligned preposition. An example of the latter case is the path Connect-to-controller in Fig. 2(a). In such cases, we treat the two dependency links as if they were a single link (e.g., we call &quot;Connect&quot; the parent of &quot;controller&quot;).</Paragraph>
      <Paragraph position="3"> Suppose there is a simple path from node h to node m. Let h' and m' be the target language words aligned with h and m respectively. Let s be the phrase span of a sibling of m that is located between h' and m' and is the closest to m' among all such phrase spans. If m does not have such a sibling, let s be the head span of h.</Paragraph>
      <Paragraph position="5"> The translation of the path consists of the following nodes and links: * Two nodes labeled h' and m', and a link from h' to m'.</Paragraph>
      <Paragraph position="6"> * A node corresponding to each word between s and the phrase span of m and a link from each of these nodes to m'.</Paragraph>
      <Paragraph position="7"> Fig. 2(b-e) are example translations constructed this way. The following table lists the words h' and m' and the span s in these instances (columns: Example, h', m', s). In general, a path is either a single node, a simple path, or a chain of simple paths. The translations of single nodes are determined by the word alignments. The translation of a chain of simple paths can be obtained by chaining the translations of the simple paths. Fig. 2(f) provides an example.</Paragraph>
      <Paragraph position="8"> Note that even though the target of a rule is typically a path, it is not necessarily the case (e.g., Fig. 2(g)). Our rule extraction algorithm guarantees the following property of target tree fragments: if a node in a target tree fragment is not aligned with a node in the source path, it must be a leaf node in the tree fragment.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Generalization of Rules
</SectionTitle>
      <Paragraph position="0"> In addition to the rules discussed in the previous subsection, we also generalize the rules by replacing one of the end nodes in the path with a wild card and the part of speech of the word. For example, the rule in Fig. 2(b) can be generalized in two ways. The generalized versions of the rule apply to any determiner modifying cable and to both modifying any noun, respectively.</Paragraph>
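A minimal sketch of this wildcard generalization (our own code, representing a path as a list of word/part-of-speech pairs; the POS labels "Det" and "N" are illustrative, not the paper's tagset):

```python
def generalize(path):
    """path: list of (word, part of speech) pairs along a dependency path.
    Replace one end node with a wildcard that keeps only its POS."""
    g1 = [("*", path[0][1])] + path[1:]    # wildcard at the first end
    g2 = path[:-1] + [("*", path[-1][1])]  # wildcard at the last end
    return g1, g2

# The rule in Fig. 2(b): both modifying cables (determiner of a noun).
g1, g2 = generalize([("both", "Det"), ("cables", "N")])
print(g1)  # any determiner modifying cables
print(g2)  # both modifying any noun
```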
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.4 Translation Probability
</SectionTitle>
      <Paragraph position="0"> Let S be a path in the source language dependency tree and T be a tree fragment in the target language. The translation probability P(T|S) is estimated from the counts of S and T in the word-aligned training corpus, where M is a smoothing constant.</Paragraph>
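The estimation formula itself is not reproduced here, so the following sketch is only one plausible reading consistent with the text: a relative-frequency estimate of P(T|S) whose denominator is inflated by the smoothing constant M. All names are ours:

```python
from collections import Counter

pair_count = Counter()  # occurrences of (source path, target fragment)
path_count = Counter()  # occurrences of the source path

def observe(path, fragment):
    pair_count[(path, fragment)] += 1
    path_count[path] += 1

def p_translate(fragment, path, M=1.0):
    # Relative frequency with the denominator inflated by M (assumption:
    # the paper's exact smoothing scheme is not shown).
    return pair_count[(path, fragment)] / (path_count[path] + M)

observe("Connect-to-controller", "Branchez sur controleur")
observe("Connect-to-controller", "Branchez sur controleur")
observe("Connect-to-controller", "Connectez au controleur")
print(p_translate("Branchez sur controleur", "Connect-to-controller"))  # 0.5
```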
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="2" type="metho">
    <SectionTitle>
4 Path-based Translation
</SectionTitle>
    <Paragraph position="0"> Given a source language sentence, it is translated into the target language in the following steps: Step 1: Parse the sentence to obtain its dependency structure.</Paragraph>
    <Paragraph position="1">  Step 2: Extract all the paths in the dependency tree and retrieve the translations of all the paths. Step 3: Find a set of transfer rules such that a) They cover the whole dependency tree.</Paragraph>
    <Paragraph position="2"> b) The tree fragments in the rules can be consistently merged into a target language dependency tree.</Paragraph>
    <Paragraph position="3"> c) The merged tree has the highest probability among all the trees satisfying the above conditions.</Paragraph>
    <Paragraph position="4"> Step 4: Output the linear sequence of words in the dependency tree.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Merging Tree Fragments
</SectionTitle>
      <Paragraph position="0"> In Step 3 of our algorithm, we need to merge the tree fragments obtained from a set of transfer rules into a single dependency tree. For example, merging the target tree fragments in Fig. 4(b-d) results in the tree in Fig. 4(e). Since the paths in these rules cover the dependency tree in Fig. 4(a), Fig. 4(e) is a translation of Fig. 4(a). The merger of target tree fragments is constrained by the requirement that if two target nodes in different fragments are mapped to the same source node, they must be merged into a single node.</Paragraph>
      <Paragraph position="1"> Proposition 1: The merger of two target tree fragments does not contain a loop.</Paragraph>
      <Paragraph position="2"> Proof: The unaligned nodes in each tree fragment will not be merged with another node. They have degree 1 in the original tree fragment and will still have degree 1 after the merger. If there is a loop in the merged graph, the degree of a node on the loop is at least 2. Therefore, all of the nodes on the loop are aligned nodes. This implies that there is a loop in the source dependency tree, which is clearly false.</Paragraph>
      <Paragraph position="3"> Proposition 2: If the paths in a set of transfer rules cover the input dependency tree, the merger of the right hand sides of the rules is a tree. Proof: To prove it is a tree, we only need to prove that it is connected, since Proposition 1 guarantees that there is no loop. Consider the condition part of a rule, which is a path A in the source dependency tree. Let r be the node in the path that is closest to the root node of the tree. If r is not the root node of the tree, there must exist another path B that covers the link between r and its parent. The paths A and B map r to the same target language node.</Paragraph>
      <Paragraph position="4"> Therefore, the target language tree fragments for A and B are connected. Using mathematical induction, we can establish that all the tree fragments are connected.</Paragraph>
      <Paragraph position="5"> The above two propositions establish that merging the tree fragments forms a tree structure.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Node Ordering
</SectionTitle>
      <Paragraph position="0"> For each node in the merged structure, we must also determine the ordering among it and its children. If a node is present in only one of the original tree fragments, the ordering between it and its children is the same as in that fragment.</Paragraph>
      <Paragraph position="1"> Suppose a node h is found in two tree fragments.</Paragraph>
      <Paragraph position="2"> For the children of h that come from the same fragment, their order is already specified. If two children of h are on different sides of h in their original fragments, their order can be inferred from their positions relative to h. For example, the combination of the rules in Fig. 4(b) and Fig. 4(c) translates both existing cables into deux cables existants.</Paragraph>
      <Paragraph position="3"> If two children of h are on the same side of h and their source language counterparts are also on the same side of h, we maintain their relative closeness to the parent node: whichever word was closer to the parent in the source remains closer to the parent in the target. For example, the combination of the rules in Fig. 4(c) and Fig. 4(d) translates existing coaxial cables into cables coaxiaux existants.</Paragraph>
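The closeness-preserving heuristic for same-side children can be sketched as follows (our own code, not the authors'; each child carries its distance to the head in the source sentence):

```python
def order_same_side(head, children):
    """children: list of (target word, distance to head in the source).
    Sort so that the child closer to the head in the source stays closer
    to the head in the target (shown for children to the right of the head)."""
    ordered = sorted(children, key=lambda c: c[1])
    return [head] + [w for w, _ in ordered]

# 'coaxial' is adjacent to 'cables' in the source; 'existing' is farther,
# so 'coaxiaux' stays adjacent to 'cables' in the target:
print(order_same_side("cables", [("existants", 2), ("coaxiaux", 1)]))
```

This reproduces the example ordering cables coaxiaux existants from the combination of Fig. 4(c) and Fig. 4(d).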
      <Paragraph position="4"> If two children of h are on the same side of h but their source language counterparts are on different sides of h, we use their original word order in the source language.</Paragraph>
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Conflicts in Merger
</SectionTitle>
      <Paragraph position="0"> Conflicts may arise when we merge tree fragments.</Paragraph>
      <Paragraph position="1"> Consider the two rules in Fig. 5.</Paragraph>
      <Paragraph position="2"> The rule in Fig. 5(a) states that when the word same is used to modify a noun, it is translated as meme and appears after the noun. The rule in Fig. 5(b) states that same physical geometry is translated into geometrie physique identique. When translating the sentence in Fig. 5(c), both of these rules can be applied to parts of the tree. However, they cannot be used at the same time, as they translate same into different words and place them in different locations.</Paragraph>
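A small sketch (ours, not the authors' implementation) of the lexical half of this conflict check, using the word-to-word mappings carried by the rules; the positional half of the conflict would additionally need the order information:

```python
def conflict(rule_a, rule_b):
    """rule_a, rule_b: word-to-word mappings (source word to target word).
    Two rules conflict if they translate a shared source word differently."""
    shared = set(rule_a).intersection(rule_b)
    return any(rule_a[w] != rule_b[w] for w in shared)

r1 = {"same": "meme"}                            # the rule of Fig. 5(a)
r2 = {"same": "identique", "physical": "physique",
      "geometry": "geometrie"}                   # the rule of Fig. 5(b)
print(conflict(r1, r2))  # True: 'same' gets two different translations
```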
    </Section>
    <Section position="4" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.4 Probabilistic Model
</SectionTitle>
      <Paragraph position="0"> Our translation model is a direct translation model, as opposed to the noisy channel model commonly employed in statistical machine translation. Given the dependency tree S of a source language sentence, the probability of the target dependency tree T, P(T|S), is computed by decomposing it into a set of path translations: the product of P(Ti|Si) over the paths Si in a set C that covers S, where Ti is the translation of Si. Note that the paths in C are allowed to overlap. However, no path should be totally contained in another, as we can always remove the shorter path to increase the probability without compromising the total coverage of C.</Paragraph>
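Assuming the decomposition is a product of path translation probabilities over a covering set C (consistent with the weights -log P(Ti|Si) used in Section 4.5), the computation can be sketched as follows; the probabilities below are made up for illustration:

```python
import math

def log_p_translation(cover):
    """cover: list of (source path, target fragment, P(Ti|Si)) triples.
    Summing logs is equivalent to multiplying the probabilities."""
    return sum(math.log(p) for _, _, p in cover)

cover = [("Connect-cables", "Branchez les cables", 0.5),
         ("Connect-to-controller", "Branchez sur controleur", 0.25)]
print(round(math.exp(log_p_translation(cover)), 3))  # 0.125
```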
    </Section>
    <Section position="5" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.5 Graph-theoretic Formulation
</SectionTitle>
      <Paragraph position="0"> If we ignore conflicts in merging tree fragments and assign each path Si the weight -log P(Ti|Si),</Paragraph>
      <Paragraph position="2"> the problem of finding the most probable translation can be formulated as the following graph theory problem: given a tree and a collection of paths in the tree, where each path is assigned a weight, find a subset of the paths that covers all the nodes and edges in the tree and has the minimum total weight.</Paragraph>
      <Paragraph position="3"> We call this problem the Minimum Path Covering of Trees. A closely related problem is the Minimum Set Covering Problem: given a collection F of subsets of a given set X, find a minimum-cardinality subcollection C of F such that the union of the subsets in C is X. Somewhat surprisingly, while the Minimum Set Covering Problem is a very well-known NP-Complete problem, the problem of Minimum Path Covering of Trees has not previously been studied. It is still an open problem whether it is NP-Complete or has a polynomial solution.</Paragraph>
      <Paragraph position="4"> If we assume that the number of paths covering any particular node is bounded by a constant, there exists a dynamic programming algorithm with O(n) complexity, where n is the size of the tree (Lin&amp;Lin, 2004). In machine translation, this seems to be a reasonable assumption.</Paragraph>
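For tiny instances, Minimum Path Covering of Trees can be solved by brute force over subsets; this illustrative sketch (ours, exponential-time, unlike the dynamic program of Lin and Lin 2004) represents each path by the set of nodes and edges it covers:

```python
from itertools import combinations

def min_path_cover(items, paths):
    """items: set of nodes/edges to cover; paths: list of (covered set, weight).
    Exhaustively try every subset and keep the cheapest full cover."""
    feasible = []
    for r in range(1, len(paths) + 1):
        for subset in combinations(paths, r):
            covered = frozenset().union(*(p for p, _ in subset))
            if covered == items:
                feasible.append((sum(wt for _, wt in subset), subset))
    return min(feasible, key=lambda t: t[0]) if feasible else None

# Chain a-b-c: the items are its three nodes and two edges.
items = frozenset({"a", "b", "c", "ab", "bc"})
paths = [(frozenset({"a", "b", "ab"}), 1.0),
         (frozenset({"b", "c", "bc"}), 1.0),
         (frozenset({"a", "b", "c", "ab", "bc"}), 1.5)]
best_weight, best_paths = min_path_cover(items, paths)
print(best_weight)  # 1.5: one long path beats two short ones
```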
    </Section>
  </Section>
  <Section position="6" start_page="2" end_page="2" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> We implemented a path-based English-to-French MT system. The training corpus consists of the English-French portion of the 1999 European</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
Parliament Proceedings
</SectionTitle>
      <Paragraph position="0"> (Koehn 2002). It consists of 116,889 pairs of sentences (3.4 million words). As in (Koehn et al. 2003), 1755 sentences of length 5-15 were used for testing. We parsed the English side of the corpus with Minipar (Lin 2002). We then performed word alignment on the parsed corpus with the ProAlign system (Cherry&amp;Lin 2003, Lin&amp;Cherry 2003b). From the training corpus, we extracted 2,040,565 distinct paths with one or more translations. The BLEU score of our system on the test data is 0.2612. Compared with the English-to-French results in (Koehn et al. 2003), this is higher than IBM Model 4 (0.2555), but lower than the phrasal model (0.3149).</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="2" end_page="2" type="metho">
    <SectionTitle>
6 Related Work and Discussions
6.1 Transfer-based MT
</SectionTitle>
    <Paragraph position="0"> Both our system and transfer-based MT systems take a parse tree in the source language and translate it into a parse tree in the target language with transfer rules. There have been many recent proposals to acquire transfer rules automatically from word-aligned corpora (Carbonell et al. 2002, Lavoie et al. 2002, Richardson et al. 2001). There are two main differences between our system and previous transfer-based approaches: the unit of transfer and the generation module.</Paragraph>
    <Paragraph position="1"> The units of transfer in previous transfer-based approaches are usually subtrees of the source language parse tree. While the number of subtrees of a tree is exponential in the size of the tree, the number of paths in a tree is quadratic. The reduced number of possible transfer units makes the data less sparse.</Paragraph>
    <Paragraph position="2"> The target parse tree in a transfer-based system typically does not include word order information. A separate generation module, which often involves some target language grammar rules, is used to linearize the words in the target parse tree. In contrast, our transfer rules specify the linear order among the nodes in a rule. The ordering among nodes in different rules is determined with a couple of simple heuristics. There is no separate generation module, and we do not need a target language grammar.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
6.2 Translational Divergence
</SectionTitle>
      <Paragraph position="0"> The Direct Correspondence Assumption (DCA) states that the dependency trees in the source and target languages have isomorphic structures (Hwa et al. 2002).</Paragraph>
      <Paragraph position="1"> DCA is often violated in the presence of translational divergence. It has been shown in (Habash&amp;Dorr 2002) that translational divergences are quite common (as much as 35% between English and Spanish). For example, Fig. 6(a) shows a Head Swapping Divergence.</Paragraph>
      <Paragraph position="2"> Even though we map the dependency tree in the source language into a dependency tree in the target language, we are using a weaker assumption than DCA. We induce a target language structure using a source language structure and the word alignment. There is no guarantee that this target language dependency tree is what a target language linguist would construct. For example, the derived dependency tree for &amp;quot;X cruzar Y nadando&amp;quot; is shown in Fig. 6(b). Even though it is not a correct dependency tree for Spanish, it does generate the correct word order.</Paragraph>
    </Section>
  </Section>
</Paper>