<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1019">
  <Title>A Comparison of Syntactically Motivated Word Alignment Spaces</Title>
  <Section position="3" start_page="0" end_page="146" type="metho">
    <SectionTitle>
2 Alignment Spaces
</SectionTitle>
    <Paragraph position="0"> Let an alignment be the entire structure that connects a sentence pair, and let a link be the individual word-to-word connections that make up an alignment. An alignment space determines the set of all possible alignments that can exist for a given sentence pair. Alignment spaces can emerge from generative stories (Brown et al., 1993), from syntactic notions (Wu, 1997), or they can be imposed to create competition between links (Melamed, 2000). They can generally be described in terms of how links interact.</Paragraph>
    <Paragraph position="1"> For the sake of describing the size of alignment spaces, we will assume that both sentences have n tokens. The largest alignment space for a sentence pair has 2^(n^2) possible alignments. This describes the case where each of the n^2 potential links can be either on or off with no restrictions.</Paragraph>
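As a quick sanity check on this count (a sketch of ours, not from the paper; the helper name `unrestricted_alignments` is invented), the space can be computed directly:

```python
def unrestricted_alignments(n: int) -> int:
    """Size of the largest alignment space: each of the n*n potential
    links between an n-token sentence pair is independently on or off."""
    return 2 ** (n * n)

# Even short sentences yield astronomically many alignments.
print(unrestricted_alignments(2))  # 16
print(unrestricted_alignments(5))  # 2**25 = 33554432
```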
    <Section position="1" start_page="145" end_page="145" type="sub_section">
      <SectionTitle>
2.1 Permutation Space
</SectionTitle>
      <Paragraph position="0"> A straightforward way to limit the space of possible alignments is to enforce a one-to-one constraint (Melamed, 2000). Under such a constraint, each token in the sentence pair can participate in at most one link. Each token in the English sentence picks a token from the Foreign sentence to link to, which is then removed from competition.</Paragraph>
      <Paragraph position="1"> This allows for n! possible alignments (a simplification that ignores null links; the actual number lies between n! and (n+1)^n), a substantial reduction from 2^(n^2).</Paragraph>
      <Paragraph position="2"> Note that n! is also the number of possible permutations of the n tokens in either one of the two sentences. Permutation space enforces the one-to-one constraint, but allows any reordering of tokens as they are translated. Permutation space methods include weighted maximum matching (Taskar et al., 2005), and approximations to maximum matching like competitive linking (Melamed, 2000). The IBM models (Brown et al., 1993) search a version of permutation space with a one-to-many constraint.</Paragraph>
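The competitive-linking approximation mentioned above can be sketched as a greedy loop: repeatedly accept the best-scoring remaining link, then retire both of its tokens. This is a minimal illustration of ours, not the paper's implementation; the association scores are invented for the example.

```python
def competitive_linking(scores):
    """Greedy approximation to maximum matching under the one-to-one
    constraint. `scores[(e, f)]` maps an (English index, Foreign index)
    pair to an association score; higher is better."""
    links = []
    used_e, used_f = set(), set()
    for (e, f), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if e not in used_e and f not in used_f:
            links.append((e, f))
            used_e.add(e)
            used_f.add(f)
    return sorted(links)

# Invented scores for a 3x3 sentence pair.
scores = {(0, 0): 0.9, (0, 1): 0.2, (1, 1): 0.8,
          (1, 2): 0.7, (2, 2): 0.6, (2, 0): 0.1}
print(competitive_linking(scores))  # [(0, 0), (1, 1), (2, 2)]
```

Because each accepted link removes both tokens from competition, every token ends up in at most one link, exactly the one-to-one constraint described above.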
    </Section>
    <Section position="2" start_page="145" end_page="145" type="sub_section">
      <SectionTitle>
2.2 ITG Space
</SectionTitle>
      <Paragraph position="0"> Inversion Transduction Grammars, or ITGs (Wu, 1997), provide an efficient formalism to synchronously parse bitext. This produces a parse tree that decomposes both sentences and also implies a word alignment. ITGs are transduction grammars because their terminal symbols can produce tokens in both the English and Foreign sentences.</Paragraph>
      <Paragraph position="1"> Inversions occur when the order of constituents is reversed in one of the two sentences.</Paragraph>
      <Paragraph position="2"> In this paper, we consider the alignment space induced by parsing with a binary bracketing ITG, such as: A -&gt; [A A] | &lt;A A&gt; | e/f (1). The terminal symbol e/f represents tokens output to the English and Foreign sentences respectively. Square brackets indicate a straight combination of non-terminals, while angle brackets indicate an inverted combination: &lt;A1 A2&gt; means that A1 A2 appears in the English sentence, while A2 A1 appears in the Foreign sentence.</Paragraph>
      <Paragraph position="3"> Used as a word aligner, an ITG parser searches a subspace of permutation space: the ITG requires that any movement that occurs during translation be explained by a binary tree with inversions.</Paragraph>
      <Paragraph position="4"> Alignments that allow no phrases to be formed in bitext are not attempted. This results in two forbidden alignment structures, shown in Figure 1, called &amp;quot;inside-out&amp;quot; transpositions in (Wu, 1997). Note that no pair of contiguous tokens in the top sentence remains contiguous when projected onto the bottom sentence. Zens and Ney (2003) explore the re-orderings allowed by ITGs, and provide a formulation for the number of structures that can be built for a sentence pair of size n. ITGs explore almost all of permutation space when n is small, but their coverage of permutation space falls off quickly for n &gt; 5 (Wu, 1997).</Paragraph>
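The forbidden inside-out structures of Figure 1 correspond to the permutation patterns (2,4,1,3) and (3,1,4,2), so the fall-off can be checked by brute force. This sketch is ours, assuming that standard pattern characterization of ITG-reachable (separable) permutations:

```python
from itertools import combinations, permutations

def contains_pattern(perm, pattern):
    """True if some subsequence of perm has the same relative order as pattern."""
    k = len(pattern)
    for idxs in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idxs]
        ranks = sorted(range(k), key=lambda i: sub[i])
        order = [0] * k
        for r, i in enumerate(ranks):
            order[i] = r + 1
        if tuple(order) == pattern:
            return True
    return False

def itg_reachable(n):
    """Count permutations of n tokens avoiding both inside-out patterns."""
    forbidden = [(2, 4, 1, 3), (3, 1, 4, 2)]
    return sum(1 for p in permutations(range(n))
               if not any(contains_pattern(p, f) for f in forbidden))

for n in range(1, 6):
    print(n, itg_reachable(n))  # n=4 reaches 22 of 24; n=5 only 90 of 120
```

The counts (1, 2, 6, 22, 90, ...) show full coverage of permutation space through n = 3 and the rapid divergence from n! thereafter.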
    </Section>
    <Section position="3" start_page="145" end_page="146" type="sub_section">
      <SectionTitle>
2.3 Dependency Space
</SectionTitle>
      <Paragraph position="0"> Dependency space defines the set of all alignments that maintain phrasal cohesion with respect to a dependency tree provided for the English sentence. The space is constrained so that the phrases in the dependency tree always move together.</Paragraph>
      <Paragraph position="1"> Fox (2002) introduced the notion of head-modifier and modifier-modifier crossings. These occur when a phrase's image in the Foreign sentence overlaps with the image of its head, or one of its siblings. An alignment with no crossings maintains phrasal cohesion. Figure 2 shows a head-modifier crossing: the image c of a head 2 overlaps with the image (b,d) of 2's modifier, (3,4). Lin and Cherry (2003) used phrasal cohesion to constrain a beam search aligner, conducting a heuristic search of the dependency space.</Paragraph>
      <Paragraph position="4"> The number of alignments in dependency space depends largely on the provided dependency tree.</Paragraph>
      <Paragraph position="5"> Because all permutations of a head and its modifiers are possible, a tree that has a single head with n - 1 modifiers provides no guidance; the alignment space is the same as permutation space. If the tree is a chain (where every head has exactly one modifier), alignment space has only 2^(n-1) permutations, which is by far the smallest space we have seen. In general, a given tree permits the product over all head nodes t_h of (m_{t_h} + 1)! permutations, where m_{t_h} counts t_h's modifiers.</Paragraph>
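The product formula can be evaluated directly from a tree. This is a minimal sketch of ours, assuming the tree is given as a parent array; `dependency_space_size` is an invented name:

```python
from math import factorial

def dependency_space_size(parents):
    """Number of token orderings licensed by a dependency tree, computed
    as the product over heads h of (modifiers(h) + 1)!.
    `parents[i]` is the parent index of token i, or None for the root."""
    n = len(parents)
    mods = [0] * n
    for i, p in enumerate(parents):
        if p is not None:
            mods[p] += 1
    size = 1
    for h in range(n):
        if mods[h] > 0:
            size *= factorial(mods[h] + 1)
    return size

# A flat tree (one head, n-1 modifiers) gives no guidance: n! orderings.
print(dependency_space_size([None, 0, 0, 0]))  # 24 = 4!
# A chain (every head has one modifier) gives only 2**(n-1).
print(dependency_space_size([None, 0, 1, 2]))  # 8 = 2**3
```

The two extremes in the comments match the claims above: a single-headed tree collapses to permutation space, while a chain is the most constraining tree.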
      <Paragraph position="6"> Dependency space is not a subspace of ITG space, as it can create both of the forbidden alignments in Figure 1 when given a single-headed tree.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="146" end_page="149" type="metho">
    <SectionTitle>
3 Dependency constrained ITG
</SectionTitle>
    <Paragraph position="0"> In this section, we introduce a new alignment space defined by a dependency constrained ITG, or D-ITG. The set of possible alignments in this space is the intersection of the dependency space for a given dependency tree and ITG space. Our goal is an alignment search that respects the phrases specified by the dependency tree, but attempts all ITG orderings of those phrases, rather than all permutations. The intuition is that most ordering decisions involve only a small number of phrases, so the search should still cover a large portion of dependency space.</Paragraph>
    <Paragraph position="1"> This new space has several attractive computational properties. Since it is a subspace of ITG space, we will be able to search the space completely using a polynomial time ITG parser. This places an upper bound on the search complexity equal to ITG complexity. This upper bound is very loose, as the ITG will often be drastically constrained by the phrasal structure of the dependency tree. Also, by working in the ITG framework, we will be able to take advantage of advances in ITG parsing, and we will have access to the forward-backward algorithm to implicitly count events over all alignments.</Paragraph>
    <Section position="1" start_page="146" end_page="146" type="sub_section">
      <SectionTitle>
3.1 A simple solution
</SectionTitle>
      <Paragraph position="0"> Wu (1997) suggests that in order to have an ITG take advantage of a known partial structure, one can simply stop the parser from using any spans that would violate the structure. In a chart parsing framework, this can be accomplished by assigning the invalid spans a value of negative infinity before parsing begins. Our English dependency tree qualifies as a partial structure, as it does not specify a complete binary decomposition of the English sentence. In this case, any ITG span that would contain part, but not all, of two adjacent dependency phrases can be invalidated. The sentence pair can then be parsed normally, automatically respecting phrases specified by the dependency tree.</Paragraph>
      <Paragraph position="1"> For example, Figure 3a shows an alignment for the sentence pair, &amp;quot;His house in Canada, Sa maison au Canada&amp;quot; and the dependency tree provided for the English sentence. The spans disallowed by the tree are shown using underlines. Note that the illegal spans are those that would break up the &amp;quot;in Canada&amp;quot; subtree. After invalidating these spans in the chart, parsing the sentence pair with the bracketing ITG in (1) will produce the two structures shown in Figure 3b, both of which correspond to the correct alignment.</Paragraph>
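The span-invalidation step can be sketched as follows, assuming a projective tree given as a parent array: a chart span is struck out when it partially overlaps some subtree's token span without containing it or nesting inside it. Function names are ours; on the running example this recovers exactly the underlined spans.

```python
def subtree_spans(parents):
    """Token span (min, max) covered by each node's subtree,
    assuming a projective dependency tree over tokens 0..n-1."""
    n = len(parents)
    children = [[] for _ in range(n)]
    root = 0
    for i, p in enumerate(parents):
        if p is None:
            root = i
        else:
            children[p].append(i)
    spans = {}
    def visit(i):
        lo = hi = i
        for c in children[i]:
            clo, chi = visit(c)
            lo, hi = min(lo, clo), max(hi, chi)
        spans[i] = (lo, hi)
        return spans[i]
    visit(root)
    return spans

def invalid_spans(parents):
    """Chart spans [i, j] that cross some dependency phrase: they overlap
    a subtree's span without containing it or fitting inside it. These
    cells would be assigned negative infinity before parsing."""
    n = len(parents)
    phrases = set(subtree_spans(parents).values())
    bad = set()
    for i in range(n):
        for j in range(i, n):
            for a, b in phrases:
                overlap = not (j < a or b < i)
                nested = (i <= a and b <= j) or (a <= i and j <= b)
                if overlap and not nested:
                    bad.add((i, j))
    return sorted(bad)

# "His(0) house(1) in(2) Canada(3)": house heads His and in; in heads Canada.
print(invalid_spans([1, None, 1, 2]))  # [(0, 2), (1, 2)]
```

The two struck spans, "His house in" and "house in", are precisely the ones that would break up the "in Canada" subtree; spans that fully contain a phrase, such as "house in Canada", remain legal.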
      <Paragraph position="2"> This solution is sufficient to create a D-ITG that obeys the phrase structure specified by a dependency tree. This allows us to conduct a complete search of a well-defined subspace of the dependency space described in Section 2.3.</Paragraph>
    </Section>
    <Section position="2" start_page="146" end_page="148" type="sub_section">
      <SectionTitle>
3.2 Avoiding redundant derivations with a recursive ITG
</SectionTitle>
      <Paragraph position="0"> The above solution can derive two structures for the same alignment. It is often desirable to eliminate redundant structures when working with ITGs. Having a single, canonical tree structure for each possible alignment can help when flattening binary trees, as it indicates arbitrary binarization decisions (Wu, 1997). Canonical structures also eliminate double counting when performing tasks like EM (Zhang and Gildea, 2004). The nature of null link handling in ITGs makes eliminating all redundancies difficult, but we can at least eliminate them in the absence of nulls.</Paragraph>
      <Paragraph position="1"> Normally, one would eliminate the redundant structures produced by the grammar in (1) by replacing it with the canonical form grammar (Wu, 1997), which has the following form:</Paragraph>
      <Paragraph position="3"/>
      <Paragraph position="5"> The canonical form grammar eliminates redundant structures by restricting recursion to specific inversion combinations.</Paragraph>
      <Paragraph position="6"> The canonical structure for a given alignment is fixed by this grammar, without awareness of the dependency tree. When the dependency tree invalidates spans that are used in canonical structures, the parser will miss the corresponding alignments.</Paragraph>
      <Paragraph position="7"> The canonical structure corresponding to the correct alignment in our running example is shown in Figure 3c. This structure requires the underlined invalid span, so the canonical grammar fails to produce the correct alignment. Our task requires a new canonical grammar that is aware of the dependency tree, and will choose among valid structures deterministically.</Paragraph>
      <Paragraph position="8"> Our ultimate goal is to fall back to ITG re-ordering when the dependency tree provides no guidance. We can implement this notion directly with a recursive ITG. Let a local tree be the tree formed by a head node and its immediate modifiers. We begin our recursive process by considering the local tree at the root of our dependency tree, and marking each phrasal modifier with a labeled placeholder. We then create a string by flattening the local tree. The top oval of Figure 4 shows the result of this operation on our running example. Because all phrases have been collapsed to placeholders, an ITG built over this string will naturally respect the dependency tree's phrasal boundaries. Since we do not need to invalidate any spans, we can parse this string using the canonical ITG in (2). The phrasal modifiers can in turn be processed by applying the same algorithm recursively to their root nodes, as shown in the lower oval of Figure 4. This algorithm will explore the exact same alignment space as the solution presented in Section 3.1, but because it uses a canonical ITG at every ordering decision point, it will produce exactly one structure for each alignment. Returning to our running example, the algorithm will produce the left structure of Figure 3b. This recursive approach can be implemented inside a traditional ITG framework using grammar templates. The templates take the form of whatever grammar will be used to permute the local trees. They are instantiated over each local tree before ITG parsing begins. Each instantiation has its non-terminals marked with its corresponding span, and its pre-terminal productions are customized to match the modifiers of the local tree.</Paragraph>
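The flattening step can be sketched as follows. This is a minimal illustration with invented helper names and placeholder syntax; token order stands in for the spans a real implementation would track.

```python
def flatten_local_tree(head, children, words):
    """Collapse each modifier of `head` into a single symbol: a bare word
    for leaf modifiers, or a labeled placeholder for phrasal modifiers.
    Returns the flattened token list in English word order."""
    symbols = []
    for tok in sorted(children[head] + [head]):
        if tok == head:
            symbols.append(words[tok])
        elif children[tok]:                  # phrasal modifier
            symbols.append(f"<phrase:{words[tok]}>")
        else:                                # single-word modifier
            symbols.append(words[tok])
    return symbols

# "His(0) house(1) in(2) Canada(3)"; house heads His and in; in heads Canada.
words = ["His", "house", "in", "Canada"]
children = {0: [], 1: [0, 2], 2: [3], 3: []}
print(flatten_local_tree(1, children, words))
# ['His', 'house', '<phrase:in>'] -- "in Canada" collapsed to a placeholder
print(flatten_local_tree(2, children, words))  # ['in', 'Canada']
```

An ITG parsed over the first flattened string cannot break the collapsed phrase, and recursing into the placeholder's root node processes its own local tree the same way.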
      <Paragraph position="9"> Phrasal modifiers point to another instantiation of the template. In our case, the template corresponds to the canonical form grammar in (2). The result of applying the templates to our running example is:</Paragraph>
      <Paragraph position="11"> Recursive ITGs and grammar templates provide a conceptual framework to easily transfer grammars for flat sentence pairs to situations with fixed phrasal structure. We have used the framework here to ensure only one structure is constructed for each possible alignment. We feel that this recursive view of the solution also makes it easier to visualize the space that the D-ITG is searching.</Paragraph>
      <Paragraph position="12"> It is trying all ITG orderings of each head and its modifiers.</Paragraph>
    </Section>
    <Section position="3" start_page="148" end_page="149" type="sub_section">
      <SectionTitle>
3.3 Head constrained ITG
</SectionTitle>
      <Paragraph position="0"> D-ITGs can construct ITG structures that do not completely agree with the provided dependency tree. If a head in the dependency tree has more than one modifier on one of its sides, then those modifiers may form a phrase in the ITG that should not exist according to the dependency tree.</Paragraph>
      <Paragraph position="1"> For example, the ITG structure shown in Figure 5 will be considered by our D-ITG as it searches alignment space. The resulting &amp;quot;here quickly&amp;quot; subtree disagrees with our provided dependency tree, which specifies that &amp;quot;ran&amp;quot; is modified by each word individually, and not by a phrasal concept that includes both. This is allowed by the parser because we have made the ITG aware of the dependency tree's phrasal structure, but it still has no notion of heads or modifiers. It is possible that by constraining our ITG according to this additional syntactic information, we can provide further guidance to our alignment search.</Paragraph>
      <Paragraph position="2"> The simplest way to eliminate these modifier combinations is to parse with the redundant bracketing grammar in (1), and to add another set of invalid spans to the set described in Section 3.1.</Paragraph>
      <Paragraph position="3"> These new invalidated chart entries eliminate all spans that include two or more modifiers without their head. With this solution, the structure in Figure 5 is no longer possible. Unfortunately, the grammar allows multiple structures for each alignment: to represent an alignment with no inversions, this grammar will produce all three structures shown in Figure 6.</Paragraph>
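The extra invalidation can be sketched for a single local tree: any span that covers two or more modifier phrases while excluding the head is struck from the chart. The helper name and span encoding are ours.

```python
def head_invalid_spans(head, modifier_spans, n):
    """Spans [i, j] over a local tree's n symbols that contain two or
    more modifier phrases but exclude the head symbol; these are the
    extra chart cells invalidated by the head constraint."""
    bad = []
    for i in range(n):
        for j in range(i, n):
            if i <= head <= j:
                continue  # span includes the head: always allowed
            covered = sum(1 for a, b in modifier_spans if i <= a and b <= j)
            if covered >= 2:
                bad.append((i, j))
    return bad

# "ran(0) here(1) quickly(2)": head "ran" at 0, two single-word modifiers.
print(head_invalid_spans(0, [(1, 1), (2, 2)], 3))  # [(1, 2)]
```

Striking the span over "here quickly" rules out exactly the Figure 5 structure, while any span containing the head remains available.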
      <Paragraph position="4"> If we can develop a grammar that will produce canonical head-aware structures for local trees, we can easily extend it to complete dependency trees using the concept of recursive ITGs. Such a grammar requires a notion of head, so we can ensure that every binary production involves the head or a phrase containing the head. A redundant, head-aware grammar is shown here:</Paragraph>
      <Paragraph position="6"> Note that two modifiers can never be combined without also including the A symbol, which always contains the head. This grammar still considers all the structures shown in Figure 6, but it requires no chart preprocessing.</Paragraph>
      <Paragraph position="7"> We can create a redundancy-free grammar by expanding (3). Inspired by Wu's canonical form grammar, we will restrict the productions so that certain structures are formed only when needed for specific inversion combinations. To specify the necessary inversion combinations, our ITG will need more expressive non-terminals. Split A into two non-terminals, L and R, to represent generators for left modifiers and right modifiers respectively. Then split L into -L and ^L, for generators that produce straight and inverted left modifiers.</Paragraph>
      <Paragraph position="8"> We now have a rich enough non-terminal set to design a grammar with a default behavior: it will generate all right modifiers deeper in the bracketing structure than all left modifiers. This rule is broken only to create a re-ordering that is not possible with the default structure, such as [&lt;MH&gt; M]. A grammar that accomplishes this goal is shown here:</Paragraph>
      <Paragraph position="10"> This grammar will generate one structure for each alignment. In the case of an alignment with no inversions, it will produce the tree shown in Figure 6c. The grammar can be expanded into a recursive ITG by following a process similar to the one explained in Section 3.2, using (4) as a template.</Paragraph>
      <Paragraph position="11"> 3.3.1 The head-constrained alignment space
Because we have limited the ITG's ability to combine modifiers of the same head, such modifiers can no longer occur at the same level of any ITG tree.</Paragraph>
      <Paragraph position="12"> In Figure 6, we see that in all three valid structures, &amp;quot;quickly&amp;quot; is attached higher in the tree than &amp;quot;here&amp;quot;. As a result of this, no combination of inversions can bring &amp;quot;quickly&amp;quot; between &amp;quot;here&amp;quot; and &amp;quot;ran&amp;quot;. In general, the alignment space searched by this ITG is constrained so that, among modifiers, relative distance from head is maintained.</Paragraph>
      <Paragraph position="13"> More formally, let Mi and Mo be modifiers of H such that Mi appears between Mo and H in the dependency tree. No alignment will ever place the image of Mo between the images of Mi and H.</Paragraph>
    </Section>
  </Section>
</Paper>