<?xml version="1.0" standalone="yes"?>
<Paper uid="W01-1404">
  <Title>Approximating Context-Free by Rational Transduction for Example-Based MT</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Hierarchical alignment
</SectionTitle>
      <Paragraph position="0"> The input to our algorithm is a corpus consisting of pairs of sentences related by a hierarchical alignment (Alshawi et al., 2000). In what follows, we slightly adapt the formalization of this concept from that reference to suit our purposes in the remainder of this article.</Paragraph>
      <Paragraph position="1"> The hierarchically aligned sentence pairs in the corpus are 5-tuples $(w_1, w_2, D_1, D_2, M)$ satisfying the following. The first two components, $w_1$ and $w_2$, are strings, called the source string and the target string, respectively, the lengths of which are denoted by $n_1 = |w_1|$ and $n_2 = |w_2|$. We let $P_1$ and $P_2$ denote the sets of string positions $\{1, \ldots, n_1\}$ and $\{1, \ldots, n_2\}$, respectively.</Paragraph>
      <Paragraph position="2"> Further, $D_1$ (resp. $D_2$) is a mapping from positions in $P_1 \cup \{0\}$ (resp. $P_2 \cup \{0\}$) to pairs of lists of positions from $P_1$ (resp. $P_2$), satisfying the following: if a position $i$ is mapped to a pair $(l_1, l_2)$, then the positions in the list $l_1 \cdot [i] \cdot l_2$ are in strictly increasing order; we let "$\cdot$" denote list concatenation, and $[i]$ represents a list consisting of a single element $i$.</Paragraph>
      <Paragraph position="6"> Each position in $P_1$ (resp. $P_2$) should occur at most once in the image of $D_1$ (resp. $D_2$). This means that $D_1$ and $D_2$ assign dependency structures to the source and target strings.</Paragraph>
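      <Paragraph> As an illustration of the two conditions above, the following minimal sketch checks one side of an alignment. The representation (a dict from each position, including the dummy position 0, to a pair of lists of dependent positions) and the function name is_dependency_structure are assumptions made for this example, not part of the original formalization.

```python
def is_dependency_structure(deps, n):
    """Check one side of an alignment over string positions 1..n.

    deps maps a position (0 is the dummy position) to a pair (left, right)
    of lists of positions; positions without dependents may be omitted.
    This representation is a hypothetical choice for illustration.
    """
    seen = set()
    for i in range(n + 1):
        left, right = deps.get(i, ([], []))
        combined = left + [i] + right
        # left + [i] + right must be in strictly increasing order.
        if any(a >= b for a, b in zip(combined, combined[1:])):
            return False
        for j in left + right:
            # Only real positions 1..n may be listed, each at most once overall.
            if j not in range(1, n + 1) or j in seen:
                return False
            seen.add(j)
    return True


# "I like him": "like" (2) depends on the dummy position 0,
# while "I" (1) and "him" (3) depend on "like".
print(is_dependency_structure({0: ([], [2]), 2: ([1], [3])}, 3))
```

The check fails as soon as a dependent list is out of order or a position is claimed by two different heads.</Paragraph>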
      <Paragraph position="7"> A further restriction on $D_1$ and $D_2$ requires some auxiliary definitions. Let $D$ be either $D_1$ or $D_2$. We define $\hat{D}$ as the function that maps each position $i$ to the list of positions $\hat{D}(j_1) \cdot$ [...] string $a_1 \cdots a_n$, and $l$ is a list $[j_1, \ldots, j_m]$ of string positions in $w$, then $w / l$ represents the string</Paragraph>
      <Paragraph position="9"> [...] represents the symbol $a_i$.</Paragraph>
      <Paragraph position="10"> We now say that $D$ is projective if $\hat{D}$ maps each position $i$ to some interval of positions $[k, k+1, \ldots]$</Paragraph>
      <Paragraph position="12"> [...] and $D_2$ are projective. (Strictly speaking, our algorithm would still be applicable if they were not projective, but it would treat the hierarchical alignment as if the symbols in the source and target strings had been reordered to make $D_1$ and $D_2$ projective.) Furthermore, a reasonable hierarchical alignment satisfies $\hat{D}(0) = [0, 1, \ldots, n]$, where $n = n_1$ or $n = n_2$ when $D = D_1$ or $D = D_2$, respectively, which means that all symbols in the string are indirectly linked to the 'dummy' position 0.</Paragraph>
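      <Paragraph> The projectivity condition can be sketched in code as follows. This is only an illustration under stated assumptions: the dict representation of a dependency mapping is hypothetical, the function names projection, is_projective and covers_string are ours, and the projection of a position is assumed to be the usual yield of a dependency tree (projections of the left dependents, then the position itself, then projections of the right dependents). The mapping is assumed to be acyclic.

```python
def projection(deps, i):
    """List of positions dominated by i, in left-to-right dependency order.

    Assumed definition: projections of the left dependents, then i itself,
    then projections of the right dependents. deps must be acyclic.
    """
    left, right = deps.get(i, ([], []))
    result = []
    for j in left:
        result.extend(projection(deps, j))
    result.append(i)
    for j in right:
        result.extend(projection(deps, j))
    return result


def is_projective(deps, n):
    """True if every position 0..n projects onto an interval of consecutive positions."""
    for i in range(n + 1):
        span = projection(deps, i)
        if span != list(range(span[0], span[0] + len(span))):
            return False
    return True


def covers_string(deps, n):
    """True if the dummy position 0 projects onto [0, 1, ..., n]."""
    return projection(deps, 0) == list(range(n + 1))


deps = {0: ([], [2]), 2: ([1], [3])}
print(projection(deps, 0), is_projective(deps, 3), covers_string(deps, 3))
```

A mapping in which some head dominates positions 2 and 4 but not 3 would be rejected by is_projective, since the projection of that head is not an interval.</Paragraph>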
      <Paragraph position="13"> Lastly, $M$ is the union of $\{(0, 0)\}$ and a subset of $P_1 \times P_2$ that relates positions in the two strings.</Paragraph>
      <Paragraph position="16"> It is such that $(i_1, j), (i_2, j) \in M$ imply $i_1 = i_2$ and $(i, j_1), (i, j_2) \in M$ imply $j_1 = j_2$; in other words, a position in one string is related to at most one position in the other. Furthermore, for each</Paragraph>
      <Paragraph position="18"> [...] such that $i$ occurs in one of the two lists of $D_1(i')$ and $j$ occurs in one of the two lists of $D_2(j')$; this means that positions can only be related if their respective "mother" positions are related.</Paragraph>
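      <Paragraph> The constraints on the relation between source and target positions can be checked along the following lines. The sketch rests on assumptions: the relation is represented as a set of (source, target) pairs, dependency mappings are dicts as in a hypothetical implementation, and the "mother" condition is read as requiring that, for every related pair other than (0, 0), the pair of their mother positions is itself related. The names mother and is_valid_relation are ours.

```python
def mother(deps, i):
    """Return the position whose dependent lists contain i, or None if there is none."""
    for head, (left, right) in deps.items():
        if i in left or i in right:
            return head
    return None


def is_valid_relation(rel, src_deps, tgt_deps):
    """Check a relation between source and target positions (assumed reading)."""
    rel = set(rel)
    if (0, 0) not in rel:
        return False
    sources = [i for i, _ in rel]
    targets = [j for _, j in rel]
    # A position in one string is related to at most one position in the other.
    if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
        return False
    for i, j in rel:
        if (i, j) == (0, 0):
            continue
        # Positions can only be related if their mother positions are related.
        if (mother(src_deps, i), mother(tgt_deps, j)) not in rel:
            return False
    return True


# Source "I like him", target "il me plaît" (positions counted from 1).
src_deps = {0: ([], [2]), 2: ([1], [3])}
tgt_deps = {0: ([], [3]), 3: ([1, 2], [])}
print(is_valid_relation({(0, 0), (2, 3), (1, 2), (3, 1)}, src_deps, tgt_deps))
```

Here "like" is related to "plaît", "I" to "me" and "him" to "il"; dropping the pair (2, 3) would make the remaining pairs invalid, because their mothers would no longer be related.</Paragraph>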
      <Paragraph position="19"> Note that this paper does not discuss how hierarchical alignments can be obtained from unannotated corpora of bitexts. This is the subject of existing studies, such as (Alshawi et al., 2000).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Context-free transduction
</SectionTitle>
      <Paragraph position="0"> Context-free transduction was originally called syntax-directed transduction in (Lewis II and Stearns, 1968), but since in modern formal language theory and computational linguistics the term "syntax" has a much wider range of meanings than just "context-free syntax", we will not use the original term here.</Paragraph>
      <Paragraph position="1"> A (context-free) transduction grammar is a 5-tuple $(N, \Sigma_1, \Sigma_2, R, S)$, where $N$ is a finite set of nonterminals, $S \in N$ is the start symbol, $\Sigma_1$ and $\Sigma_2$ are the source and target alphabets, and $R$ is a finite set of productions of the form $A \to (\alpha, \beta)$, where $A \in N$, $\alpha \in (N \cup \Sigma_1)^*$ and $\beta \in (N \cup \Sigma_2)^*$, such that each nonterminal in $\alpha$ occurs exactly once in $\beta$ and each nonterminal in $\beta$ occurs exactly once in $\alpha$ (see footnote 1). If we were to replace each RHS pair by only its first part $\alpha$, we would obtain a context-free grammar for the source language, and if we were to replace each RHS pair by its second part $\beta$, we would obtain a context-free grammar for the target language. Footnote 1: Note that we ignore the case that a single nonterminal occurs twice or more in $\alpha$ or $\beta$; if we were to include this case, some tedious complications of notation would result, without any theoretical gain such as an increase of generative power. We refer to (Lewis II and Stearns, 1968) for the general case.</Paragraph>
      <Paragraph position="2"> The combination of the two halves of such a RHS indicates how a parse for the source language can be related to a parse for the target language, and this defines a transduction between the languages in an obvious way.</Paragraph>
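      <Paragraph> The well-formedness condition on productions, and the two context-free grammars obtained by keeping only one half of each RHS, can be sketched as follows. The encoding of a production as a triple (left-hand side, source half, target half) and the function names are assumptions made for illustration only.

```python
from collections import Counter


def is_valid_production(lhs, source_rhs, target_rhs, nonterminals):
    """True if both halves contain the same nonterminals, each exactly once."""
    src = Counter(x for x in source_rhs if x in nonterminals)
    tgt = Counter(x for x in target_rhs if x in nonterminals)
    return (lhs in nonterminals
            and src == tgt
            and all(count == 1 for count in src.values()))


def source_grammar(productions):
    """Keep only the first half of each RHS pair."""
    return [(lhs, src) for lhs, src, _ in productions]


def target_grammar(productions):
    """Keep only the second half of each RHS pair."""
    return [(lhs, tgt) for lhs, _, tgt in productions]


nonterminals = {"S", "NP", "VP"}
print(is_valid_production("S", ["NP", "VP"], ["VP", "NP"], nonterminals))
```

The two halves may order the shared nonterminals differently, which is what allows the transduction to express reordering between source and target.</Paragraph>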
      <Paragraph position="3"> An example of a transduction grammar is:</Paragraph>
      <Paragraph position="5"> This transduction grammar specifies that the sentence "I like him" can be translated as "il me plaît".</Paragraph>
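      <Paragraph> To make the idea concrete, the following sketch derives that sentence pair from a small transduction grammar. The grammar shown is a hypothetical one written for this illustration (it is not claimed to be the grammar of the example above), and it assumes exactly one production per nonterminal so that the derivation is deterministic. The nonterminal names Subj and Obj and the function name derive are ours.

```python
# Hypothetical grammar: each nonterminal maps to one (source half, target half) pair.
GRAMMAR = {
    "S": (["Subj", "like", "Obj"], ["Obj", "Subj", "plaît"]),
    "Subj": (["I"], ["me"]),
    "Obj": (["him"], ["il"]),
}


def derive(grammar, symbol):
    """Expand symbol synchronously; return the (source words, target words) pair."""
    source_rhs, target_rhs = grammar[symbol]
    # Each nonterminal is expanded once and reused in both halves.
    expansions = {x: derive(grammar, x) for x in source_rhs if x in grammar}
    source, target = [], []
    for x in source_rhs:
        source.extend(expansions[x][0] if x in expansions else [x])
    for x in target_rhs:
        target.extend(expansions[x][1] if x in expansions else [x])
    return source, target


source, target = derive(GRAMMAR, "S")
print(" ".join(source), "/", " ".join(target))
```

Because the nonterminals appear in a different order in the two halves of the S production, the object of the source sentence surfaces first in the target sentence.</Paragraph>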
      <Paragraph position="6"> We can reduce the generative power of context-free transduction grammars by a syntactic restriction that corresponds to the bilexical context-free grammars (Eisner and Satta, 1999). Let us define a bilexical transduction grammar as a transduction grammar which is such that: (1) there is a mapping from the set of nonterminals to $\Sigma_1 \times \Sigma_2$. Due to this property, we may write each nonterminal as $A[a, b]$ to indicate that it is mapped to the pair $(a, b)$, where $a \in \Sigma_1$ and $b \in \Sigma_2$, and where $A$ is a so-called delexicalized nonterminal. We may write $S$ as $A[\$, \$]$, where $\$$ is a dummy symbol at the dummy string position 0. Further, (2) each production is of one of the following five forms: [...] In the experiments in Section 6, we also consider nonterminals that are lexicalized only by the source alphabet, which means that these nonterminals can be written as $A[a]$, where $a \in \Sigma_1$. The motivation is to restrict the grammar size and to increase the coverage.</Paragraph>
      <Paragraph position="7"> Bilexical transduction grammars are equivalent to the dependency transduction model from (Alshawi et al., 2000).</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Obtaining a context-free transduction from the corpus
</SectionTitle>
      <Paragraph position="0"> We extract a context-free transduction grammar from a corpus of hierarchical alignments, by locally translating each hierarchical alignment into a set of productions. The union of all these sets for the whole corpus is then the transduction grammar. Counting the number of times that identical productions are generated allows us to assign probabilities to the productions by maximum likelihood estimation.</Paragraph>
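      <Paragraph> The counting step can be sketched as follows. This is a minimal illustration, assuming that productions are encoded as hashable triples (left-hand side, source half, target half) and that probabilities are normalized over all productions sharing a left-hand side, as is usual for maximum likelihood estimation of context-free productions; the text above does not spell out the normalization, and the function name estimate_probabilities is ours.

```python
from collections import Counter


def estimate_probabilities(production_lists):
    """Relative-frequency estimates for productions.

    production_lists holds, for each hierarchical alignment, the productions
    generated from it. Each production is a hashable triple
    (lhs, source_rhs, target_rhs); this encoding is an assumption.
    """
    counts = Counter(p for productions in production_lists for p in productions)
    lhs_totals = Counter()
    for production, count in counts.items():
        lhs_totals[production[0]] += count
    # Normalize per left-hand side (assumed normalization).
    return {p: count / lhs_totals[p[0]] for p, count in counts.items()}


corpus = [
    [("A", ("a",), ("x",)), ("B", ("b",), ("y",))],
    [("A", ("a",), ("x",)), ("A", ("c",), ("z",))],
]
for production, probability in sorted(estimate_probabilities(corpus).items()):
    print(production, round(probability, 3))
```

The keys of the returned dict form the union of the production sets, that is, the extracted grammar, and the values are the estimated probabilities.</Paragraph>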
      <Paragraph position="1"> We will consider a method that uses only one delexicalized nonterminal $A$. For a pair $(i, i') \in M$, we have a nonterminal</Paragraph>
      <Paragraph position="3"> [...], depending on whether nonterminals are lexicalized by both source and target alphabets, or by just the source alphabet. Let us call that nonterminal $\mathit{nt}(i, i')$. Each pair of positions $(i, i') \in M$ gives rise to one production. Suppose that</Paragraph>
      <Paragraph position="5"> [...] and each position in this pair is related by $M$ to some position from $P_2$, which we will call</Paragraph>
      <Paragraph position="7"> [...] and each position in this pair is related by $M$ to some position from $P_1$, which we will call [...] Note that both halves of the RHS contain the same nonterminals but possibly in a different order. However, if any position in $D_1(i)$ or $D_2(i')$ is not related to some other position by $M$, then the production above contains, instead of a nonterminal, a substring on which that position is projected by $\hat{D}_1$ or $\hat{D}_2$, respectively. E.g. if there is no position $j'_1$ such that $(j_1, j'_1) \in M$, then instead of</Paragraph>
      <Paragraph position="9"> In general, we cannot adapt the above algorithm to produce transduction grammars that are bilexical. For example, a production of the form: [...] cannot be broken up into smaller, bilexical productions. However, the hierarchical alignments that we work with were produced by an algorithm that ensures that bilexical grammars suffice. Formally, this applies when the following cannot occur: there are $i, i_1, i_2 \in P_1$ and $j, j_1, j_2 \in P_2$ such that</Paragraph>
      <Paragraph position="11"> [...] and $j &lt; j_2 &lt; j_1$, or $i_1 &lt; i_2 &lt; i$ and $j &lt; j_1 &lt; j_2$, or $i &lt; i_1 &lt; i_2$ and $j_1 &lt; j_2 &lt; j$.</Paragraph>
      <Paragraph position="12"> For example, if the non-bilexical production we would obtain is:</Paragraph>
      <Paragraph position="14"/>
    </Section>
  </Section>
</Paper>