<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1031"> <Title>Relabeling Syntax Trees to Improve Syntax-Based Machine Translation Quality</Title> <Section position="3" start_page="240" end_page="242" type="metho"> <SectionTitle> 2 Experimental Framework </SectionTitle> <Paragraph position="0"> Our training data consists of 164M+167M words of parallel Chinese/English text. The English half was parsed with a reimplementation of Collins' Model 2 (Collins, 1999) and the two halves were word-aligned using GIZA++ (Och and Ney, 2000). These three components Chinese strings, English parse trees, and their word alignments were inputs to our experimental procedure, which involved ve steps: (1) tree relabeling, (2) rule extraction, (3) decoding, (4) n-best reranking, (5) evaluation.</Paragraph> <Paragraph position="1"> This paper focuses on step 1, in which the original English parse trees are transformed by one or more relabeling strategies. Step 2 involves extracting minimal xRS rules (Galley et al., 2004) from the set of string/tree/alignments triplets. These rules are then used in a CKY-type parser-decoder to translate the 878-sentence 2002 NIST MT evaluation test set (step 3). In step 4, the output 2,500-sentence n-best list is reranked using an n-gram language model trained on 800M words of English news text. In the nal step, we score our translations with 4-gram BLEU (Papineni et al., 2002).</Paragraph> <Paragraph position="2"> Separately for each relabeling method, we ran these ve steps and compared the resulting BLEU score with that of a baseline system with no relabeling. To determine if a BLEU score increase or decrease is meaningful, we calculate statistical signi cance at 95% using paired bootstrap resampling (Koehn, 2004; Zhang et al., 2004) on 1,000 samples.</Paragraph> <Paragraph position="3"> Figure 3 shows the results from each relabeling experiment. The second column indicates the change in the number of unique rules from the base-line number of 16.7M rules. The third column gives the BLEU score along with an indication whether it is a statistically signi cant increase (a76), a statistically signi cant decrease (a77), or neither (?) over the baseline BLEU score.</Paragraph> <Paragraph position="4"> the impact on ruleset size and BLEU score over the baseline.</Paragraph> </Section> <Section position="4" start_page="242" end_page="245" type="metho"> <SectionTitle> 3 Relabeling </SectionTitle> <Paragraph position="0"> The small tagset of the PTB has the advantage of being simple to annotate and to parse. On the other hand, this can lead to tags that are overly generic.</Paragraph> <Paragraph position="1"> Klein and Manning (2003) discuss this as a problem in parsing and demonstrate that annotating additional information onto the PTB tags leads to improved parsing performance. We similarly propose methods of relabeling PTB trees that notably improve MT quality. In the next two subsections, we explore relabeling strategies that fall under two categories introduced by Klein and Manning internal annotation and external annotation.</Paragraph> <Section position="1" start_page="242" end_page="243" type="sub_section"> <SectionTitle> 3.1 Internal Annotation </SectionTitle> <Paragraph position="0"> Internal annotation reveals information about a node and its descendants to its surrounding nodes (ancestors, sisters, and other relatives) that is otherwise hidden. 
<Section position="4" start_page="242" end_page="245" type="metho"> <SectionTitle> 3 Relabeling </SectionTitle> <Paragraph position="0"> The small tagset of the PTB has the advantage of being simple to annotate and to parse. On the other hand, it can lead to tags that are overly generic.</Paragraph> <Paragraph position="1"> Klein and Manning (2003) discuss this as a problem in parsing and demonstrate that annotating additional information onto the PTB tags leads to improved parsing performance. We similarly propose methods of relabeling PTB trees that notably improve MT quality. In the next two subsections, we explore relabeling strategies that fall under two categories introduced by Klein and Manning: internal annotation and external annotation.</Paragraph> <Section position="1" start_page="242" end_page="243" type="sub_section"> <SectionTitle> 3.1 Internal Annotation </SectionTitle> <Paragraph position="0"> Internal annotation reveals information about a node and its descendants to its surrounding nodes (ancestors, sisters, and other relatives) that is otherwise hidden. This is paramount in MT because the contents of a node must be understood before the node can be reliably translated and positioned in a sentence. Here we discuss two such strategies: lexicalization and tag annotation. Many state-of-the-art statistical parsers incorporate lexicalization to effectively capture word-specific behavior, which has proved helpful in our system as well. We generalize lexicalization to allow a lexical item (a terminal word) to be annotated onto any ancestor label, not only its parent.</Paragraph> <Paragraph position="1"> Let us revisit the determiner/noun number disagreement problem in Figure 2 (*this Turkish positions). If we lexicalize all DTs in the parse trees, the problematic DT is relabeled more specifically as DT_this, as seen in rule 2′(c) in Figure 4. This also produces rules like 4′(c), where both the determiner and the noun are plural (notice the DT_these), and 4″(c), where both are singular. With such a ruleset, 2′(c) could only combine with 4″(c), not 4′(c), enforcing the grammatical output this Turkish position.</Paragraph> <Paragraph position="2"> We explored five lexicalization strategies, each targeting a different grammatical category. A common translation mistake was the improper choice of prepositions, e.g., responsibility to attacks. Lexicalizing prepositions proved to be the most effective lexicalization method (LEX_PREP). We annotated a preposition onto both its parent (IN or TO) and its grandparent (PP), since the generic PP tag was often at fault. We tried lexicalizing all prepositions (variant 1), the 15 most common prepositions (variant 2), and the 5 most common (variant 3). All gave statistically significant BLEU improvements, especially variant 2.</Paragraph> <Paragraph position="3"> The second strategy was DT lexicalization (LEX_DT), which we encountered previously in Figure 4. This addresses two features of Chinese that are problematic in translation to English: the infrequent use of articles and the lack of overt number indicators on nouns. We lexicalized these determiners: the, a, an, this, that, these, and those, and grouped together those with similar grammatical distributions (a/an, this/that, and these/those). Variant 1 included all the determiners mentioned above, and variant 2 was restricted to the and a/an to focus only on articles. The second slightly improved on the first.</Paragraph> <Paragraph position="4"> The third type was auxiliary lexicalization (LEX_AUX), in which all forms of the verb be are annotated with _be, and similarly with do and have. The PTB purposely eliminated such distinctions; here we seek to recover them, since auxiliaries and verbs function very differently and thus cannot be treated identically. Klein and Manning (2003) make a similar proposal but omit do.</Paragraph> <Paragraph position="5"> Variants 1, 2, and 3 lexicalize have, be, and do, respectively. The third variant slightly outperformed the other variants, including variant 4, which combines all three.</Paragraph>
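To make the relabeling operation itself concrete, the following is a minimal sketch of DT lexicalization (LEX_DT, variant 1) over a toy (label, children) tree encoding. The encoding, the function name, and the grouping table are illustrative assumptions, not the paper's implementation.

```python
# Toy trees: a node is a (label, children) pair; a leaf child is just the word string.
# Determiners with similar grammatical distributions share an annotation,
# following the grouping described above (a/an, this/that, these/those).
DT_CLASSES = {"the": "the", "a": "a", "an": "a", "this": "this", "that": "this",
              "these": "these", "those": "these"}

def lex_dt(label, children):
    """Relabel a DT preterminal as DT_<class> when it dominates a targeted determiner."""
    if label == "DT" and len(children) == 1 and isinstance(children[0], str):
        word = children[0].lower()
        if word in DT_CLASSES:
            label = "DT_" + DT_CLASSES[word]
    return label, [c if isinstance(c, str) else lex_dt(*c) for c in children]

# Example: the NP for "this Turkish position".
np = ("NP", [("DT", ["this"]), ("JJ", ["Turkish"]), ("NN", ["position"])])
print(lex_dt(*np))
# ('DT', ['this']) becomes ('DT_this', ['this']); rules extracted over the relabeled
# tree then distinguish this/that from these/those, which is what blocks
# *this Turkish positions in the example above.
```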
<Paragraph position="6"> The last two methods are drawn directly from Klein and Manning (2003). In CC lexicalization (LEX_CC), both but and & are lexicalized, since these two conjunctions are distributed very differently from other conjunctions. Though helpful in parsing, this proved detrimental in our system. In % lexicalization (LEX_%), the percent sign (%) is given its own PCT tag rather than its typical NN tag, which gave a statistically significant BLEU increase.</Paragraph> <Paragraph position="7"> In addition to propagating up a terminal word, we can also propagate up a nonterminal, which we call tag annotation. This partitions a grammatical category into more specific subcategories, though not as fine-grained as lexicalization. For example, a VP headed by a VBG can be tag-annotated as VP_VBG to represent a progressive verb phrase.</Paragraph> <Paragraph position="8"> Let us once again return to Figure 2 to address the auxiliary/verb tense disagreement error (*has demonstrate). The auxiliary has expects a VP-C, permitting the bare verb phrase demonstrate to be incorrectly used. However, if we tag-annotate all VP-Cs, rule 6(c) would be relabeled as VP-C_VB in rule 6′(c), and rule 7(c) as 7′(c), in Figure 5. Rule 6′(c) can no longer join with 7′(c), while the variant rule 6″(c) can, which produces the grammatical result has demonstrated.</Paragraph> <Paragraph position="9"> We noticed many wrong verb tense choices, e.g., gerunds and participles used as main sentence verbs.</Paragraph> <Paragraph position="10"> We resolved this by tag-annotating every VP and VP-C with its head verb (TAG_VP). Note that we group VBZ and VBP together since they have very similar grammatical distributions and differ only by number.</Paragraph> <Paragraph position="11"> This strategy gave a healthy BLEU improvement.</Paragraph> </Section>
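In the same toy tree encoding used earlier, VP tag annotation (TAG_VP) might be sketched as follows. The head-finding heuristic (take the first verbal preterminal child) is an assumption made for illustration; the paper does not spell out its head-finding procedure.

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def head_verb_tag(children):
    """Naive head finder: the first verbal preterminal among the children (an assumption)."""
    for child in children:
        if not isinstance(child, str) and child[0] in VERB_TAGS:
            # Collapse VBP into VBZ, since the paper groups these two tags.
            return "VBZ" if child[0] == "VBP" else child[0]
    return None

def tag_vp(label, children):
    """Annotate VP and VP-C nodes with the tag of their head verb (TAG_VP)."""
    if label in ("VP", "VP-C"):
        head = head_verb_tag(children)
        if head is not None:
            label = label + "_" + head
    return label, [c if isinstance(c, str) else tag_vp(*c) for c in children]

# Example: "has demonstrated".
vp = ("VP", [("VBZ", ["has"]), ("VP-C", [("VBN", ["demonstrated"])])])
print(tag_vp(*vp))
# The inner VP-C becomes VP-C_VBN, while a bare-verb phrase would become VP-C_VB,
# so a rule for "has" built over relabeled trees no longer combines with the bare form.
```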
<Section position="2" start_page="243" end_page="245" type="sub_section"> <SectionTitle> 3.2 External Annotation </SectionTitle> <Paragraph position="0"> In addition to passing information from inside a node to the outside, we can pass information from the external environment into the node through external annotation. This allows us to make translation decisions based on the context in which a word or phrase is found. In this subsection, we look at three such methods: sisterhood annotation, parent annotation, and complement annotation.</Paragraph> <Paragraph position="1"> The single most effective relabeling scheme we tried was sisterhood annotation. We annotate each nonterminal with #L if it has any sisters to the left, #R if any to the right, #LR if it has sisters on both sides, and nothing if it has no sisters. This distinguishes between words that tend to fall on the left or right border of a constituent (often head words, like NN#L in an NP or IN#R in a PP), in the middle of a constituent (often modifiers, like JJ#LR in an NP), or by themselves (often particles and pronouns, like RP and PRP). In our outputs, we frequently find words used in positions where they should be disallowed or disfavored. Figure 6 presents a derivation that leads to the ungrammatical output *deeply love she. The subject pronoun she is incorrectly preferred over the object form her because the most popular NP-C translation of the Chinese pronoun is she. We can sidestep this mistake through sisterhood annotation, which yields the relabeled rules 3′(c) and 4′(c) in Figure 7. Rule 4′(c) expects an NP-C on the right border of the constituent (NP-C#L). Since she never occurs in this position in the PTB, it should never be sisterhood-annotated as an NP-C#L. It does occur with sisters to the right, which gives the NP-C#R rule 3′(c). The object NP-C her, on the other hand, is frequently rightmost in a constituent, which is reflected in the NP-C#L rule 3″(c). Using this rule with rule 4′(c) gives the desired result deeply love her.</Paragraph> <Paragraph position="2"> We experimented with four sisterhood annotation (SISTERHOOD) variants of decreasing complexity.</Paragraph> <Paragraph position="3"> The first was described above, which includes rightmost (#L), leftmost (#R), middle (#LR), and alone (no annotation). Variant 2 omitted #LR, variant 3 kept only #LR, and variant 4 only annotated nodes without sisters. Variants 1 and 2 produced the largest gains from relabeling: 1.27 and 0.85 BLEU points, respectively.</Paragraph> <Paragraph position="4"> Another common relabeling method in parsing is parent annotation (Johnson, 1998), in which a node is annotated with its parent's label. Typically, this is done only to nonterminals, but Klein and Manning (2003) found that annotating preterminals as well was highly effective. It seemed likely that such contextual information could also benefit MT.</Paragraph> <Paragraph position="5"> Let us tackle the bad output from Figure 6 with parent annotation. In Figure 8, rule 4(c) is relabeled as rule 4′(c) and expects an NP-C^VP, i.e., an NP-C with a VP parent. In the PTB, we observe that the NP-C she never has a VP parent, while her does. In fact, the most popular parent for the NP-C her is VP, while the most popular parent for she is S. Rule 3(c) is relabeled as the NP-C^S rule 3′(c), and her is expressed as the NP-C^VP rule 3″(c). Only rule 3″(c) can partner with rule 4′(c), which produces the correct output deeply love her.</Paragraph> <Paragraph position="6"> We tested three variants of parent annotation (PARENT): (1) all nonterminals are parent-annotated, (2) only S nodes are parent-annotated, and (3) all nonterminals are parent- and grandparent-annotated (i.e., also annotated with the label of the node's parent's parent). The first and third variants yielded the largest ruleset sizes of all relabeling methods. The second variant was restricted to S in order to capture the difference between top-level clauses (S^TOP) and embedded clauses (like S^S-C). Unfortunately, all three variants turned out to be harmful in terms of BLEU.</Paragraph>
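The two external annotations discussed above can be sketched in the same toy encoding: sisterhood annotation (variant 1, using #L/#R/#LR) and parent annotation. Both functions are illustrative assumptions rather than the system's code.

```python
def sisterhood(label, children, left=False, right=False):
    """SISTERHOOD variant 1: mark whether a node has sisters to its left and/or right."""
    if left and right:
        label += "#LR"
    elif left:
        label += "#L"
    elif right:
        label += "#R"
    new_children = []
    for i, child in enumerate(children):
        if isinstance(child, str):
            new_children.append(child)
        else:
            new_children.append(sisterhood(child[0], child[1],
                                           left=(i > 0),
                                           right=(i < len(children) - 1)))
    return label, new_children

def parent_annotate(label, children, parent=None):
    """PARENT variant 1: annotate every node except the root with its parent's label."""
    new_label = label if parent is None else label + "^" + parent
    return new_label, [c if isinstance(c, str)
                       else parent_annotate(c[0], c[1], parent=label)
                       for c in children]

# Example: "deeply love her" as a VP (a toy bracketing for illustration only).
vp = ("VP", [("ADVP", [("RB", ["deeply"])]), ("VBP", ["love"]), ("NP-C", [("PRP", ["her"])])])
print(sisterhood(*vp))       # the NP-C is rightmost, so it becomes NP-C#L
print(parent_annotate(*vp))  # the NP-C has a VP parent, so it becomes NP-C^VP
```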
<Paragraph position="7"> In addition to a node's parent, we can also annotate a node's complement. This captures the fact that words prefer certain complements over others. For instance, in 96% of the cases in the PTB where the IN of takes exactly one complement, that complement is an NP-C. The IN although, on the other hand, never takes an NP-C but takes an S-C 99% of the time.</Paragraph> <Paragraph position="8"> Consider the derivation in Figure 9 that results in the bad output *postponed out May 6. The IN out is incorrectly allowed despite the fact that it almost never takes an NP-C complement (0.6% of cases in the PTB). A way to restrict this is to annotate the IN's complement. Complement-annotated versions of rules 2(c) and 3(c) are given in Figure 10. Rule 2(c) is relabeled as the IN/PP-C rule 2′(c), since PP-C is the most common complement for out (99% of the time). Since rule 3″(c) expects an IN/NP-C, rule 2′(c) is disqualified. The preposition from (rule 2″(c)), on the other hand, frequently takes NP-C as its complement (82% of the time). Combining rule 2″(c) with rule 3′(c) ensures the correct output postponed from May 6.</Paragraph> <Paragraph position="9"> Complement-annotating every IN tag with its complement, when it has one and only one complement (COMP_IN), gave a significant BLEU improvement with only a modest increase in ruleset size.</Paragraph> </Section> <Section position="3" start_page="245" end_page="245" type="sub_section"> <SectionTitle> 3.3 Removal of Parser Annotations </SectionTitle> <Paragraph position="0"> Many parsers, though trained on the PTB, do not preserve the original tagset. They may omit function tags (like -TMP), indices, and null/gap elements, or add annotations to increase parsing accuracy and provide useful grammatical information. It is not obvious whether these modifications are helpful for MT, so we explore the effects of removing them.</Paragraph> <Paragraph position="1"> The statistical parser we used makes three relabelings: (1) base NPs are relabeled as NPB, (2) argument nonterminals are suffixed with -C, and (3) subjectless sentences are relabeled from S to SG. We tried removing each annotation individually (REM_NPB, REM_-C, and REM_SG), but doing so significantly dropped the BLEU score. This leads us to conclude that these parser additions are helpful in MT.</Paragraph> </Section> </Section> </Paper>