<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-3004">
  <Title>Efficient Algorithms for Richer Formalisms: Parsing and Machine Translation</Title>
  <Section position="2" start_page="0" end_page="223" type="metho">
    <SectionTitle>
1 k-best Parsing and Hypergraphs
</SectionTitle>
    <Paragraph position="0"> NLP systems are often cascades of several modules, e.g., part-of-speech tagging, then syntactic parsing, and finally semantic interpretation. The 1-best output of one module is often not optimal for the next module. So one might want to postpone some disambiguation by propagating k-best lists (instead of 1-best solutions) to subsequent phases, as in joint parsing and semantic role labeling (Gildea and Jurafsky, 2002). The same holds for reranking and discriminative training, where the k-best list of candidates serves as an approximation of the full set (Collins, 2000; Och, 2003; McDonald et al., 2005). In this way we can optimize a complicated objective function on the k-best set rather than on the full search space, which is usually exponentially large.</Paragraph>
    <Paragraph position="1"> Previous algorithms for k-best parsing (Collins, 2000; Charniak and Johnson, 2005) are either sub-optimal or slow, and they rely significantly on pruning techniques to remain tractable. So I co-developed several fast and exact algorithms for k-best parsing in the general framework of directed monotonic hypergraphs (Huang and Chiang, 2005).</Paragraph>
    <Paragraph position="2"> This formulation extends and refines Klein and Manning's work (2001) by introducing monotonic weight functions, which are closely related to the optimal subproblem property in dynamic programming.</Paragraph>
    <Paragraph position="3"> We first generalize the classical 1-best Viterbi algorithm to hypergraphs, and then present four k-best algorithms, each improving on its predecessor by delaying more work until it is necessary. The final one, Algorithm 3, starts with a normal 1-best search for each vertex (or item, as in deductive frameworks), and then works backwards from the target vertex (final item) for its 2nd, 3rd, . . ., kth best derivations, calling itself recursively only on demand; it is the laziest of the four algorithms. When tested on top of two state-of-the-art systems, the Collins/Bikel parser (Bikel, 2004) and Chiang's CKY-based Hiero decoder (Chiang, 2005), this algorithm is shown to have very little overhead even for quite large k (say, 10^6) (see Fig. 1 for experiments on the Bikel parser).</Paragraph>
    <Paragraph position="4"> These algorithms have been re-implemented by other researchers in the field, including Eugene Charniak for his n-best parser, Ryan McDonald for his dependency parser (McDonald et al., 2005), the Microsoft Research NLP group (Simon Corston-Oliver and Kevin Duh, p.c.) for a similar model, Jonathan Graehl for the ISI syntax-based MT decoder, David A. Smith for the Dyna language (Eisner et al., 2005), and Jonathan May for ISI's tree automata package Tiburon. All of these experiments confirmed the findings in our work.</Paragraph>
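    <Paragraph> As a rough illustration of the lazy strategy behind Algorithm 3, the following Python sketch enumerates the derivations of a small acyclic hypergraph only on demand. It is a minimal sketch under stated assumptions, not the implementation evaluated in this section: weights are additive costs (lower is better, which is one monotonic choice), and the class name LazyKBest, the EDGES table, and the vertex names are all hypothetical.
<![CDATA[
```python
import heapq

# Hypothetical toy hypergraph: vertex -> list of (edge_cost, tail_vertices).
# Leaf vertices have hyperedges with no tails. The graph must be acyclic.
EDGES = {
    "A": [(1.0, ()), (4.0, ())],
    "B": [(2.0, ()), (3.0, ())],
    "goal": [(0.0, ("A", "B")), (2.5, ("A",))],
}


class LazyKBest:
    def __init__(self, edges):
        self.edges = edges
        self.best = {}  # vertex -> sorted list of (cost, edge_index, ranks)
        self.cand = {}  # vertex -> heap of candidate derivations
        self.seen = {}  # vertex -> set of (edge_index, ranks) already queued

    def _cost(self, v, ei, ranks):
        # Additive weight: edge cost plus the cost of each chosen sub-derivation.
        c, tails = self.edges[v][ei]
        return c + sum(self.best[t][r][0] for t, r in zip(tails, ranks))

    def _push(self, v, ei, ranks):
        if (ei, ranks) not in self.seen[v]:
            self.seen[v].add((ei, ranks))
            heapq.heappush(self.cand[v], (self._cost(v, ei, ranks), ei, ranks))

    def _expand(self, v, ei, ranks):
        # Queue the neighbours of a derivation: bump one tail's rank by one,
        # asking that tail for its next-best derivation only when needed.
        tails = self.edges[v][ei][1]
        for i, t in enumerate(tails):
            nxt = ranks[:i] + (ranks[i] + 1,) + ranks[i + 1:]
            if len(self.kth(t, nxt[i] + 1)) > nxt[i]:
                self._push(v, ei, nxt)

    def kth(self, v, k):
        """Enumerate up to k derivations of v; return the sorted list so far."""
        if v not in self.cand:
            self.best[v], self.cand[v], self.seen[v] = [], [], set()
            # Seed with the 1-best derivation along each incoming hyperedge.
            for ei, (_, tails) in enumerate(self.edges[v]):
                if all(self.kth(t, 1) for t in tails):
                    self._push(v, ei, (0,) * len(tails))
        while len(self.best[v]) < k:
            if self.best[v]:
                _, ei, ranks = self.best[v][-1]
                self._expand(v, ei, ranks)
            if not self.cand[v]:
                break  # v has fewer than k derivations
            self.best[v].append(heapq.heappop(self.cand[v]))
        return self.best[v]
```
]]>
    Calling LazyKBest(EDGES).kth("goal", 4) enumerates only as many derivations of A and B as the four requested derivations of the goal vertex require; each entry in the returned list records the cost, the hyperedge index, and the ranks of the sub-derivations used.</Paragraph>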
  </Section>
  <Section position="3" start_page="223" end_page="223" type="metho">
    <SectionTitle>
2 Synchronous Binarization for MT
</SectionTitle>
    <Paragraph position="0"> Machine translation has made very good progress in recent times, especially in the so-called &quot;phrase-based&quot; statistical systems (Och and Ney, 2004). To take a substantial next step, it will be necessary to incorporate several aspects of syntax. Many researchers have explored syntax-based methods; for instance, Wu (1996) and Chiang (2005) both use binary-branching synchronous context-free grammars (SCFGs). However, to be more expressive and flexible, it is often easier to start with a general SCFG or tree-transducer (Galley et al., 2004).</Paragraph>
    <Paragraph position="1"> In this case, binarization of the input grammar is required for the use of the CKY algorithm (in order to get cubic-time complexity), just as we convert a CFG into the Chomsky Normal Form (CNF) for monolingual parsing. For synchronous grammars, however, different binarization schemes may result in very different-looking chart items that greatly affect decoding efficiency. For example, consider the following SCFG rule:</Paragraph>
    <Paragraph position="3"> The intermediate symbols (e.g. VPP-VP) are called virtual nonterminals. We would certainly prefer the right-to-left binarization because its virtual nonterminal covers a consecutive span (see Fig. 2). The left-to-right binarization causes discontinuities on the target side, which result in exponential time complexity when decoding with an integrated n-gram model.</Paragraph>
    <Paragraph position="4"> We develop this intuition into a technique called synchronous binarization (Zhang et al., 2006), which binarizes a synchronous production or tree-transduction rule on both the source and target sides simultaneously. It essentially converts an SCFG into an equivalent ITG (the synchronous extension of CNF) if possible. We reduce this problem to the binarization of the permutation of nonterminal symbols between the source and target sides of a synchronous rule and devise a linear-time algorithm for it.</Paragraph>
    <Paragraph position="5"> Experiments show that the resulting rule set significantly improves both speed and accuracy over monolingual binarization (see Table 1) in a state-of-the-art syntax-based machine translation system (Galley et al., 2004). We also propose another trick (hook) for further speeding up decoding with integrated n-gram models (Huang et al., 2005).</Paragraph>
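    <Paragraph> The reduction to permutation binarization can be illustrated with a shift-reduce pass over the permutation. The Python sketch below is a minimal illustration in the spirit of the linear-time algorithm mentioned above, not the published implementation; the function name binarize_permutation, the tuple encoding of the bracketing, and the example permutations are assumptions made here for illustration.
<![CDATA[
```python
def binarize_permutation(perm):
    """Return a binary bracketing of perm, or None if none is found.

    perm lists the 1-based target-side position of each source-side
    nonterminal, in source order, e.g. [1, 3, 2].
    """
    stack = []  # entries (lo, hi, tree): tree covers target positions lo..hi
    for p in perm:
        stack.append((p, p, p))  # shift the next nonterminal
        while len(stack) >= 2:
            lo2, hi2, t2 = stack[-1]
            lo1, hi1, t1 = stack[-2]
            if hi1 + 1 == lo2:      # spans adjacent in the same order
                merged = (lo1, hi2, ("straight", t1, t2))
            elif hi2 + 1 == lo1:    # spans adjacent in swapped order
                merged = (lo2, hi1, ("inverted", t1, t2))
            else:
                break               # top two spans are not contiguous
            stack[-2:] = [merged]   # reduce the top two entries into one
    return stack[0][2] if len(stack) == 1 else None
```
]]>
    Each element is shifted once and every reduction shrinks the stack by one entry, so the pass performs a linear number of stack operations. For the permutation [1, 3, 2] the sketch returns ("straight", 1, ("inverted", 3, 2)), while for [2, 4, 1, 3] no two adjacent stack entries ever cover contiguous target positions and the result is None.</Paragraph>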
  </Section>
  <Section position="4" start_page="223" end_page="225" type="metho">
    <SectionTitle>
3 Syntax-Directed Translation
</SectionTitle>
    <Paragraph position="0"> Syntax-directed translation was originally proposed for compiling programming languages (Irons, 1961; Lewis and Stearns, 1968), where the source program is parsed into a syntax-tree that guides the generation of the object code. These translations have been formalized as a synchronous context-free grammar (SCFG) that generates two languages simultaneously (Aho and Ullman, 1972), and equivalently, as a top-down tree-to-string transducer (Gécseg and Steinby, 1984). We adapt this syntax-directed transduction process to statistical MT by applying stochastic operations at each node of the source-language parse-tree and searching for the best derivation (a sequence of translation steps) that converts the whole tree into some target-language string (Huang et al., 2006).</Paragraph>
    <Section position="1" start_page="223" end_page="224" type="sub_section">
      <SectionTitle>
3.1 Extended Domain of Locality
</SectionTitle>
      <Paragraph position="0"> From a modeling perspective, however, the structural divergence across languages results in non-isomorphic parse-trees that are not captured by SCFGs. For example, the S(VO) structure in English is translated into a VSO order in Arabic, an instance of complex re-ordering (Fig. 4).</Paragraph>
      <Paragraph position="1"> To alleviate this problem, grammars with richer expressive power have been proposed that can grab larger fragments of the tree. Following Galley et al. (2004), we use an extended tree-to-string transducer (xRs) with multi-level left-hand-side (LHS) trees (see footnote 1). Since the right-hand-side (RHS) string can be viewed as a flat one-level tree with the same nonterminal root as the LHS (Fig. 4), this framework is closely related to STSGs in having an extended domain of locality on the source side, except that it remains a CFG on the target side. These rules can be learned from a parallel corpus using English parse-trees, Chinese strings, and word alignments (Galley et al., 2004).</Paragraph>
    </Section>
    <Section position="2" start_page="224" end_page="225" type="sub_section">
      <SectionTitle>
3.2 A Running Example
</SectionTitle>
      <Paragraph position="0"> Consider the following English sentence and its Chinese translation (note the reordering in the passive construction): (2) the gunman was killed by the police .</Paragraph>
      <Paragraph position="1">  Figure 3 shows how the translator works. The English sentence (a) is first parsed into the tree in (b), which is then recursively converted into the Chinese string in (e) through five steps. First, at the root node, we apply the rule r1 which preserves the top-level word-order and translates the English period into its Chinese counterpart:</Paragraph>
      <Paragraph position="3"> Then, the rule r2 grabs the whole sub-tree for &quot;the gunman&quot; and translates it as a phrase: (r2) NP-C ( DT (the) NN (gunman) ) → qiangshou. Now we get a &quot;partial Chinese, partial English&quot; sentence &quot;qiangshou VP *&quot; as shown in Fig. 3 (c). Our recursion goes on to translate the VP sub-tree. Here we use the rule r3 for the passive construction:</Paragraph>
      <Paragraph position="4"> which captures the fact that the agent (NP-C, &quot;the police&quot;) and the verb (VBN, &quot;killed&quot;) are always inverted between English and Chinese in the passive voice. Finally, we apply rules r4 and r5, which perform phrasal translations for the two remaining sub-trees in (d), respectively, and get the completed Chinese string in (e). [Footnote 1: We will use LHS and source-side interchangeably (and likewise RHS and target-side). In accordance with our experiments, we also use English and Chinese as the source and target languages, opposite to the Foreign-to-English convention of Brown et al.]</Paragraph>
    </Section>
    <Section position="3" start_page="225" end_page="225" type="sub_section">
      <SectionTitle>
3.3 Translation Algorithm
</SectionTitle>
      <Paragraph position="0"> Given a fixed parse-tree t, the search for the best derivation (as a sequence of conversion steps) can be done by a simple top-down traversal (or depth-first search) from the root of the tree. With memoization, we get a dynamic programming algorithm that is guaranteed to run in O(n) time, where n is the length of the input string, since the size of the parse-tree is proportional to n. Similar algorithms have also been proposed for dependency-based translation (Lin, 2004; Ding and Palmer, 2005).</Paragraph>
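      <Paragraph> The top-down memoized search can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the system used in the experiments: trees are nested tuples, rule costs are additive (lower is better), and the names RULES, match, and best are hypothetical. Only the NP-C entry mirrors rule r2 from the running example; the other rules, the toy tree, and all costs are invented for illustration.
<![CDATA[
```python
from functools import lru_cache

# Hypothetical toy rules: (LHS tree pattern, RHS tokens, cost).
# A pattern string "x0:NP-C" is a variable matching any subtree rooted in NP-C.
RULES = [
    (("S", "x0:NP-C", "x1:VP"), ["x0", "x1"], 0.1),
    (("S", "x0:NP-C", "x1:VP"), ["x1", "x0"], 0.9),
    (("NP-C", ("DT", "the"), ("NN", "gunman")), ["qiangshou"], 0.2),
    (("VP", ("VBD", "fled")), ["taopao"], 0.3),
]


def match(pattern, tree, bindings):
    """Match a (possibly multi-level) LHS pattern against a subtree."""
    if isinstance(pattern, str):
        if ":" in pattern:  # variable such as "x0:NP-C"
            var, label = pattern.split(":")
            if isinstance(tree, tuple) and tree[0] == label:
                bindings[var] = tree
                return True
            return False
        return pattern == tree  # literal root label or source word
    if not isinstance(tree, tuple) or len(pattern) != len(tree):
        return False
    return all(match(p, t, bindings) for p, t in zip(pattern, tree))


@lru_cache(maxsize=None)  # memoization: each distinct subtree is solved once
def best(tree):
    """Return (cost, target tokens) of the cheapest derivation of tree."""
    best_cost, best_out = float("inf"), None
    for lhs, rhs, cost in RULES:
        bindings = {}
        if not match(lhs, tree, bindings):
            continue
        total, out = cost, []
        for tok in rhs:
            if tok in bindings:  # recurse into the bound source subtree
                sub_cost, sub_out = best(bindings[tok])
                total += sub_cost
                out.extend(sub_out or [])
            else:                # plain target-side word
                out.append(tok)
        if total < best_cost:
            best_cost, best_out = total, tuple(out)
    return best_cost, best_out


TREE = ("S", ("NP-C", ("DT", "the"), ("NN", "gunman")), ("VP", ("VBD", "fled")))
```
]]>
      Calling best(TREE) tries every rule whose LHS matches at the root, recurses only into the subtrees bound to variables, and keeps the cheapest combination; with the toy rules above, the monotone S rule wins over the inverted one because of its lower cost.</Paragraph>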
      <Paragraph position="1"> I am currently performing large-scale experiments on English-to-Chinese translation using the xRs rules. We do not translate in the usual Chinese-to-English direction, partly because of the lack of a sufficiently good Chinese parser. Initial results show promising translation quality (in terms of BLEU scores) and fast translation speed.</Paragraph>
    </Section>
  </Section>
</Paper>