<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1078">
  <Title>Chart-Based Transfer Rule Application in Machine Translation</Title>
  <Section position="3" start_page="0" end_page="537" type="metho">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> The MT literature describes several techniques for deriving the appropriate translation. Statistical systems that do not incorporate linguistic analysis (Brown et al., 1993) typically choose the most likely translation based on a statistical model, i.e., translation probability determines the translation. (Hein, 1996) reports a set of (hand-coded) feature-structure-based preference rules to choose among alternatives in Multra. There is some discussion about adding transfer rules automatically acquired from corpora to Multra. (This translation procedure would probably complement, not replace, existing procedures in these systems.) Assuming that they over-generate rules (as we did), a system like the one we propose should be beneficial. In (Way et al., 1997), many different criteria are used to choose transfer rules to execute, including preferences for specific rules over general ones, and complex rule notation that insures that few rules can apply to the same set of words.</Paragraph>
    <Paragraph position="1"> The Pangloss Mark III system (Nirenburg and Frederking, 1995) uses a chart-walk algorithm to combine the results of three MT engines: an example-based engine, a knowledge-based engine, and a lexical-transfer engine. Each engine contributes its best edges and the chart-walk algorithm uses dynamic programming to find the combination of edges with the best overall score that covers the input string.</Paragraph>
    <Paragraph position="2"> Scores of edges are normalized so that the scores from the different engines are comparable and weighted to favor engines which tend to produce better results. Pangloss's algorithm combines whole MT systems. In contrast, our algorithm combines the output of individual transfer rules within a single MT system. Also, we use a best-first search that incorporates a probabilistic figure of merit, whereas Pangloss uses an empirically based weighting scheme and what appears to be a top-down search.</Paragraph>
    <Paragraph position="3"> Best-first probabilistic chart parsers (Bobrow, 1990; Chitrao and Grishman, 1990; Caraballo and Charniak, 1997; Charniak et al., 1998) strive to find the best parse without exhaustively trying all possible productions. A probabilistic figure of merit (Caraballo and Charniak, 1997; Charniak et al., 1998) is devised for ranking edges. The highest ranking edges are pursued first and the parser halts after it produces a complete parse. We propose an algorithm for choosing and applying transfer rules based on probability. Each final translation is derived from a specific set of transfer rules. If the procedure immediately selected these transfer rules and applied them in the correct order, we would arrive at the final translation while creating the minimum number of edges. Our procedure uses about 4 times this minimum number of edges.</Paragraph>
    <Paragraph position="4"> With respect to chart parsing, (Charniak et al., 1998) report that their parser can achieve good results while producing about three times the minimum number of edges required to produce the final parse.</Paragraph>
  </Section>
  <Section position="4" start_page="537" end_page="538" type="metho">
    <SectionTitle>
3 Test Data
</SectionTitle>
    <Paragraph position="0"> We conducted two experiments. For Experiment 1, we parsed a sentence-aligned pair of Spanish and English corpora, each containing 1155 sentences of Microsoft Excel Help Text. These pairs of parsed sentences were divided into distinct training and test sets, ninety percent for training and ten percent for test. The training</Paragraph>
    <Section position="1" start_page="537" end_page="538" type="sub_section">
      <SectionTitle>
Parse Trees
</SectionTitle>
      <Paragraph position="0"> set was used to acquire transfer rules (Meyers et al., 1998b) which were then used to translate the sentences in the test set. This paper focuses on our technique for applying these transfer rules in order to translate the test sentences.</Paragraph>
      <Paragraph position="1"> The test and training sets in Experiment 1 were rotated, assigning a different tenth of the sentences to the test set in each rotation. In this way we tested the program on the entire corpus.</Paragraph>
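The rotation described above can be sketched as a ten-way split in which each tenth serves as the test set exactly once. This is only an illustration: the function name `rotations` is hypothetical, the integers stand in for aligned sentence pairs, and taking every tenth item is an assumption, since the text does not say how each tenth was chosen.

```python
def rotations(sentence_pairs, folds=10):
    """Yield (train, test) splits; each item is in the test set exactly once."""
    for k in range(folds):
        # Assumption: the k-th tenth is every tenth item starting at offset k.
        test = sentence_pairs[k::folds]
        train = [p for i, p in enumerate(sentence_pairs) if i % folds != k]
        yield train, test


# Integers stand in for the 1155 aligned sentence pairs of Experiment 1.
pairs = list(range(1155))
splits = list(rotations(pairs))
```

Because the test sets partition the corpus, translating every test set covers all 1155 sentence pairs while no sentence is ever translated with rules acquired from itself.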
      <Paragraph position="2"> Only one test set (one tenth of the corpus) was used for tuning the system during development.</Paragraph>
      <Paragraph position="3"> Transfer rules, 1109 on average, were acquired from each training set and used for translation of the corresponding test set. For Experiment 2, we parsed 2617 pairs of aligned sentences and used the same rotation procedure for dividing test and training corpora. The Experiment 2 corpus included the Experiment 1 corpus. An average of 2191 transfer rules were acquired from a given set of Experiment 2 training sentences.</Paragraph>
      <Paragraph position="4"> Experiment 1 is orchestrated in a careful manner that may not be practical for extremely large corpora, and Experiment 2 shows how the program performs if we scale up and eliminate some of the fine-tuning. Apart from corpus size, there are two main differences between the two experiments: (1) the Experiment 1 corpus was aligned completely by hand, whereas the Experiment 2 corpus was aligned automatically using the system described in (Meyers et al., 1998a); and (2) the parsers were tuned to the Experiment 1 sentences, but not to the Experiment 2 sentences that did not overlap with Experiment 1.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="538" end_page="538" type="metho">
    <SectionTitle>
4 Parses and Transfer Rules
</SectionTitle>
    <Paragraph position="0"> Figure 1 is a pair of &quot;regularized&quot; parses for a corresponding pair of Spanish and English sentences from Microsoft Excel help text. These are F-structure-like dependency analyses of sentences that represent predicate argument structure. This representation serves to neutralize some differences between related sentence types, e.g., the regularized parses of related active and passive sentences are identical, except for the feature value pair {Mood, Passive}. Nodes (values) are labeled with head words and arcs (features) are labeled with grammatical functions (subject, object), prepositions (in) and subordinate conjunctions (before). (Morphological features and their values, e.g., Gram-Number: plural, are also represented as arcs and nodes.) For demonstration purposes, the source tree in Figure 1 is the input to our translation system and the target tree is the output.</Paragraph>
    <Paragraph position="1"> The transfer rules in Figure 2 can be used to convert the input tree into the output tree. These transfer rules are pairs of corresponding rooted substructures, where a substructure (Matsumoto et al., 1993) is a connected set of arcs and nodes. A rule consists of either a pair of &quot;open&quot; substructures (rule 4) or a pair of &quot;closed&quot; substructures (rules 1, 2 and 3). Closed substructures consist of single nodes (A, A', B, B', C') or subtrees (the left-hand side of rule 3). Open substructures contain one or more open arcs, i.e., arcs without heads (both substructures in rule 4).</Paragraph>
  </Section>
  <Section position="6" start_page="538" end_page="539" type="metho">
    <SectionTitle>
5 Simplified Translation with
Tree-based Transfer Rules
</SectionTitle>
    <Paragraph position="0"> The rules in Figure 2 could combine by filling in the open arcs in rule 4 with the roots of the substructures in rules 1, 2 and 3. The result would be a closed edge which maps the left tree in Figure 1 into the right tree. Just as edges of a chart parser are based on the context-free rules used by the chart parser, edges of our translation system are based on these transfer rules. Initial edges are identical to transfer rules. Other edges result from combining one closed edge with one open edge. Figure 3 lists the sequence of edges which would result from combining the initial edges based on Rules 1-4 to replicate the trees in Figure 1. The translation proceeds by incrementally matching the left-hand sides of Rules 1-4 with the input tree (and insuring that the tree is completely covered by these rules).</Paragraph>
    <Paragraph position="1"> The right-hand sides of these compatible rules are also combined to produce the translation. This is an idealized view of our system in which each node in the input tree matches the left-hand side of exactly one transfer rule: there is no ambiguity and no combinatorial explosion.</Paragraph>
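The idealized combination of one open edge with closed edges can be sketched as follows. Everything here is an assumption made for illustration: the `Edge` fields, the helper names, and the six-node tree (D governing A, B and C, with C governing C1 and C2) are hypothetical and are not taken from Figures 1-3. Only source-side coverage is tracked; the target-side substructures are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    root: str             # source node where the left-hand side is rooted
    covered: frozenset    # source nodes matched by the left-hand side
    open_arcs: frozenset  # source nodes still to be filled; empty means closed
    rules: tuple          # names of the transfer rules behind this edge

    @property
    def closed(self):
        return not self.open_arcs


def combine(open_edge, closed_edge):
    """Fill one open arc of open_edge with the root of closed_edge, else None."""
    if open_edge.closed or not closed_edge.closed:
        return None
    if closed_edge.root not in open_edge.open_arcs:
        return None
    return Edge(
        root=open_edge.root,
        covered=open_edge.covered | closed_edge.covered,
        open_arcs=open_edge.open_arcs - {closed_edge.root},
        rules=open_edge.rules + closed_edge.rules,
    )


def initial(root, covered, open_arcs, rule):
    """Build an initial edge, which is identical to a single transfer rule."""
    return Edge(root, frozenset(covered), frozenset(open_arcs), (rule,))


# Hypothetical six-node source tree: D governs A, B and C; C governs C1 and C2.
rule1 = initial("A", {"A"}, set(), "rule 1")
rule2 = initial("B", {"B"}, set(), "rule 2")
rule3 = initial("C", {"C", "C1", "C2"}, set(), "rule 3")  # closed subtree
rule4 = initial("D", {"D"}, {"A", "B", "C"}, "rule 4")    # open substructure

edges = [rule1, rule2, rule3, rule4]
current = rule4
for closed_edge in (rule1, rule2, rule3):
    current = combine(current, closed_edge)
    edges.append(current)
```

In this ambiguity-free setting the chart holds four initial edges plus three combined edges, and the last edge is closed and covers the whole hypothetical tree.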
    <Paragraph position="2"> The reality is that more than one transfer rule may be activated for each node, as suggested in Figure 4. If each of the six nodes of the source tree corresponded to five transfer rules, there would be 5^6 = 15625 possible combinations of rules to consider. To produce the output in Figure 3, a minimum of seven edges would be required: four initial edges derived from the original transfer rules plus three additional edges representing the combination of edges (steps 2,</Paragraph>
  </Section>
  <Section position="7" start_page="539" end_page="540" type="metho">
    <SectionTitle>
6 Best First Translation Procedure
</SectionTitle>
    <Paragraph position="0"> The following is an outline of our best-first search procedure for finding a single translation:
      1. For each node N, find T_N, the set of compatible transfer rules.
      2. Create initial edges for all T_N.
      3. Repeat until a &quot;finished&quot; edge is found or an edge limit is reached:
         (a) Find the highest scoring edge E.
         (b) If complete, combine E with compatible incomplete edges.
         (c) If incomplete, combine E with compatible complete edges.
         (d) Incomplete edge + complete edge = new edge.
      The procedure creates one initial edge for each matching transfer rule in the database and puts these edges in a queue prioritized by score. (The left-hand side of a matching transfer rule is compatible with a substructure in the input source tree.) The procedure iteratively combines the best scoring edge with some other compatible edge to produce a new edge, and inserts the new edge in the queue. The score for each new edge is a function of the scores of the edges used to produce it. The process continues until either an edge limit is reached (the system looks like it will take too long to terminate) or a complete edge is produced whose left-hand side is the input tree: we call this edge a &quot;finished edge&quot;. We use the following technique for calculating the score for initial edges [6]. The score for each initial edge E rooted at N, based on rule R, is calculated as follows:</Paragraph>
    <Paragraph position="2"> $\mathrm{SCORE}_1(E) = \log_2 \left( \mathrm{Freq}(R) / \sum_{R' \text{ matching } N} \mathrm{Freq}(R') \right)$, where the frequency (Freq) of a rule is the number of times it matched an example in the training corpus during rule acquisition.</Paragraph>
    <Paragraph position="3"> The denominator is the combined frequencies of all rules that match N.</Paragraph>
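As a worked illustration of the initial-edge score, the sketch below reads the description above as the base-2 log of a rule's relative frequency among the rules matching the same node. The function name `score1` and the frequency counts are invented for the example and do not come from the experiments.

```python
import math


def score1(freq, rule, rules_matching_node):
    """Log2 of the rule's share of the combined frequency at one node."""
    total = sum(freq[r] for r in rules_matching_node)
    return math.log2(freq[rule] / total)


# Hypothetical training counts for three rules that all match the same node N.
freq = {"R1": 6, "R2": 3, "R3": 3}
scores = {rule: score1(freq, rule, freq) for rule in freq}
```

With these counts R1 accounts for half of the matches and R2 and R3 for a quarter each, so their scores are -1 and -2 respectively.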
    <Paragraph position="4"> [6] This is somewhat dependent on the way these transfer rules are derived. Other systems would probably have to use some other scoring system.</Paragraph>
    <Paragraph position="5"> $\mathrm{SCORE}(S) = \mathrm{SCORE}_1(S) - \mathrm{Norm}$, where the Norm (normalization) factor is equal to the highest SCORE1 for any rule matching N.</Paragraph>
    <Paragraph position="6"> Since the log2 of a probability is never positive, this has the effect of setting the score of each of the most probable initial edges to zero. The scores for non-initial edges are calculated by adding up the scores of the initial edges of which they are composed. Without any normalization (SCORE(S) = SCORE1(S)), small trees are favored over large trees. This slows down the process of finding the final result. The normalization we use insures that the most probable set of transfer rules is considered early on.</Paragraph>
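The normalization and the best-first procedure of this section can be put together in a small sketch. It is not the system's code: edges are plain dictionaries with hypothetical keys, only source-side coverage is tracked, the three-node tree and all scores are invented, and combining each popped edge only with edges popped earlier is an assumed agenda discipline that the outline does not spell out.

```python
import heapq
import itertools


def normalize(score1_by_rule):
    """SCORE = SCORE1 - Norm, with Norm the highest SCORE1 at one node."""
    norm = max(score1_by_rule.values())
    return {rule: s - norm for rule, s in score1_by_rule.items()}


def combine(a, b):
    """Fill an open arc of the incomplete edge with the complete edge's root."""
    incomplete, complete = (a, b) if a["open"] else (b, a)
    if not incomplete["open"] or complete["open"]:
        return None  # need exactly one incomplete and one complete edge
    if complete["root"] not in incomplete["open"]:
        return None
    return {
        "root": incomplete["root"],
        "covered": incomplete["covered"] | complete["covered"],
        "open": incomplete["open"] - {complete["root"]},
        "rules": incomplete["rules"] + complete["rules"],
    }


def best_first(initial, all_nodes, edge_limit=1000):
    """initial: list of (score, edge). Returns (score, finished edge) or None."""
    tie = itertools.count()  # unique tie-breaker so edges are never compared
    queue = [(-score, next(tie), edge) for score, edge in initial]
    heapq.heapify(queue)     # heapq is a min-heap, hence the negated scores
    chart, created = [], len(queue)
    while queue and edge_limit >= created:
        neg, _, edge = heapq.heappop(queue)  # highest scoring edge
        if not edge["open"] and edge["covered"] == all_nodes:
            return -neg, edge                # a "finished" edge
        for other_neg, other in chart:
            new = combine(edge, other)
            if new is not None:
                # Score of a new edge is the sum of its initial edges' scores.
                heapq.heappush(queue, (neg + other_neg, next(tie), new))
                created += 1
        chart.append((neg, edge))
    return None


def edge(root, open_arcs, rule):
    return {"root": root, "covered": frozenset({root}),
            "open": frozenset(open_arcs), "rules": (rule,)}


# Hypothetical tree: D governs A and B. SCORE1 values are invented.
score_a = normalize({"a1": -1.0, "a2": -2.0})
score_d = normalize({"d1": -0.5, "d2": -3.0})
initial = [
    (score_a["a1"], edge("A", (), "a1")),
    (score_a["a2"], edge("A", (), "a2")),
    (0.0, edge("B", (), "b1")),
    (score_d["d1"], edge("D", ("A", "B"), "d1")),
    (score_d["d2"], edge("D", ("A", "B"), "d2")),
]
result = best_first(initial, frozenset({"A", "B", "D"}))
```

After normalization the most probable rule at each node scores zero, so the combination of a1, b1 and d1 reaches the front of the queue before any edge built from a2 or d2 is popped.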
  </Section>
</Paper>