<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2092">
  <Title>Data-Oriented Translation</Title>
  <Section position="4" start_page="0" end_page="637" type="metho">
    <SectionTitle>
2 The Data-Oriented Translation Model
</SectionTitle>
    <Paragraph position="0"> In this section, we will give the instantiation of a model that uses DOP for MT purposes, which we will call Data-Oriented Translation (DOT). t This model is largely based on DOPI (Bod, 1998, chapt.</Paragraph>
    <Paragraph position="1"> 2).</Paragraph>
    <Paragraph position="2"> In DOT, we use linked subtree pairs as combinational flagments, a Each linked subtree pair has a certain probability, and consists of a trec in the source language and a tree in the target language. By combining these fragments to form an an analysis of the soume sentence, we automatically generate a translation, i.e. we form a derivation of both source sentence and target sentence. Since there am typically many different derivations which contain the same source sentence, there can be equally many different translations t\~r it. Tile probability of a translation can be calculated as the total probability of all the derivations that form this translation. Tile model presented here is capable of translating between two hmguages only. This lilnitation is by no means a property of the model itself, but is chosen for simplicity and readability reasons only.</Paragraph>
    <Paragraph position="3"> The following parameters should be specified for a DOP-like approach to MT:  1. tile representations of sentences that are asslimed, null 2. the fragments of these representations that can be used for generating new representations, 3. the operator that is used to combine the flag- null ments to form a translation, and I This is actually the second instantiation of such a framework. The original model (Poutsma, 1998; l)outsnm, 2000) had a major flaw, which resulted in translations that were simply incorrect, as pointed out by Way (1999).</Paragraph>
    <Paragraph position="4">  4. the model that is used for determining the probability of a target sentence given a source sentence.</Paragraph>
    <Paragraph position="5"> In the explanation that follows, we will use a subscript s to denote an element of the source language, and a subscript t to denote one of the target language. null</Paragraph>
    <Section position="1" start_page="635" end_page="635" type="sub_section">
      <SectionTitle>
2.1 Representations
</SectionTitle>
      <Paragraph position="0"> In DOT, we basically use the same utteranceanalysis as in DOPI (i.e. syntactically labeled phrase structure trees). To allow for translation capabilities in tiffs model, we will use pairs of trees that incorporate semantic infonnation. The amounl of semantic information need not be very detailed, since all we are interested in is semantic equivalence. Two trees 7\] and T2 are said to be semantic equivalents (denoted as TI &amp;quot;&amp;quot; 7~) iff TI can be replaced with T2 without loss of meaning.</Paragraph>
      <Paragraph position="1"> We can now introduce the notion of links: a link symbolizes a semantic equivalence between two trees, or part of trees. It can occur at any level in the tree structure, except for the terminal level. 3 The representation used in DOT is a 3-tuple (T,, Tt, C/), where ~ is a tree in the somce language, Tt is a tree in the target language, and C/ is a function that maps between semantic equivalent parts in both trees. In the rest of this article, we will refer to this 3-tuple as tile pair (T,, g).</Paragraph>
      <Paragraph position="2"> Because of the semantic equivalence, a link nmst exist at the top level of the tree pair (Ts, Tt). Figure 1 shows an example of two linked trees, the links are depicted graphically as dashed lines.</Paragraph>
      <Paragraph position="3"> 3Links cannot occur at the terminal level, since we map between semantic equivalent parts on the level of syntactic categories.</Paragraph>
    </Section>
    <Section position="2" start_page="635" end_page="635" type="sub_section">
      <SectionTitle>
2.2 Fragments
</SectionTitle>
      <Paragraph position="0"> Likewise, we will use linked subtrees as our flagments. Given a pair of linked trees (T~, Tt), a linked subtree pair of (T~, Tt) consists of two connected and linked subgraphs (t~, 6) of (77~, 7}) such that:  1. for every pair of linked nodes in (t.,.,6), it holds that: (a) both nodes in (ts,lt} have either zero daughter nodes, or (b) both nodes have all the daughter nodes of the corresponding nodes in (T,, Tt) and 2. every non-linked node in either t~. (or 6) has all the daughter nodes of the corresponding node in T, (T,), and 3. both t, and ~ consist of more than one node.  This definition has a number of consequences. First of all, it is morn restrictive than the DOPI definition for subtrees, thus resulting in a smaller or equal amount of subtrees per tree. Secondly, it defines a possible pair of linked subtl'ees. Typically, there are many pairs of linked subtrees for each set of linked trees. Thirdly, the linked tree pair itself is also a valid linked subtree pair. Finally, according to this definition, all the linked subtree pairs are semantic equivalents, since the semantic daughter nodes of the original tree are removed or retained simultaneously (clause 1). The nodes for which a semantic equivalent does not exist are always retained (clause 2).</Paragraph>
      <Paragraph position="1"> We can now define the bag of linked subtree pailw, which we will use as a grammar. Given a corpus of linked trees C, the bag of linked subtree pairs of C is the bag in which linked subtree pairs occur exactly as often as they can be identified in C. 4 Figure 2 show the bag of linked subtree pairs for the linked tree pair (T,, Tt).</Paragraph>
    </Section>
    <Section position="3" start_page="635" end_page="636" type="sub_section">
      <SectionTitle>
2.3 Composition operator
</SectionTitle>
      <Paragraph position="0"> In DOT, we use the leftmost substitution operator for forming combinations of grammar rules.</Paragraph>
      <Paragraph position="1"> The composition of tile linked tree pair {ts,6) and  (us,u,), written as (ts,tt)o (u.,.,u,), is deiined iff |he label of lhe leftmost nonterlninal \]inked fi'ontier uocle and the label of its linked counterpart are identical to the labels of the root nodes of (u.~., ur). If this composition is defined, it yields a copy of (t,.,tt), in which a copy of u.,. has been substituted on t.,.'s left-most nonterminal linked frontier node, and a copy of ut has been substituted on the node's linked counterpart. The colnposition operation is illustrated in figure 3.</Paragraph>
      <Paragraph position="2"> Given a bag of linked subtree pairs B, a sequence of compositions (ts~ , it, ) o... o {t.~N, bN ), with (t.~i,b~) E B yielding a tree pair (T,,Tt) without non-terminal leaves is called a derivation D of (7~., 7~).</Paragraph>
    </Section>
    <Section position="4" start_page="636" end_page="637" type="sub_section">
      <SectionTitle>
2.4 Probability calculation
</SectionTitle>
      <Paragraph position="0"> To compute the probability of the target composition, we make the same statistical assumptions as in DOPI with regard to independence and representatiou of the subtrees (Bed, 1998, p. 16).</Paragraph>
      <Paragraph position="1"> The probability of selecting a subtree pair (ts~bl is calculated by dividing the frequency of the sub-tree pair in the bag by the number of snbtrees that have the same root node labels in this bag. In other words, let I(t.,,t,)l be the number of times the sub-tree pair (t,.,tr} occurs in the bag of subtree pairs, and r(t) be the root node categories of t, then the probability assigned to (is,b) is</Paragraph>
      <Paragraph position="3"> Given the assumptions that all subtree pairs are independent, Ihe probability of a derivation (ts~ ,hi) o... o (GN,ttN) is equal to the product of the probabilities of the used subtree pairs.</Paragraph>
      <Paragraph position="5"> (2) The translation generated by a derivation is equal to the sentence yielded by the target trees of the derivation. Typically, a translation can be generated by a large number of different deriwltions, each of which has its own probability. Therefore, the probability of a translation ws ~ wt is the sum of the probabilities of its derivations:</Paragraph>
      <Paragraph position="7"> The justification of this last equation is quite trivial. As in any statistical MT system, we wish to choose the target sentence w~ so as to maximize P(wtlw,) (Brown et al., 1990, p. 79). if we take the sum over all possible derivations that wele formed from Ws and derive wt, we can rewrite this as equation 4, as seen below. Since both ws and wt are contained in Dlw,,w, ), we can remove them both and arrive at equation 5, which--as we maximize over wt--is equivalent to equation 3 above.</Paragraph>
      <Paragraph position="9"/>
    </Section>
  </Section>
  <Section position="5" start_page="637" end_page="637" type="metho">
    <SectionTitle>
3 Computational Aspects
</SectionTitle>
    <Paragraph position="0"> When translating using the DOT model, we can distinguish between three computational stages: I. parsing: the formation of a derivation forest,  2. translation: the transfer of the derivation forest from the source language to the target language, null 3. disambiguation: the selection of the most probable translation from the derivation forest.</Paragraph>
    <Section position="1" start_page="637" end_page="637" type="sub_section">
      <SectionTitle>
3.1 Parsing
</SectionTitle>
      <Paragraph position="0"> In DOT, every subtlee pair (t~,tt) can be seen as a productive rewrite rule: (root(t~),root(tt)) (frontier(ts), frontier(tt)), where all linkage in the frontier nodes is retained. The linked non-terminals in the yield constitute the symbol pairs to which new roles (subtlee pairs) are applied. For instance, the rightmost subtree pair in tigure 3 can be rewritten as</Paragraph>
      <Paragraph position="2"> This rule can then be combined with nfles that have the root pair (NP, NP), and so on.</Paragraph>
      <Paragraph position="3"> If we only consider the left-side part of this rule, we can use algorithms that exist for context-free grammars, so that we can parse a sentence of n words with a time complexity which is polynomial in n. These algorithms give as output a chart-like derivation forest (Sima'an et al., 1994), which contains the tree pairs of all the derivations that can be formed.</Paragraph>
    </Section>
    <Section position="2" start_page="637" end_page="637" type="sub_section">
      <SectionTitle>
3.2 Translation
</SectionTitle>
      <Paragraph position="0"> Since every tree pair in the derivation forest contains a tree for the target language, the translation of this folest is trivial.</Paragraph>
    </Section>
    <Section position="3" start_page="637" end_page="637" type="sub_section">
      <SectionTitle>
3.3 Disambiguation
</SectionTitle>
      <Paragraph position="0"> In order to select the most probable translation, it is not efficient to compare all translations, since there can be exponentially many of them. Furthermore, it has been shown that the Viterbi algorithm cannot be used to make the most probable selection from a DOP-like derivation forest (Sima'an, 1996).</Paragraph>
      <Paragraph position="1"> Instead, we use a random selection lnethod to generate derivations from the target derivation forest, otherwise known as Monte Carlo sampling (Bod, 1998, p. 4649). In this method, the random choices of derivations ale based on the probabilities of the nnderlying subderivations. If we generate a large number of samples, we can estimate the most probable translation as the translation which results most often. The most probable translation can be estimated as accurately as desired by making the number of random samples sufficiently large.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="637" end_page="638" type="metho">
    <SectionTitle>
4 Pilot Experiments
</SectionTitle>
    <Paragraph position="0"> In order to test the DOT-model, we did some pilot experiments with a small part of the Verbmobil corpus. This corpus consists of transliterated spoken appointment dialogues in German, English,  and Japanese. We only used the German and English datasets, which were aligned at sentence level, and syntactically annotated using different annotation schemes. 5 Naturally, the tree pairs in the corpus did not contain any links, so--in order to make it useful for l)OT--we had to analyze each tree pair, and place links where necessary. We also corrected tree pairs that were not aligned correctly. Figure 4 shows an example of a corrected and linked tree from our col rection of the Verbmobil corpus.</Paragraph>
    <Paragraph position="1"> We used a blind testing method, dividing the 266 trees of our corpus into an 85% training set of 226 tree pairs, and a 15% test set of 40 tree pairs. We carried out three experiments, in both directions, each using a different split of training and test set. The 226 training set tree pairs were converted into fragments (i.e. subtree pairs), and were enriched with their corpus probabilities. The 40 sentences from the lest set served as input sentences: they were translated with the fragments from the training set using a bottom-up chart parser, and disambiguated by the Monte Carlo algorithm. The most probable translations were estinmted from probability distributions of 1500 sampled derivations, which accounts for a standard deviation C/5 &lt; 0.013. Finally, we compared the resulting trauslations wilh the original translation as given in the test set. We also fed tile tes! sentences inlo another MT-system: AltaVista's Babelfish, which is based on Systran. 6</Paragraph>
    <Section position="1" start_page="638" end_page="638" type="sub_section">
      <SectionTitle>
4.1 Evaluation
</SectionTitle>
      <Paragraph position="0"> In a manner similar to (Brown et al., 1990, p. 83), we assigned each of the resulting sentences a category according to the following criteria. If the produced sentence was exactly the stone as the actual Verbmobil translation, we assigned it the exact catego W. If it was a legitimate translation of the source sentence but in different words, we assigned it the alternale category. If it made sense as a sentence, but could not be interpreted as a valid translation of the source sentence, we assigned it the wrong category. If the translation only yielded a part of the source sentence, we assigned it the partial category: either partial exact if it was a part of the actual Verbmobil translation, or partial alternate if it was part of an alternate translation. Finally, if no translation  ich buche die Zfige.</Paragraph>
      <Paragraph position="1"> Ich werdc (tie Ziige reservieren.</Paragraph>
      <Paragraph position="2"> Es ist ja keine Behgrde.</Paragraph>
      <Paragraph position="3"> It is not an administrative office you know.</Paragraph>
      <Paragraph position="4"> There is not an administrative office you know.</Paragraph>
      <Paragraph position="5"> Translated as: And as said 1 think the location of the branch office is posh.</Paragraph>
      <Paragraph position="6"> Und wit gesagt ich denke die Lage zur Filiale spricht Biinde ist.</Paragraph>
      <Paragraph position="7"> ich denke die Lage l'artial Alternate lch habe Preise veto Parkhotel ltannover da.</Paragraph>
      <Paragraph position="8"> Verbmobil: 1 have got prices for Hatmover Parkhotel here.</Paragraph>
      <Paragraph position="9"> Translated as: for Parkhotel Hannover  was given, * &amp;quot; we assigned it tile none category. Tile resuits we obtained from Systran were also evaluated using this procedure. Figure 5 gives some classiIication examples.</Paragraph>
      <Paragraph position="10"> The method of evaluation is very strict: even if ore&amp;quot; model generated a translation that had a better quality than the given Verbmobil translation, we still assigned it the (partial) alternate category. This can be seen in the second example in figure 5.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="638" end_page="639" type="metho">
    <SectionTitle>
4.2 Results
</SectionTitle>
    <Paragraph position="0"> The results that we obtained can be seen in table 1 and 2. In both our experiments, the number of exact translations was somewhat higher tlmn Systrmfs, but Systran excelled at the number of alternate translations. This can be explained by the fact that Systran has a much larger lexicon, thus allowing it to form much more alternate translations.</Paragraph>
    <Paragraph position="1"> While it is meaningless to compare results obtained from different corpora, it may be interesting to note that Brown et al. (1990) report a 5% exact match in experiments with the Hansard corpus, indicating that an exact match is very hard to achieve.</Paragraph>
    <Paragraph position="2"> The number of ungrammatical translations in our  English to German experiment were much higher than Systran's (32% versus Systran's 19%); vice-versa it was much lower (13% versus Systran's 21%). Since the German grammar is more complex than the English grammar, this result could be expected. It is simpler to map a complex grammar to a simpler than vice-versa.</Paragraph>
    <Paragraph position="3"> The partial translations, which are quite useflfl for forming the basis of a post-edited, manual translation, varied around 38% in our English to German experiments, and around 55% when translating from German to English. Systran is incapable of forming partial translations.</Paragraph>
    <Paragraph position="4"> As can be seen from the tables, we experimented with the maxinmm depth of the tree pairs used. We expected that the performance of the model would increase when we used deeper subtree pairs, since deeper structures allow for more complex structures, and therefore better translations. Our experiments showed, however, that there was very little increase of performance as we increased the maximum tree depth. A possible explanation is that the trees in our corpus contained a lot of lexical context (i.e. terminals) at very small tree depths. Instead of varying the maximum tree depth, we should experiment with varying the maximum tree width. We plan to perform such experiments in the future.</Paragraph>
  </Section>
  <Section position="8" start_page="639" end_page="639" type="metho">
    <SectionTitle>
5 Future work
</SectionTitle>
    <Paragraph position="0"> Though the findings presented in this article cover the most important issues regarding DOT, there are still some topics open for future research.</Paragraph>
    <Paragraph position="1"> As we stated in the previous section, we wish to see whether DOT's performance increases as we vary the maximum width of a tree.</Paragraph>
    <Paragraph position="2"> In the experiments it became clear that DOT lacks a large lexicon, thus resulting in less alternate translations than Systran. By using an external lexicon, we can form a part-of-speech sequences fiom the source sentence, and use this sequence as input for DOT. The resulting target part-of-speech sequence can then be reformed into a target sentence.</Paragraph>
    <Paragraph position="3"> The experiments discussed in this article are pilot experiments, and do not account for much. In order to find more about DOT and its (dis)abilities, more experiments on larger corpora are required.</Paragraph>
  </Section>
class="xml-element"></Paper>