<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-2115">
  <Title>A Similarity-Driven Transfer System</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Translation Rules
</SectionTitle>
    <Paragraph position="0"> A basic type ,ff gra.ph used in this paper is a labeled directed graph, or art Ida. 2 At, ldg G consists of a set of nodes N, and a set of arcs A. Further, each node and art: has a label, ht particular, node labels are unique. Each node consists tff features, each of which is a pair of a feature name attd a feature v~lue.</Paragraph>
    <Paragraph position="1"> If an ldg lta.~ only one root node, then it is called ~n rldg, and if an Ida has no cyclic pr~th, then it is called an idag. s Therefore~ an ridag denotes an Ida that h~-s only one root node and no cyclic path.</Paragraph>
    <Paragraph position="2"> A translation rnle 4 r consists of the folk,wing three corrtpo,leots:</Paragraph>
    <Paragraph position="4"> where Gm is a matching gr~rph, G~ is a construction graph, e.nd M is a set of mappings between Gm and A matching graph G',,, and a construction graph G~ must be at lea.st an rldag. 5 Further, nodes in (~,, must be labeled uniqnely; that is, each node in G,,, mnst hz~ve only one unique label, and the l~bel of the node n~ in G~ is determined to be the label of the ~The term qabeled' means that nodes and arcs are labeled, and the term ~directed' means that each arc has a direction. Further, an Ida in this paper refers to a connected graph unless otherwise specified.</Paragraph>
    <Paragraph position="5"> ZThe term dag is often used in the NLP world, and usually denotes a rooted connected labeled (as functional) directed graph. But in this paper, dag denotes a direct,:d acycllc graph that may have multiple toots, is not necessarily a connected graph, and does not necessarily itave labels.</Paragraph>
    <Paragraph position="6"> 4In this paper, the term rule does not mean a procedure, but rather a pattern of translation knowledge.</Paragraph>
    <Paragraph position="7"> bSudl graphs are sufficient to express almost MI lingu~atlc strsct ures.</Paragraph>
    <Paragraph position="8"> Figure 2: Samph. rule for translation between Japanese ~tnd English node nm in G,. such that n:. = M(nm).</Paragraph>
    <Paragraph position="9"> Mat)ping between (:.,~ and G~ designates tile cor-.</Paragraph>
    <Paragraph position="10"> respondences be,wee. ,,\[}des in G,. and (;.. l'})r instance, in Figure 2, tim Japanese word &amp;quot;nagai&amp;quot; (&amp;quot;tong&amp;quot;) should c.rrespond to both of the English words &amp;quot;have&amp;quot; and ~\[(lll~111 bl!cal,se if am),her word g.ow~rn.~ the word &amp;quot;nagai&amp;quot; then its English ,re,relation should be connected to the word &amp;quot;h~Lve.&amp;quot; On the other hand, if the Japanese word &amp;quot;to,elan&amp;quot; (&amp;quot;very&amp;quot;) modifies &amp;quot;nagai&amp;quot; then its English translation &amp;quot;very&amp;quot; should be connected to &amp;quot;long.&amp;quot; This shows tllat fi)r node in ~ source languag% two kinds of connection point, for translations of both governing structures attd governed structures of the node, are needed in its translated structure. This implies that there shouht be two kinds of correspondence between G',, and (7~, namely, (I) a mapping from a G,, node n,, to a G~ node nc that is to be a node connected to translations ACq'ES DE COLING-92, NANqT!S, 23-28 AOUq&amp;quot; 1992 7 7 1 PP.OC. OF COLING-92, NAtWrES, AUG. 23-28, 1992 of structures governing nm, and (2) a mapping from n,, to a G~ node n'~ that is to be a node connected to translations of structures governed by n,~. We call the former an upward mapping and the latter a downward mapping, and denote these twn kinds of mapping as follows: where M T is upward mapping, and M ~ is downward mapping.</Paragraph>
    <Paragraph position="11"> Not all kinds of mapping should be permitted as M \[ and M 1. A translation rule r=( Gm,M,Gc ) must satisfy the following conditions:  (1)M T and M I are both injections, (2) there are no two distinct nodes x aml y in G.~ such that M(x)=M(y), e and (3) M l(root(G,,,)) .... t(a~).</Paragraph>
    <Paragraph position="12">  Condition (1) ensures that there is only one c()nnection point in G~ for each translation of gow~rn ing structures and governed structures, coudition (2) ensures that the label of a G'~ node is determined uniquely, and condition (3) ensures that the result of this transfer model becomes a rooted graph (see \[15\] for details). A rule sat.isying these conditions is said to be sound.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Similarity Calculation
</SectionTitle>
    <Paragraph position="0"> This section desribes how a similarity is calcuhm~d.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Graph Distance
</SectionTitle>
      <Paragraph position="0"> The shnilarity between a Gm and an input graph Gi,, is defined as the inverse of the graph distance 7 between thenL First, the simple graph distance D; between Gi,, and G~ is given ;ks follows: D',(G~, a..) = o=(n~., R.,) + E,,, min(D'a(VS(Ri ..... ),GS(t~,, .... ))) where R/, and /~ are roots of Gi~ and Gm, respectlvely~ D,, is a node distance, a= is an arc in G,n such that its source node is R.m, and GS(n~ a) denotes a subgraph that is related to an arc a from n.</Paragraph>
      <Paragraph position="1"> Briefly, a simple distance is the sum of the node distance between two roots and the sum of the minimal simple distances between Gin subgraphs and Gm sub-graphs that, far each arc a outgoing from the GmmOt node, are related to the all arcs a from the root nodes. ~This means that either M ~(x) or M l(x) is equal to either M T(Y) or M .~(y) rDistltnces defined in this section are not actual distances in the mathematical sense.</Paragraph>
      <Paragraph position="2"> However, the larger Gm is, the larger this simple distance becomes. Therefore~ when normalized by the number of nodes in G,,,, the graph distance Dg is given as follows: D;(Gin,G,,,) Dg(Gin, am) -- N where N is the number of nodes in G~.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Node Distance
</SectionTitle>
      <Paragraph position="0"> When considering the distance between two words (nodes), we usually think of their semantic distance in a semantic hierarchy. In general, no matter what semantic hierarchy we use, it is inevitable that there will be some sort of distortion. Further, ,as stated be&gt; fi)re, a node consists of several features and may not have a lexica\[ form that is a pointer to a semantic hierarchy. Therefore, a promising approach to calculating distances between nodes is to use both a semantic hierarchy and syntactic features~ that is, to use syntactic features to correct the distortion contained in the semantic hierarchy to some extent.</Paragraph>
      <Paragraph position="1"> The node distance between a Gin node n i and a G,,, node nm is detined ms follows:</Paragraph>
      <Paragraph position="3"> where DI is a feature node distance, D, is a semantic no(h.&amp;quot; distance, N I is the number of features in nm for DI, and 6, is the weight of a semantic distance.</Paragraph>
      <Paragraph position="4"> The semantic distance D, between a Gi,~ word wi,~ and a G,, word wm is given by the following equation.</Paragraph>
      <Paragraph position="5"> In SimTran, Bunrul Goi Hyou \[5\] code (or bghcode s) is used for calculating the smnantlc distance between Japanese words.</Paragraph>
      <Paragraph position="6">  where bgh(w) is the fraction part of the bghcode of w, bghmax is the mammal difference between two bghcode fraction parts, and 6b is a penalty incurred if two words are not identlcM.</Paragraph>
      <Paragraph position="7"> The feature distance l)f between a Gi~ node hi,, and a Gm nmle nm is given ms follows:</Paragraph>
      <Paragraph position="9"> s A bgheode is a fraction of number. Its integer part roughly corresponds to a syntactic c~tegory, and therefore, only its fraction part is used.</Paragraph>
      <Paragraph position="10"> ACRES DE COLING-92, NANTES, 23-28 AOm&amp;quot; 1992 7 7 2 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 .//&amp;quot; Each matching pivot in ~t simibtr i-cover rule set must have M I or M 1, to ensure that tim Gcs of the i cover rllle set pr(lduce a t:ounected graph a~s a result.</Paragraph>
      <Paragraph position="11"> If there atre rules in the given i-cover rule set that do not s~ttlsfy this condition, they are renloved from the set of ruh, camlidates~ and the cover search method is executed until an i cover rule set th~.t satisfies this conditinn is found. Such as, i-cover rule set is called a proper rule set.</Paragraph>
      <Paragraph position="12"> Next, for each projection nf the given i-cover, we nmst make ;t copy of its origin rule~ m&amp;quot; rule instance, be&gt; C;-LUSe one ride IEay make lllort * thgn oue project(tin un (~in '  In the ~bove equatiolb tile consistency checking de pends on a feature.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Rule Combination Transfer
</SectionTitle>
    <Paragraph position="0"> In this section, I present tile tlow of the transduction process by using RCT formalism.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Rule Selection
</SectionTitle>
      <Paragraph position="0"> A transfer process rnust first find a set of rules whose Gins' matching parts (called projections) totally overlap all input structure, and which is the most similar to the intmt. We call a uuimi of projections a cover, and a cower identical to the input an isomorphic cover (or i-cover). In or(her words, wha'~ we want here is the i-cover th;~t is the most similar to the input. Further, if a G., make ~L llrojection pj on a Gi~, then tile G,a is called the origin graph of the pj. A pivot is a node of (;~,~ that has more than one origin graph, attd a matching pivot is the origin node of a pivot. For instance, in Figure 3~ A and D are pivots.</Paragraph>
      <Paragraph position="1"> There may be some methods for tinding such an i-cover rule set. One method is to pick up a rule whose projection does not have any arc ow~rlapped by cover by other selected rules until there ~tre no uncovered arc% if it is desirable that a rule set should }lave few overlaps as possible. We h;tve Klso developed auotlmr method using dynamic programmiug: which can choose the most similar rule set from cttndidate rule sets. Briefly, it stores the most similar rule set for each combination of arcs of each node from Ice.yes up to the root~ and the most similar rule set stored in the root node is tile one for the input structure (see \[6\] for details),</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Pre-Lexicalization
</SectionTitle>
      <Paragraph position="0"> It may It~qqlen that ~ lexit:al-hIrm of a 6'~ in the given rub! iust~tnce is lint ~t \[uuldldat~! translation word of its correspoudiltg word in the input, because a lexica\] form in a. l,~tci,iug node it, its G,. is not necessarily the same as the input word. hl this (:~e, such a node is lexlcMized by c~L.dida.te tr~tnslation words.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Node Labeling
</SectionTitle>
      <Paragraph position="0"> The label of a (d,,, node becomes tit(.&amp;quot; I~bel of its mateillng nude in (;~,,. Since (;i,, nodes are labeled uniquely, (C/,..odes are idso I~}mled uniquely. On the uther h;md, the label of a (7,: nude n~ becomes the tttbel of a (,',,, node (n,,~) such that ~'z~ = M T(nm) or '\['here nlay 1 h(lWeVl~r I be twn nodes ill (Jc ill ;C/ rule inst~ulce that are mapped by ;t node in (;,~ with M \] ~.nd M ~, respectiwdy. In the succeeding process, (1~ nodes with the same bLbel are merged into one node in order to gener~.te an mltpul structure, lu this phase, tim transferred hdmls of these two nodes shoulcl be dif ferent~ becnuse the two (lodes should not be merged f.r this rule. We must therefore relabel G~nodes of rule it|stances as follows: G~ Node Relabeling: for any label l i,, G~, if l is distrilmted t\[) twt) distinct uoch!s of (;~ by troth M \[ and M ~ fronl a node (,f (;,,,, then a I~bel l iu a G~ tulde, which is mallped only by M \], or is mapped by both M \[ ~tnd M .{ ~tnd has no descendants, is Cil\[tUg{!d to I ' I</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.4 Gluing
</SectionTitle>
      <Paragraph position="0"> Unificatior~ is ~t well-known c(unput~tiuual tool for c(mm.cting gra.phs, and is widely used in natural language l)rocessing. Usually, unitlcation uses two func-AcrEs DE COLING-92. NANTES. 23-28 Ao~'rr 1992 7 7 3 P~oc. oF COLING-92. NANTI~S. AUG. 23-28. 1992  lionel rldags as data and unifies them front the root node down to the leaves. In RCT, however, we want to merge those nodes of two graphs that have the same labels, even if their root nodes are different and they are not functiona L as shown in Figure 4. Unifi: cation, however, cannot proceed in this manner, because it unifies two nodes that occupy the same p+ sition, and always starts from the root node. For instance, in Figure 4, even if unification starts from node B then it fails, since it tries to unify node D of (a) and node C of (b) for arc y.</Paragraph>
      <Paragraph position="1"> In Graph Grammars, this method of connecting two graphs is called gluing \[2\]. The ghfing used in Graph Grammars is not concerned with the content of a node, so it must be extended in order to check the consistency among the nodes to be glued.</Paragraph>
      <Paragraph position="2"> in SiraTi'an, if two features conflict then the feature whose rule is more simi\[ar to the input is taken.</Paragraph>
      <Paragraph position="3"> Briefly, gluing is performed as followsg: ICivst, nodes with the same label are me~yed if they are consistent.</Paragraph>
      <Paragraph position="4"> If arty nodes fail to be merged , then the ghdn 9 also fails. If all the me~ges succeed, all ares are reatlached to the original nodes, which may or may not be me~yed. As a result, some ares with the same labels and attached to the same nodes may be me~ed, if they are consistent.</Paragraph>
      <Paragraph position="5"> A glued graph is not nece~arily a cmmeeted, rooted, or acyclic graph, but we usually need a connected rldag iu natural language processing. Several constralnts satisfying such requirements are described in previous papers \[14\]\[15\].</Paragraph>
      <Paragraph position="6"> After the G~s have been labeled and relabeled, the target structure is built by gluing the G~s.</Paragraph>
      <Paragraph position="7"> ODetMls of tire algorithm are given iu previous papers \[141115\].</Paragraph>
    </Section>
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.5 Post-Lexicalization
</SectionTitle>
      <Paragraph position="0"> The constructed target structure is still bnperfect; there might be a G~ node thai. has no lexical-form, because there are some rules made froul transfer knowledge that have no lexlcal-forms. Therefore, as in the pre-lexicalizatiou phase, non-lexical G: nodes are lex: icalized.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Examples
</SectionTitle>
    <Paragraph position="0"> This sectimt gives examples of translation by SimTcan. Figure 5 shows how the Japanese sentmme &amp;quot;Kauojo no me ga totemo kireina no wo sitteiru&amp;quot; is translated blto the English sentence &amp;quot;(1) know that she has very beautiful eyes.&amp;quot; In this figure, (a) is an input sentence structure, (b),(c), and (d) are rules (precisely, rule instances), and (e) is the output structure produced. In these rules, a mapping line not marked M ~ and M ~ has both M ~ and M ~. Dotted lines designate matching or gluing correspondences between rule nodes and input or output nodes, respectively. I:'urther, numbers prefixed by '*' denote node labels. In this example, we assume type hierarchies in which, for instance, 'yougen(predicate)' is a super-category of 'keiynu(axlj)', and &amp;quot;kaut6o(she)&amp;quot; is an instance of :hnmau'. Note that the node labels of both &amp;quot;have&amp;quot; in rule instance (c) and lower 'pred' in rule instance (b) are changed from that of the corresponding Japanese word &amp;quot;kirei(beautiful)&amp;quot; by the GC/ node relabeling procedure.</Paragraph>
    <Paragraph position="1"> Another example is shown in Figure 6, which shows how the Japanese sentence &amp;quot;US ga ... wo fusegu tame ni buhit/ul kanzei wo kakeru&amp;quot; is translated into the English sentence &amp;quot;US imposes tax on parts in order to blockade .... &amp;quot; In this example, (a) is an input structure, (b), (c) and (d) are matched rules, and (e) is the output structure produced. The Japanese verb &amp;quot;kakeru&amp;quot; has several trauslation candidates as sociated with different governing words, as shown in the following +~able: Similarity dapaues+Eng/ish 5.988 (meishi) ni zeikiu wo kakeru impose tax on (noun) 3,077 (meishi) wo salban ni kakeru take (noun) to court 2.717 (meishi) wo mado ni kakeru hang (noun) in window 2.545 (meishi) wo sutoobu ui kakeru put (noun) on stove haukati ui kousui wo kakeru 2.040 spray perfume on handkerchief This table lists the top live similar rules for the part &amp;quot;buhin ni kanzei wo kakeru&amp;quot; of the input. As shown ACTI~ DE COLING-92. NANTES, 23-28 AOt~q&amp;quot; 1992 7 7 4 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992  A~l.:s BE COLING-92, Nnlqi~;~;, 23-28 ^Ot~T 1992 7 7 5 PROC. OF COLING-92, NANTES, AUG, 23-28, 1992 in this table, rule (c) is the most similar one. Note that this similarity calculation was done for all rules, including non-lexical translation rules. There were no appropriate example rules for the part &amp;quot;US ga kakeru,&amp;quot; and a non-lexical rule (b) was timrefore selected. Further, note that the lexical forms in *3 nodes of (c) and (el are different, and that *4 node of (el has no lexical form other than a preposition, whereas &amp;quot;4 node of (el has a lexical form. The formet was obtained by pre-lexicalization, and the latter by post-lexicaiizatiml.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Although there were several early experimental projects on CBMT \[4\]\[9\]\[11\], MWF-H \[10\] is the first working prototype of a case-based transfer systern~ and demonstrates the promise of the CBMT alrproadL It uses Japanese-to-English translation exanlples as translation rules: chooses the source trees of examples that are most similar to the iuput tree from the root node down to the leaves, and assembles those target trees to produce an output tree, With respect to the transducing mechanism, MBT-II is a tree-to-tree transducer adopting one--to-one correspondeuce. MT by LTAGs \[1\], although it is not an attempt of CI3MT, proposed a similar mechanism to RCT described in this paper. It uses paired derivation trees of English and French as translation rules. An input sentence is parsed by the source grammar, and at the same time, its output tree is generated by derivation pairs of trees used in the parsing. As a trausdueer~ this mechanism is also a tree-to-tree transducer adopting one-to-one correspondence.</Paragraph>
    <Paragraph position="1"> In contrast, the RCT employed in SimTran is a rldagto-rldag transducer adopting upward and downward correspondences. These extended correspondences are desirable for expressing the structural discrepancies that often occur in translation. Moreover, this transducing model is a parallel production system \[2\] that Call produce an output structure in one execution of gluing if all the G~s required to produce an output are supplied,</Paragraph>
  </Section>
class="xml-element"></Paper>