<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1120">
  <Title>SOLVING ANALOGIES ON WORDS: AN ALGORITHM</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SOLVING ANALOGIES ON WORDS: AN ALGORITHM
Yves Lepage
ATR Interpreting Telecommunications Research Labs,
</SectionTitle>
    <Paragraph position="0"> Hikaridai 2-2, Seika-tyô, Sôraku-gun, Kyôto 619-0288, Japan lepage@itl.atr.co.jp Introduction To introduce the algorithm presented in this paper, we take a path that is the inverse of the historical development of the idea of analogy (see (Hoffman 95)). This is necessary because linguistic analogy often meets with a certain incomprehension, i.e., it is generally given a broader and more psychological definition. Also, our proposal being computational, it is impossible to ignore works about analogy in computer science, which has come to mean artificial intelligence.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="729" type="metho">
    <SectionTitle>
1 A Survey of Works on Analogy
</SectionTitle>
    <Paragraph position="0"> This paper is not intended to be an exhaustive study. For a more comprehensive study on the subject, see (Hoffman 95).</Paragraph>
    <Section position="1" start_page="0" end_page="728" type="sub_section">
      <SectionTitle>
1.1 Metaphors, or Implicit Analogies
</SectionTitle>
      <Paragraph position="0"> Beginning with works in psychology and artificial intelligence, (Gentner 83) is a milestone study of a possible modeling of analogies such as, &amp;quot;an atom is like the solar system&amp;quot; adequate for artificial intelligence. In these analogies, two domains are mapped, one onto the other, thus modeling of the domain becomes necessary.</Paragraph>
      <Paragraph position="1"> sun → nucleus, planet → electron. In addition, properties (expressed by clauses, formulae, etc.) are transferred from one domain onto the other, and their number somehow determines the quality of the analogy.</Paragraph>
      <Paragraph position="2"> attracts(sun, planet) → attracts(nucleus, electron); more_massive(sun, planet) → more_massive(nucleus, electron). However, Gentner's explicit description of sentences such as &amp;quot;an A is like a B&amp;quot; as analogies is subject to criticism. Others (e.g. (Steinhart 94)) prefer to call these sentences metaphors 1, the validity of which rests on sentences of the kind, &amp;quot;A is to B as C is to D&amp;quot;, for which the name analogy 2 is reserved. In other words, some metaphors are supported by analogies. For instance, the metaphor, &amp;quot;an atom is like the solar system&amp;quot;, relies on the analogy, &amp;quot;an electron is to the nucleus, as a planet is to the sun&amp;quot;.3 The answer of the AI community is complex because they have headed directly to more complex problems. For them, in analogies or metaphors (Hall 89): two different domains appear; for both domains, modeling of a knowledge base is necessary; mapping of objects and transfer of properties are different operations; the quality of analogies has to be evaluated as a function of the strength (number, truth, etc.) of the properties transferred.</Paragraph>
      <Paragraph position="3"> We must drastically simplify all this and enunciate a simpler problem (whose resolution may not necessarily be simple). This can be achieved by simplifying data types, and consequently the characteristics of the problem. 1If the fact that properties are carried over characterises such sentences, then etymologically they are metaphors: In Greek, pherein: to carry; meta-: between, among, with, after. &amp;quot;Metaphor&amp;quot; means to transfer, to carry over.</Paragraph>
      <Paragraph position="4"> 2In Greek, logos, -logia: ratio, proportion, reason, discourse; ana-: top-down, again, anew. &amp;quot;Analogy&amp;quot; means the same proportions, similar ratios.</Paragraph>
    </Section>
    <Section position="2" start_page="728" end_page="728" type="sub_section">
      <SectionTitle>
1.2 Multiplicity vs Unicity of Domains
</SectionTitle>
      <Paragraph position="0"> In the field of natural language processing, there have been plenty of works on pronunciation of English by analogy, some being very much concerned with reproducing human behavior (see (Damper &amp; Eastmond 96)). Here is an illustration of the task from (Pirelli &amp; Federici 94):</Paragraph>
      <Paragraph position="2"> Similarly to AI approaches, two domains appear (graphemic and phonemic). Consequently, the functions f, g and h are of different types because their domains and ranges are of different data types.</Paragraph>
      <Paragraph position="3"> Similarly to AI again, a common feature in such pronouncing systems is the use of data bases of written and phonetic forms. Regarding his own model, (Yvon 94) comments that: The \[...\] model crucially relies upon the existence of numerous paradigmatic relationships in lexical data bases.</Paragraph>
      <Paragraph position="4"> Paradigmatic relationships being relationships in which four words intervene, they are in fact morphological analogies: &amp;quot;reaction is to reactor, as faction is to factor&amp;quot;.</Paragraph>
      <Paragraph position="6"> factor → faction. Contrasting sharply with AI approaches, morphological analogies apply in only one domain, that of words. As a consequence, the number of relationships between analogical terms decreases from three (f, g and h) to two (f and g). Moreover, because all four terms intervening in the analogy are from the same domain, the domains and ranges of f and g are identical. Finally, morphological analogies can be regarded as simple equations independent of any knowledge about the language in which they are written. This standpoint eliminates the need for any knowledge base or dictionary. reactor → reaction (by f); factor → x? (by f); with g mapping reactor to factor and reaction to x.</Paragraph>
    </Section>
    <Section position="3" start_page="728" end_page="728" type="sub_section">
      <SectionTitle>
1.3 Unicity vs Multiplicity of Changes
</SectionTitle>
      <Paragraph position="0"> Solving morphological analogies remains difficult because several simultaneous changes may be required to transform one word into a second (for instance, doer → undo requires the deletion of the suffix -er and the insertion of the prefix un-). This problem has yet to be solved satisfactorily. For example, in (Yvon 94), only one change at a time is allowed, and multiple changes are captured by successive applications of morphological analogies (cascade model). However, there are cases in the morphology of some languages where multiple changes at the same time are mandatory, for instance in Semitic languages.</Paragraph>
      <Paragraph position="1"> &amp;quot;One change at a time&amp;quot;, is also found in (Nagao 84) for a translation method, called translation by analogy, where the translation of an input sentence is an adaptation of translations of similar sentences retrieved from a data base.</Paragraph>
      <Paragraph position="2"> The difficulty of handling multiple changes is remedied by feeding the system with new examples differing by only one word commutation at a time. (Sadler and Vendelmans 90) proposed a different solution with an algebra on trees: differences on strings are reflected by adding or subtracting trees. Although this seems a more convincing answer, the use of data bases would resume, as would the multiplicity of domains.</Paragraph>
      <Paragraph position="3"> Our goal is a true analogy-solver, i.e., an algorithm which, on receiving three words as input, outputs a word analogical to the input. For that, we thus have to answer the hard problem of: (1) performing multiple changes (2) using a unique data-type (words) (3) without any dictionary or external knowledge.</Paragraph>
    </Section>
    <Section position="4" start_page="728" end_page="729" type="sub_section">
      <SectionTitle>
1.4 Analogies on Words
</SectionTitle>
      <Paragraph position="0"> We have finished our review of the problem and ended up with what was the starting point of our work. In linguistic works, analogy is defined by Saussure, after Humboldt and Baudoin de Courtenay, as the operation by which, given two forms of a given word, and only one form of a second word, the missing form is coined 4: &amp;quot;honor is to honōrem as ōrātor is to ōrātōrem&amp;quot;, noted ōrātōrem : ōrātor = honōrem : honor.</Paragraph>
      <Paragraph position="1"> This is the same definition as the one given by Aristotle himself, &amp;quot;A is to B as C is to D&amp;quot;, postulating identity of types for A, B, C, and D.</Paragraph>
      <Paragraph position="2"> 4Latin: ōrātor (orator, speaker) and honor (honour), nominative singular; ōrātōrem and honōrem, accusative singular.  However, while analogy has been mentioned and used, algorithmic ways to solve analogies seem never to have been proposed, maybe because the operation is so &amp;quot;intuitive&amp;quot;. We (Lepage &amp; Ando 96) recently gave a tentative computational explanation which was not always valid because false analogies were captured. It did not constitute an algorithm either.</Paragraph>
      <Paragraph position="3"> The only work on solving analogies on words seems to be Copycat ((Hofstadter et al. 94) and (Hoffman 95)), which solves such puzzles as: abc : abbccc = ijk : x. Unfortunately it does not seem to use a truly dedicated algorithm; rather, following the AI approach, it uses a formalisation of the domain with such functions as &amp;quot;previous in alphabet&amp;quot;, &amp;quot;rank in alphabet&amp;quot;, etc.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="729" end_page="730" type="metho">
    <SectionTitle>
2 Foundations of the Algorithm
2.1 The First Term as an Axis
</SectionTitle>
    <Paragraph position="0"> (Itkonen and Haukioja 97) give a program in Prolog to solve analogies in sentences, as a refutation of Chomsky, according to whom analogy would not be operational in syntax, because it delivers non-grammatical sentences. That analogy would apply also to syntax was advocated decades ago by Hermann Paul and Bloomfield.</Paragraph>
    <Paragraph position="1"> Chomsky's claim is unfair, because it supposes that analogy applies only on the symbol level.</Paragraph>
    <Paragraph position="2"> Itkonen and Haukioja show that analogy, when controlled by some structural level, does deliver perfectly grammatical sentences. What is of interest to us, is the essence of their method, which is the seed for our algorithm: Sentence D is formed by going through sentences B and C one element at a time and inspecting the relations of each element to the structure of sentence A (plus the part of sentence D that is ready).</Paragraph>
    <Paragraph position="3"> Hence, sentence A is the axis against which sentences B and C are compared, and by opposition to which output sentence D is built.</Paragraph>
    <Paragraph position="4"> reader : unreadable = doer : x ⇒ x = undoable. The method will thus be: (a) look for those parts which are not common to A and B on one hand, and not common to A and C on the other and (b) put them together in the right order.</Paragraph>
    <Section position="1" start_page="729" end_page="729" type="sub_section">
      <SectionTitle>
2.2 Common Subsequences
</SectionTitle>
      <Paragraph position="0"> Looking for common subsequences of A and B (resp. A and C) solves problem (a) by complementation. (Wagner &amp; Fischer 74) give a method to find longest common subsequences by computing edit distance matrices, yielding the minimal number of edit operations (insertion, deletion, substitution) necessary to transform one string into another.</Paragraph>
      <Paragraph position="1"> For instance, the following matrices give the distance between like and unlike on one hand, and between like and known on the other hand, in their right bottom cells: dist(like, unlike) = 2 and dist(like, known) = 5.

    u n l i k e          k n o w n
l   1 2 2 3 4 5      l   1 2 3 4 5
i   2 2 3 2 3 4      i   2 2 3 4 5
k   3 3 3 3 2 3      k   2 3 3 4 5
e   4 4 4 4 3 2      e   3 3 4 4 5</Paragraph>
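The matrix computation just illustrated can be sketched in Python (an illustration under our own naming, not code from the paper):

```python
# A sketch of the Wagner and Fischer (1974) dynamic programme:
# dist[i][j] is the edit distance between the first i characters of a
# and the first j characters of b; the right bottom cell holds the
# distance between the full strings.
def edit_distance(a, b):
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                          # i deletions
    for j in range(n + 1):
        dist[0][j] = j                          # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution
    return dist[m][n]
```

On the two examples above, edit_distance("like", "unlike") yields 2 and edit_distance("like", "known") yields 5, the values in the right bottom cells.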
    </Section>
    <Section position="2" start_page="729" end_page="730" type="sub_section">
      <SectionTitle>
2.3 Similitude between Words
</SectionTitle>
      <Paragraph position="0"> We call similitude between A and B the length of their longest common subsequence. It is also equal to the length of A, minus the number of its characters deleted or replaced to produce B.</Paragraph>
      <Paragraph position="1"> This number we call pdist(A, B), because it is a pseudo-distance, which can be computed exactly like the edit distances, except that insertions cost 0.</Paragraph>
      <Paragraph position="3"> Inserted characters are set aside, precisely because they are those characters of B and C, absent from A, that we want to assemble into the solution, D.</Paragraph>
      <Paragraph position="4"> As A is the axis in the resolution of analogy, graphically we make it the vertical axis around which the computation of pseudo-distances takes place, for instance for like : unlike = known : x.</Paragraph>
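A minimal Python sketch of the pseudo-distance and similitude just defined (function names are ours, for illustration): the recurrence is the edit-distance one, except that moving right, i.e. inserting a character of B, costs nothing, so pdist(A, B) counts exactly the characters of A that are deleted or substituted, and sim(A, B) = |A| - pdist(A, B) is the length of a longest common subsequence.

```python
# Pseudo-distance: like edit distance, but insertions are free, so the
# result counts only the characters of a that are deleted or replaced
# to produce b.
def pdist(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                         # deletions still cost 1
    # d[0][j] stays 0: insertions are free
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # delete a[i-1]
                          d[i][j - 1],           # insert b[j-1], cost 0
                          d[i - 1][j - 1] + sub) # substitution
    return d[m][n]

def sim(a, b):
    # similitude = length of a longest common subsequence of a and b
    return len(a) - pdist(a, b)
```

For instance, pdist("like", "unlike") is 0 (unlike is obtained by pure insertion) and sim("like", "known") is 1 (only k is shared).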
    </Section>
    <Section position="3" start_page="730" end_page="730" type="sub_section">
      <SectionTitle>
2.4 The Coverage Constraint
</SectionTitle>
      <Paragraph position="0"> It is easy to verify that there is no solution to an analogy if some characters of A appear neither in B nor in C. The contrapositive says that, for an analogy to hold, any character of A has to appear in either B or C. Hence, the sum of the similitudes of A with B and C must be greater than or equal to its length: sim(A, B) + sim(A, C) >= |A|.</Paragraph>
      <Paragraph position="2"> When the length of A is greater than the sum of the pseudo-distances, some subsequences of A are common to all strings in the same order.</Paragraph>
      <Paragraph position="3"> Such subsequences have to be copied into the solution D. We call com(A, B, C, D) the sum of the lengths of such subsequences. The delicate point is that this sum depends precisely on the solution D being currently built by the algorithm. To summarise, for analogy A : B = C : D to hold, the following constraint must be verified: sim(A, B) + sim(A, C) - com(A, B, C, D) >= |A|.</Paragraph>
      <Paragraph position="5"/>
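The basic coverage constraint of this section can be sketched as a predicate in Python (an illustration, with the similitude computed directly as a longest-common-subsequence length):

```python
def sim(a, b):
    # similitude: length of a longest common subsequence of a and b
    m, n = len(a), len(b)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[m][n]

def coverage_holds(a, b, c):
    # Necessary condition for a : b = c : x to have a solution:
    # every character of a must be covered by b or by c.
    return sim(a, b) + sim(a, c) >= len(a)
```

For instance, coverage_holds("like", "unlike", "known") holds (4 + 1 >= 4), so the search for a solution may proceed, whereas it fails for three unrelated words.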
    </Section>
  </Section>
  <Section position="4" start_page="730" end_page="731" type="metho">
    <SectionTitle>
3 The Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="730" end_page="730" type="sub_section">
      <SectionTitle>
3.1 Computation of Matrices
</SectionTitle>
      <Paragraph position="0"> Our method relies on the computation of two pseudo-distance matrices between the first three terms of the analogy. A result by (Ukkonen 85) says that it is sufficient to compute a diagonal band plus two extra bands on each of its sides in the edit distance matrix in order to get the exact distance, if the value of the overall distance is known to be less than some given threshold. This result applies to pseudo-distances, and is used to reduce the computation of the two pseudo-distance matrices. The width of the extra bands is obtained by trying to satisfy the coverage constraint with the value of the current pseudo-distance in the other matrix.</Paragraph>
      <Paragraph position="1"> proc compute_matrices(A, B, C, pdAB, pdAC)
  compute pseudo-distance matrices with extra bands of pdAB/2 and pdAC/2
  if |A| >= pdist(A, B) + pdist(A, C)
    main component</Paragraph>
      <Paragraph position="3"/>
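A possible rendering of the banded computation in Python (a sketch under our own conventions: cells outside the band are treated as infinite, and the band parameter plays the role of the extra bands of width pdAB/2 and pdAC/2):

```python
INF = float("inf")

# Banded pseudo-distance in the spirit of Ukkonen (1985): only the
# cells within `band` of the main diagonal are filled; when the true
# pseudo-distance is small enough, the banded value equals the full
# one at a fraction of the cost.
def banded_pdist(a, b, band):
    m, n = len(a), len(b)
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0
    for j in range(1, min(n, band) + 1):
        d[0][j] = 0                     # insertions are free
    for i in range(1, m + 1):
        lo = max(0, i - band)
        hi = min(n, i + (n - m) + band)
        if lo == 0:
            d[i][0] = i                 # deletions cost 1
            lo = 1
        for j in range(lo, hi + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1],           # free insertion
                          d[i - 1][j - 1] + sub) # substitution
    return d[m][n]
```

With a sufficient band, the banded value agrees with the full pseudo-distance, e.g. banded_pdist("like", "unlike", 2) is 0, while only a narrow strip of the matrix has been filled.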
    </Section>
    <Section position="2" start_page="730" end_page="731" type="sub_section">
      <SectionTitle>
3.2 Main Component
</SectionTitle>
      <Paragraph position="0"> Once enough of the matrices has been computed, the principle of the algorithm is to follow the paths along which longest common subsequences are found, simultaneously in both matrices, copying characters into the solution accordingly. At each step, the positions in both matrices must be on the same horizontal line, i.e. at the same position in A, in order to ensure the right order while building the solution, D.</Paragraph>
      <Paragraph position="1"> Determining the paths is done by comparing the current cell in the matrix with its three previous ones (horizontal, vertical or diagonal), according to the technique in (Wagner &amp; Fischer 74). As a consequence, paths are followed from the end of words down to their beginning. The nine possible combinations (three directions in two matrices) can be divided into two groups: either the directions are the same in both matrices, or they are different.</Paragraph>
      <Paragraph position="2"> The following sketches the algorithm. com(A, B, C, D) has been initialised to |A| - (pdist(A, B) + pdist(A, C)); iA, iB and iC are the current positions in A, B and C. dirAB (resp. dirAC) is the direction of the path in matrix A x B (resp. A x C) from the current position. &amp;quot;copy&amp;quot; means to copy a character from a word at the beginning of D and to move to the previous character in that word.</Paragraph>
      <Paragraph position="3"> if constraint(iA, iB, iC, com(A, B, C, D))
  case: dirAB = dirAC = diagonal
    if A\[iA\] = B\[iB\] = C\[iC\]
      decrement com(A, B, C, D)
    end if
    copy B\[iB\] + C\[iC\] - A\[iA\] (a)
  case: dirAB = dirAC = horizontal
    copy char (b) of min(pdist(A\[1..iA\], B\[1..iB\]), pdist(A\[1..iA\], C\[1..iC\]))
  case: dirAB = dirAC = vertical
    move only in A (change horizontal line)
  case: dirAB ≠ dirAC
    if dirAB = horizontal
      copy B\[iB\]

(a) In this case, we move in the three words at the same time. Also, the character arithmetic factors, in view of generalisations, different operations: if the three current characters in A, B and C are equal, copy this character; otherwise copy the character from B or C that differs from the one in A. If all current characters are different, this is a failure.</Paragraph>
      <Paragraph position="4"> (b) The word with the lesser similitude with A is chosen, so as to make up for its delay.</Paragraph>
      <Paragraph position="5">     else if dirAB = vertical
      move in A and C
    else
      same thing, exchanging B and C
    end if
end if</Paragraph>
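The full path-following component is intricate; as a lightweight complement, two necessary conditions that any candidate solution D must satisfy can be checked cheaply (a hedged sketch, not part of the paper's pseudocode): the length equation |D| = |B| + |C| - |A|, and the same balance on the count of each character.

```python
from collections import Counter

def could_be_solution(a, b, c, d):
    # Necessary (not sufficient) conditions for a : b = c : d.
    if len(b) + len(c) - len(a) != len(d):
        return False                      # length equation fails
    need = Counter(b) + Counter(c)
    need.subtract(Counter(a))             # per-character balance
    have = Counter(d)
    return all(need[ch] == have[ch] for ch in set(need) | set(have))
```

For instance, could_be_solution("like", "unlike", "known", "unknown") passes both checks, whereas any candidate of the wrong length is rejected immediately.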
    </Section>
    <Section position="3" start_page="731" end_page="731" type="sub_section">
      <SectionTitle>
3.3 Early Termination in Case of Failure
</SectionTitle>
      <Paragraph position="0"> Complete computation of both matrices is not necessary to detect a failure. Failure is obvious when a letter of A appears in neither B nor C; this can already be detected before any matrix computation. Also, checking the coverage constraint allows the algorithm to stop as soon as non-satisfying moves have been performed.</Paragraph>
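The pre-matrix failure check described here amounts to one line in Python (an illustrative sketch, with a name of our own choosing):

```python
def fails_early(a, b, c):
    # True when some letter of a occurs in neither b nor c, in which
    # case the analogy a : b = c : x has no solution and no matrix
    # needs to be computed.
    return not set(a).issubset(set(b) | set(c))
```

For instance, fails_early("like", "unlike", "known") is False, so matrix computation proceeds.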
    </Section>
    <Section position="4" start_page="731" end_page="731" type="sub_section">
      <SectionTitle>
3.4 An Example
</SectionTitle>
      <Paragraph position="0"> We will show how the analogy like : unlike = known : x is solved by the algorithm.</Paragraph>
      <Paragraph position="1"> The algorithm first verifies that all letters of like are present either in unlike or known.</Paragraph>
      <Paragraph position="2"> Then, the minimum computation is done for the pseudo-distance matrices, i.e. only the minimal diagonal band is computed.</Paragraph>
      <Paragraph position="4"> As the coverage constraint is verified, the main component is called. It follows the paths noted by values in circles in the matrices.</Paragraph>
      <Paragraph position="5">  At each step, the coverage constraint being verified, finally the solution x = unknown is output.</Paragraph>
    </Section>
  </Section>
</Paper>