<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2098"> <Title>Extraction of Lexical Translations from Non-Aligned Corpora</Title> <Section position="5" start_page="580" end_page="581" type="metho"> <SectionTitle> 3 Local Ambiguity Resolution </SectionTitle> <Paragraph position="0"> Note that elements with value 0.0 in a matrix are denoted by &quot;-&quot; in the following discussion.</Paragraph> <Section position="1" start_page="580" end_page="581" type="sub_section"> <SectionTitle> 3.1 Example of doctor </SectionTitle> <Paragraph position="0"> Suppose that doctor occurs in the local context &quot;The doctor nursed the patient.&quot; We want to disambiguate the meaning of doctor as the medical doctor, not the Ph.D. As doctor co-occurs with nurse and patient, nurse with doctor and patient, and so on, the matrix A can be defined by Formula (4) as follows (footnote 2), with rows and columns in the order doctor, nurse, patient: doctor: (-, 3.0, 3.0); nurse: (3.0, -, 3.0); patient: (3.0, 3.0, -). For T, only the ambiguity of doctor is considered here for simplicity, not that of nurse or patient, giving T as follows:</Paragraph> <Paragraph position="2"> Note that ~ is a co-occurring word with |~t.</Paragraph> <Paragraph position="3"> Here we are interested in whether T11 = 1.0 (doctor translated as the LB word meaning medical doctor) or T41 = 1.0 (doctor translated as the LB word meaning Ph.D.): the correct answer is clearly T11 = 1.0.</Paragraph> <Paragraph position="5"> of each occurrence. Here NA is 3, the three words doctor, nurse and patient.</Paragraph> <Paragraph position="6"> The quality of A is poor from a statistical point of view (Church, 1990). Local ambiguity resolution needs only the information of which words co-occur, and the co-occurrence values themselves are not that important when forming A. 
Although there are other ways to form A, for example, simply setting all the elements concerned to 1.0, this definition was used because the local and global problems can then be handled within exactly the same framework.</Paragraph> <Paragraph position="7"> B is obtained globally from the corpus in LB.</Paragraph> <Paragraph position="8"> Suppose that B for the words in question is given for simplicity as follows:</Paragraph> <Paragraph position="10"> We experimentally put T11 = 1.0, so that doctor corresponds to the LB word meaning medical doctor, and calculated T^t A T, giving the following result with F(T) = 5038:</Paragraph> <Paragraph position="12"> Next, we put T41 = 1.0, so that doctor corresponded to the LB word meaning Ph.D.; T^t A T gave the following result</Paragraph> <Paragraph position="14"> These two results indicate that T with T11 = 1.0 (doctor as medical doctor) makes T^t A T and B closer than T with T41 = 1.0 (doctor as Ph.D.). Therefore the translation of doctor is determined to be the LB word meaning medical doctor.</Paragraph> <Paragraph position="15"> The algorithm to choose the translation from several candidates reflecting the local context is summarized as follows: 1. Create a local A.</Paragraph> <Paragraph position="16"> 2. Make a T that assumes one candidate to be the translation. Calculate the distance F(T) for each candidate.</Paragraph> <Paragraph position="17"> 3. Choose the T with the minimum F(T).</Paragraph> </Section> <Section position="2" start_page="581" end_page="581" type="sub_section"> <SectionTitle> 3.2 Related Work </SectionTitle> <Paragraph position="0"> Dagan (1994) proposed a method to choose a translation according to the local context. The significance of this work is that the ambiguity is resolved not within LA, as was traditionally studied, but in LB, which is also our standpoint. 
The word to be translated (a_u) and a word related to it by phrasal structure (a_v; for example, the object of a verb) were translated into LB (b_k and b_l, respectively) using an electronic dictionary.</Paragraph> <Paragraph position="1"> The co-occurrence frequency within LB was measured, and p(b_k, b_l | a_u, a_v) was estimated from freq(b_k, b_l) by Formula (6). Dagan chose the b_k with the largest p(b_k, b_l | a_u, a_v) as the translation after statistically testing its reliability. The difference from our method is that he estimated the translational probability between pairs (the word and its co-occurring word), whereas our framework reduces the translational probability of pairs to that of words. Thus, our method can be applied to obtain global translations, as explained in the following section.</Paragraph> </Section> </Section> <Section position="6" start_page="581" end_page="582" type="metho"> <SectionTitle> 4 Global Extraction of Translations </SectionTitle> <Paragraph position="0"> The extraction of global lexical translations is formulated using the same framework as ambiguity resolution in the local context. The difference is that A is formed globally from the corpus in LA.</Paragraph> <Paragraph position="1"> For the local context, the number of possible translations is small enough that each case can be tested one after another to find the best T. Unfortunately, the same method cannot be applied to obtain global translations because the number of combinations of possible translations explodes.</Paragraph> <Paragraph position="2"> Hence, we propose a method to update T incrementally.</Paragraph> <Section position="1" start_page="581" end_page="581" type="sub_section"> <SectionTitle> 4.1 Steepest Descent Method </SectionTitle> <Paragraph position="0"> T is not a square matrix, and the number of equations obtained from T^t A T = B is not always equal to the number of variables Tij, so the equation may not be solvable directly. 
We therefore try to obtain the best T by the Steepest Descent Method (SDM), minimizing Formula (3). T is incrementally updated from T_n to T_{n+1} by:</Paragraph> <Paragraph position="2"> where dT can be calculated, with ds being a certain small length, as:</Paragraph> <Paragraph position="4"> The constraint on T that each row must sum to 1.0 can be reflected in the calculation using Lagrange's method of indeterminate coefficients (Lagrange multipliers).</Paragraph> </Section> <Section position="2" start_page="581" end_page="582" type="sub_section"> <SectionTitle> 4.2 Characteristics of Our Method </SectionTitle> <Paragraph position="0"> If words are regarded as nodes, and relations such as co-occurrences and translations as branches, then the matrices A, B and T represent graphs.</Paragraph> <Paragraph position="1"> Suppose that A and B are exactly the same graph, as in Figure 2. The representation matrices are also indicated in the figure.</Paragraph> <Paragraph position="2"> The best T is obviously as follows:</Paragraph> <Paragraph position="4"> This means that a1, a2, a3, a4 correspond to b4, b3, b2, b1, respectively. It also indicates that a1</Paragraph> <Paragraph position="6"> does not correspond to b3, b2, or b1, which is exactly the disambiguation. In terms of linear algebra, the calculation T^t A T is a so-called &quot;congruent transformation.&quot; T provides the pattern matching of the two graphs given by A and B.</Paragraph> <Paragraph position="7"> Next, suppose that A is defined as above and B is written as a block matrix as shown in Figure 3, containing the same graphs as A. T will clearly be T = 1/2 (E E), with E being a unit matrix of size 4. 
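As a minimal sketch of the two cases just described, the following Python/NumPy snippet checks that T^t A T reproduces B when a1, a2, a3, a4 are matched to b4, b3, b2, b1, and then builds T = 1/2 (E E). The edge weights in A are hypothetical, since the matrices of Figure 2 are not reproduced in this text, and the sum-of-squares distance is only an assumed stand-in for F(T) of Formula (3).

```python
import numpy as np

# Hypothetical weighted graph over a1..a4 (a chain a1-a2-a3-a4).
# These weights are illustrative only; they are not the values of Figure 2.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 3.0],
    [0.0, 0.0, 3.0, 0.0],
])

# B is the same graph with its nodes listed in reverse order, so that
# a1, a2, a3, a4 correspond to b4, b3, b2, b1.
B = A[::-1, ::-1].copy()


def distance(T, A, B):
    """Assumed distance: sum of squared differences between T^t A T and B."""
    diff = T.T @ A @ T - B
    return float(np.sum(diff ** 2))


# T[i, j] = 1.0 means a_(i+1) is translated as b_(j+1); each row sums to 1.0.
T_match = np.fliplr(np.eye(4))   # a1 to b4, a2 to b3, a3 to b2, a4 to b1
T_identity = np.eye(4)           # a1 to b1, a2 to b2, ... (wrong matching)

d_match = distance(T_match, A, B)
d_identity = distance(T_identity, A, B)
print("matching T:", d_match, "identity T:", d_identity)

# Block-matrix case: T = 1/2 (E E) splits each a_i evenly between two b nodes.
E = np.eye(4)
T_block = np.hstack([E, E]) / 2.0
# Row 0 has weight 0.5 in columns 0 and 4 (b1 and b5 in 1-based numbering).
print(T_block[0])
```

Under these assumed weights the reversed matching makes T^t A T coincide with B, while the identity assignment does not, which illustrates the sense in which T matches the two graphs. The last lines only construct T = 1/2 (E E) and do not test whether it minimizes F(T).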
This block-matrix case shows that our algorithm has a limit for ambiguity resolution, especially when several similar graphs are interconnected; that is, the ambiguity of a1 cannot be resolved between b1 and b5.</Paragraph> <Paragraph position="8"> On the other hand, as shown in (Brown, 1993), methods using aligned corpora do not have this limit. His method starts with every English word corresponding to all French words, and only a few French words remain as translations in the result. This difference shows our weak point compared with Brown's.</Paragraph> <Paragraph position="9"> Our method, assuming that two graphs can be linearly transformed, only tries to make a match between two graphs in LA and LB without an aligned corpus, so some hints for obtaining the correct correspondences, that is, some compensation for the lack of an aligned corpus, are needed. For example, when the value of the (i,j)-th element is zero in T0, the value of the same element can be kept at zero during the SDM.</Paragraph> </Section> <Section position="3" start_page="582" end_page="582" type="sub_section"> <SectionTitle> 4.3 Related Work </SectionTitle> <Paragraph position="0"> Some research using aligned corpora points out problems with corpus size and noise, which lead to insufficient accuracy in translations.</Paragraph> <Paragraph position="1"> Fung (1995) asserts that translations of words or phrases might not exist even in the aligned corpus. She extracted noun translations from a noisy aligned corpus. First, a number of obvious</Paragraph> <Paragraph position="3"> translations were statistically extracted, then the uncertain translations were found using the co-occurrence with the obvious ones.</Paragraph> <Paragraph position="4"> Utsuro (1994) claimed that there is a need to extract lexical translations even from an aligned corpus of a small size and proposed to use an electronic dictionary as an aid. First, a certain number of candidates are found. 
If a candidate in LB co-occurs with another found in the electronic dictionary, its probability of being the translation is adjusted to be higher.</Paragraph> <Paragraph position="5"> The common idea in the two approaches, the use of lexical co-occurrence within LB, was also introduced by Dagan (1994).</Paragraph> </Section> </Section> </Paper>