File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2098_intro.xml
Size: 4,591 bytes
Last Modified: 2025-10-06 14:05:58
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2098"> <Title>Extraction of Lexical Translations from Non-Aligned Corpora</Title> <Section position="4" start_page="0" end_page="580" type="intro"> <SectionTitle> 2 Assumption and Ambiguity Resolution </SectionTitle> <Paragraph position="0"> The source language is denoted as $L_A$ and the target language as $L_B$. Japanese and English are adopted as $L_A$ and $L_B$, respectively. Matrix $A$ is defined so that its $(i,j)$-th element is a value representing the co-occurrence between two words $a_i$ and $a_j$ in $L_A$. $B$ is defined similarly for $L_B$. Both $A$ and $B$ are symmetric matrices. The numbers of words in $L_A$ and $L_B$ are denoted as $N_A$ and $N_B$, respectively. The $(i,j)$-th element of a matrix $X$ is denoted as $X_{ij}$.</Paragraph> <Paragraph position="1"> The cited Japanese examples are listed in the Appendix with their transliterations and first meanings. The cited English examples are written in this font.</Paragraph> <Section position="1" start_page="0" end_page="580" type="sub_section"> <SectionTitle> 2.1 Formalization </SectionTitle> <Paragraph position="0"> We assume that the translations of two words that co-occur in the source language also co-occur in the target language. For example, doctor and</Paragraph> <Paragraph position="2"> nurse co-occur in English, and their Japanese translations (listed in the Appendix) also co-occur in Japanese.</Paragraph> <Paragraph position="3"> Rapp (1995) verified this assumption for English and German. He showed that the two matrices $A$ and $B$ resemble each other when $a_i$ corresponds to $b_i$ for all $i$. His research therefore made the additional assumption that English words and German words correspond one-to-one.</Paragraph> <Paragraph position="4"> We introduce the translation matrix $T$ from $A$ to $B$ because a word may correspond to several words rather than to exactly one. The $(i,j)$-th element of $T$ is defined as the conditional probability $p(b_j \mid a_i)$, the translation probability of $b_j$ given $a_i$, so $T$ is an $N_A \times N_B$ matrix.
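The definition of $T$ can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the toy vocabularies (two $L_A$ words, three $L_B$ words), all matrix values, and all function names are hypothetical. Since $T_{uk} = p(b_k \mid a_u)$, a co-occurrence $A_{uv}$ contributes $p(b_k \mid a_u)\,A_{uv}\,p(b_l \mid a_v)$ to the word pair $(b_k, b_l)$, and summing over $u$ and $v$ gives the $(k,l)$-th element of $T^{t} A T$. The sketch checks that both computations agree and that the result is symmetric. It then mirrors the doctor and nurse argument: the candidate $T$ whose translated matrix lies closest to $B$ is preferred. The squared Euclidean (Frobenius) distance is an assumption made here for illustration, not necessarily the distance the paper adopts.

```python
import math

# Matrices are plain lists of rows. Every size, value and name here is hypothetical.
Matrix = list[list[float]]


def transpose(x: Matrix) -> Matrix:
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Ordinary matrix product of x (n by m) and y (m by p)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def all_close(x: Matrix, y: Matrix) -> bool:
    """Element-wise float comparison of two equally shaped matrices."""
    return all(math.isclose(p, q, abs_tol=1e-12)
               for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def is_row_stochastic(t: Matrix) -> bool:
    """Each T[i][j] = p(b_j | a_i) is non-negative and each row sums to 1."""
    return all(min(row) >= 0.0 and math.isclose(sum(row), 1.0) for row in t)


def translate_elementwise(a: Matrix, t: Matrix) -> Matrix:
    """For every (k, l): sum over u, v of p(b_k | a_u) * A[u][v] * p(b_l | a_v)."""
    n_a, n_b = len(t), len(t[0])
    return [[sum(t[u][k] * a[u][v] * t[v][l]
                 for u in range(n_a) for v in range(n_a))
             for l in range(n_b)]
            for k in range(n_b)]


def translate(a: Matrix, t: Matrix) -> Matrix:
    """The same computation in matrix form: T^t A T."""
    return matmul(matmul(transpose(t), a), t)


def sq_distance(x: Matrix, y: Matrix) -> float:
    """ASSUMED distance: squared Euclidean (Frobenius) norm of x - y."""
    return sum((p - q) ** 2 for rx, ry in zip(x, y) for p, q in zip(rx, ry))


# Toy L_A words a_0 and a_1 co-occur. a_0 has two candidate translations b_0 and b_1;
# a_1 translates to b_2. In L_B, b_2 co-occurs with b_0 but not with b_1.
A = [[0.0, 2.0], [2.0, 0.0]]
B = [[0.0, 0.0, 2.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

candidates = {
    "a_0 -> b_0": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "a_0 -> b_0 or b_1": [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
    "a_0 -> b_1": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}

for name, T in candidates.items():
    assert is_row_stochastic(T)
    TAT = translate(A, T)
    assert all_close(TAT, translate_elementwise(A, T))  # sum form equals T^t A T
    assert all_close(TAT, transpose(TAT))               # symmetric because A is
    print(name, sq_distance(TAT, B))

best = min(candidates, key=lambda c: sq_distance(translate(A, candidates[c]), B))
print("best:", best)
```

It prints distances 0.0, 4.0 and 16.0 for the three candidates and selects a_0 -> b_0, consistent with $b_2$ co-occurring with $b_0$ but not with $b_1$. Enumerating candidates only illustrates the objective; the text obtains $T$ by minimizing the distance analytically (Section 4.1).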
$T$ is a (row-)stochastic matrix: the elements in each row sum to 1.0.</Paragraph> <Paragraph position="5"> A co-occurrence $A_{uv}$ in $L_A$ can be translated into $L_B$ using both $p(b_k \mid a_u)$ and $p(b_l \mid a_v)$: $$\sum_{u=1}^{N_A} \sum_{v=1}^{N_A} p(b_k \mid a_u)\, A_{uv}\, p(b_l \mid a_v) \qquad (1)$$ Since (1) is the counterpart of $B_{kl}$, computing it for all $k$ and $l$ can be written in a simple matrix formulation as follows: $$T^{t} A T \qquad (2)$$</Paragraph> <Paragraph position="7"> Note that the resulting matrix is also symmetric, since $(T^{t} A T)^{t} = T^{t} A^{t} T = T^{t} A T$ when $A$ is symmetric.</Paragraph> <Paragraph position="8"> Returning to the example of doctor given in this section, its translation is the Japanese candidate that co-occurs with the translation of the co-occurring word nurse, and not another candidate that does not co-occur with it (the Japanese words are listed in the Appendix).</Paragraph> <Paragraph position="9"> Thus, our assumption serves to resolve ambiguity.</Paragraph> <Paragraph position="10"> This indicates that the translated co-occurrence matrix $T^{t} A T$ should resemble $B$ (Figure 1). Defining $\|X - Y\|$ as a distance between matrices $X$ and $Y$, ambiguity can be resolved simply by obtaining the $T$ that minimizes the following formula: $$\| T^{t} A T - B \| \qquad (3)$$</Paragraph> <Paragraph position="12"> when $A$ and $B$ are known. Note that this formulation assumes that the co-occurrence structure in $L_A$ can be transformed congruently into $L_B$. Thus, $T$ gives a pattern matching between the two structures formed by co-occurrence relations (Section 4.2).</Paragraph> </Section> <Section position="2" start_page="580" end_page="580" type="sub_section"> <SectionTitle> 2.2 The Choice of Co-occurrence Measure and Matrix Distance </SectionTitle> <Paragraph position="0"> There are many alternative ways to measure co-occurrence between two words $x$ and $y$ (Church, 1990; Dunning, 1993). With $\mathrm{freq}(x)$ the number of occurrences of $x$ in the entire text, $\mathrm{freq}(x,y)$ the number of times $x$ and $y$ both appear within a window of a fixed number of words, and $N$ the number of words in the text, we adopt the following mutual information:</Paragraph> <Paragraph position="2"> Rapp argues that $\mathrm{freq}(a_i,a_j)^2 / (\mathrm{freq}(a_i)\,\mathrm{freq}(a_j))$ is more sensitive than the above measure.
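The two measures can be contrasted on raw counts with the following sketch. The exact expression of Formula (4) is not reproduced in this text, so the code assumes the pointwise mutual information of Church (1990), $\log_2 \frac{N\,\mathrm{freq}(x,y)}{\mathrm{freq}(x)\,\mathrm{freq}(y)}$. The window-counting rule, the toy token sequence, and the function names are likewise hypothetical.

```python
import math
from collections import Counter


def count_cooccurrences(tokens: list[str], window: int) -> Counter:
    """HYPOTHETICAL reading of freq(x, y): count unordered token pairs that
    fall within a window of that many consecutive words."""
    pairs: Counter = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + window]:
            pairs[tuple(sorted((x, y)))] += 1
    return pairs


def mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """ASSUMED form of Formula (4): log2(N * freq(x, y) / (freq(x) * freq(y)))."""
    if f_xy == 0:
        return float("-inf")  # the pair never co-occurs: log of zero
    return math.log2(n * f_xy / (f_x * f_y))


def rapp_measure(f_xy: int, f_x: int, f_y: int) -> float:
    """Rapp's alternative: freq(x, y)^2 / (freq(x) * freq(y))."""
    return f_xy ** 2 / (f_x * f_y)


tokens = "doctor nurse hospital doctor nurse ward".split()
word_counts = Counter(tokens)
pair_counts = count_cooccurrences(tokens, window=2)
n = len(tokens)

for x, y in [("doctor", "nurse"), ("hospital", "nurse")]:
    f_xy = pair_counts[tuple(sorted((x, y)))]
    f_x, f_y = word_counts[x], word_counts[y]
    print(x, y, f_xy,
          round(mutual_information(f_xy, f_x, f_y, n), 3),
          rapp_measure(f_xy, f_x, f_y))
```

With a two-word window, doctor/nurse (co-occurring twice) and hospital/nurse (once) receive the same mutual information, $\log_2 3 \approx 1.585$, whereas Rapp's measure gives 1.0 and 0.5 because it grows with the square of $\mathrm{freq}(x,y)$.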
We nevertheless adopt Formula (4), because its statistical properties have already been studied (Church, 1990).</Paragraph> <Paragraph position="3"> Rapp normalized the matrices $A$ and $B$. We do not, because the values given by Formula (4) are already normalized by $N$.</Paragraph> <Paragraph position="4"> A distance between matrices must also be chosen.</Paragraph> <Paragraph position="5"> Rapp used the sum of the absolute differences between corresponding elements. Since we require a distance that is easy to handle analytically when obtaining $T$ (Section 4.1), the following definition was chosen:</Paragraph> <Paragraph position="7"/> </Section> </Section> </Paper>