<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2204">
  <Title>Automatic Construction of a Transfer Dictionary Considering Directionality</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> We have shown the results of different matching metrics for different dictionary directions.</Paragraph>
    <Paragraph position="1"> Directionality is an important factor in building dictionaries automatically. In a K→E (or J→E) dictionary, an index word contains non-conjugated forms, whereas an index word in an E→K (or E→J) dictionary contains POS and conjugated forms. We therefore expect the combination of K→E and J→E to work better than the combination of K→E and E→J, since it avoids the POS mismatch.</Paragraph>
    <Paragraph position="2"> On the other hand, an E→K or E→J dictionary contains less uniform information, such as long expository terms, grammatical explanations and example sentences. In particular, its POS information is far more detailed than that of dictionaries in the other direction. All of these factors contribute to fewer good matches.</Paragraph>
    <Paragraph position="3"> As for the better result using K→E and J→E, we cannot overlook language similarity: Korean and Japanese are very similar in vocabulary and grammar. This must have resulted in their sharing relatively more appropriate English translations and, in turn, in matching more appropriate Korean and Japanese translation equivalents.</Paragraph>
    <Paragraph position="4"> In the combination of K→E and E→J, the number of common English translations is reduced by the differing characteristics of the two dictionaries. A K→E dictionary, written from the Korean speaker's point of view, tends to have relatively simple English equivalents, and POS is normally not shown. An E→J dictionary, on the other hand, shows such complicated equivalents as explanation of the entry a, a piece of translation equivalent b and grammatical information, as shown in (2) in Section 1. It is therefore natural that its matching rate is far lower than that of the combination of K→E and J→E. Considering the size of the dictionaries used for K→E and J→E (estimated maximum matches: 28,310 K→J pairs) and for K→E and E→J (estimated maximum matches: 50,826 K→J pairs), we extrapolate from Table 5 that the method using K→E and J→E is better than the method using K→E and E→J.</Paragraph>
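    <Paragraph> The matching through common English translations discussed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the lowercasing normalization, the exact-match criterion on glosses and the placeholder index words are all assumptions made for the example.</Paragraph>
    <Paragraph>
```python
from collections import defaultdict


def normalize(gloss):
    """Lowercase an English gloss and collapse whitespace (assumed normalization)."""
    return " ".join(gloss.lower().split())


def invert(dictionary):
    """Map each normalized English gloss to the set of index words that list it."""
    index = defaultdict(set)
    for word, glosses in dictionary.items():
        for gloss in glosses:
            index[normalize(gloss)].add(word)
    return index


def pivot_match(k_to_e, j_to_e):
    """Pair Korean and Japanese index words by the English glosses they share.

    Returns {(korean, japanese): set of shared normalized glosses}.
    """
    j_index = invert(j_to_e)
    pairs = defaultdict(set)
    for k_word, glosses in k_to_e.items():
        for gloss in glosses:
            key = normalize(gloss)
            for j_word in j_index.get(key, ()):
                pairs[(k_word, j_word)].add(key)
    return dict(pairs)


# Toy entries with placeholder index words (hypothetical, not taken from
# the dictionaries used in the paper).
k_to_e = {"K_word_1": ["beautiful", "pretty"], "K_word_2": ["school"]}
j_to_e = {"J_word_1": ["Beautiful", "clean"], "J_word_2": ["river"]}

matches = pivot_match(k_to_e, j_to_e)
```
    </Paragraph>
    <Paragraph> With simple, core-meaning glosses on both sides, a shared gloss is likely; long expository equivalents of the kind found in E→J entries would rarely be identical strings, which is one way to picture the lower matching rate.</Paragraph>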
    <Paragraph position="5"> We conclude that K→E + J→E outperforms K→E + E→J, which in turn outperforms E→K + E→J. The following briefly summarizes the three methods.</Paragraph>
    <Paragraph position="6"> K→E + J→E: - Equal characteristics of the dictionaries. - The meaning of the registered words tends to be translated to a typical, core meaning in English. - Synergy effect: Korean and Japanese are very similar, leading to more matching.</Paragraph>
    <Paragraph position="7"> K→E + E→J: - The combination of different characteristics of dictionaries makes automatic matching less successful. - A core meaning is extended to a peripheral meaning at the stage of looking up E→J. (See Figure 2.)</Paragraph>
    <Paragraph position="8"> E→K + E→J: - There are far fewer matches.</Paragraph>
    <Paragraph position="9"> - We can take advantage of example sentences, expository terms, and explanations to extract functional words.</Paragraph>
    <Paragraph position="10"> - We can improve accuracy by including English POS data.</Paragraph>
    <Paragraph position="11"> Even though we expected that the combination of the E→K and E→J dictionaries would not give a good result, it is worthwhile to know its limits. After analyzing all of the results, we found that dictionary directionality does have an effect. We also confirm that using all the methods and combining them gives the best result, since the outputs of the three dictionary combinations do not completely overlap.</Paragraph>
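    <Paragraph> Combining the three methods can be pictured as a union of their candidate pairs that also records which methods proposed each pair. The sketch below is only an illustration under assumed names and toy data; the paper does not specify how the outputs are merged.</Paragraph>
    <Paragraph>
```python
def combine(results):
    """Union candidate pairs from several methods.

    results: {method_name: iterable of (korean, japanese) pairs}
    Returns {pair: set of method names that proposed it}.
    """
    combined = {}
    for method, pairs in results.items():
        for pair in pairs:
            combined.setdefault(pair, set()).add(method)
    return combined


# Toy outputs with placeholder words (hypothetical, for illustration only).
results = {
    "K-E + J-E": {("k1", "j1"), ("k2", "j2")},
    "K-E + E-J": {("k2", "j2"), ("k3", "j3")},
    "E-K + E-J": {("k4", "j4")},
}

combined = combine(results)

# Pairs found by exactly one method show where the outputs do not overlap.
only_one = sorted(pair for pair, methods in combined.items() if len(methods) == 1)
```
    </Paragraph>
    <Paragraph> Keeping the set of proposing methods alongside each pair would also allow pairs found by several combinations to be ranked higher, although that ranking is a design choice of this sketch rather than something reported in the paper.</Paragraph>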
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Future Work
</SectionTitle>
      <Paragraph position="0"> Our goal is not restricted to building a Korean-Japanese dictionary; the method can be extended to any language pair. We assume that we do not know the source and target languages well, so that even matching just the content words is not easy. Instead, we need to match any kind of entry automatically, including functional words such as particles, suffixes and prefixes. We think that it is best to extract these functional words by taking advantage of the characteristics of the E→K and E→J dictionaries. For example, one of the merits of using E→K and E→J is that we can get conjugated forms such as the Korean adjective [illegible], which matches the English adjective beautiful; it is normally not registered in a K→E dictionary because [illegible] is a conjugated adjective form of the root [illegible]. Only root forms are registered in an X-to-English dictionary. For verbs, too, we can get nonfinite forms using E→K and E→J dictionaries. As index words, non-conjugated forms are registered in a J→E dictionary, such as [illegible], meaning beautiful or clean. By using E→J, however, we can get conjugated forms such as [illegible], [illegible] and so forth. Registering all conjugated forms in a dictionary simplifies the development of a machine translation system and, further, second language acquisition.</Paragraph>
      <Paragraph position="1"> Dictionaries in the English-to-X direction contain a lot of example sentences. So far, the idea of using example sentences and idiomatic phrases for dictionary construction has not been adopted.</Paragraph>
      <Paragraph position="2"> To check the possibility of extracting functional words, we extracted example sentences and idiomatic phrases from the E→J and E→K dictionaries based on the number of shared English words, and looked into the feasibility of using them to extract functional words.</Paragraph>
      <Paragraph position="3"> We extracted a total of 1,033 paraphrasing sentence pairs between Korean and Japanese with five or more shared English words. Among them, 465 sentences (45%) matched all the English exactly (=), and 373 sentences (36.1%) almost matched. We give examples below: = (10) &quot;as for me, give me liberty or give me death.&quot; [illegible] &quot;Tom is taller than any other boy in his class.&quot; [illegible] (extracted from E→K and E→J). The numbers in parentheses in the above examples represent how many English words are shared between E→K and E→J. Using these paraphrasing sentences, we will examine effective ways of extracting functional words.</Paragraph>
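      <Paragraph> The selection of sentence pairs by the number of shared English words can be sketched as follows. This is a hedged illustration only: the tokenization rule, the function names and the variant sentence are assumptions, and because the criterion for an almost-match is not stated here, the sketch labels every non-identical pair above the threshold simply as partial.</Paragraph>
      <Paragraph>
```python
import re
from collections import Counter

MIN_SHARED = 5  # threshold stated in the text: five or more shared English words


def tokens(sentence):
    """Lowercased word tokens; this tokenization rule is an assumption."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def shared_count(english_a, english_b):
    """Count shared English words, respecting repeated words."""
    count_a = Counter(tokens(english_a))
    count_b = Counter(tokens(english_b))
    return sum(min(n, count_b[word]) for word, n in count_a.items())


def label(english_a, english_b):
    """Return None below the threshold, '=' for identical token sequences,
    and 'partial' for any other pair that passes the threshold."""
    if shared_count(english_a, english_b) >= MIN_SHARED:
        if tokens(english_a) == tokens(english_b):
            return "="
        return "partial"
    return None


liberty = "as for me, give me liberty or give me death."
tom_a = "Tom is taller than any other boy in his class."
tom_b = "Tom is taller than any other boy in the class."  # hypothetical variant

# Shares of the 1,033 extracted pairs, recomputed from the counts in the text.
exact_share = round(100 * 465 / 1033, 1)
almost_share = round(100 * 373 / 1033, 1)
```
      </Paragraph>
      <Paragraph> Counting repeated words separately reproduces the value 10 given for the first example when its English side is compared with itself.</Paragraph>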
      <Paragraph position="4"> Finally, we would like to apply our method to open source dictionaries, in particular EDICT (J→E; Breen, 1995) and engdic (E→K; Paik and Bond, 2003). This would make the results available to everyone, so that they can be used in comparative evaluation or further research.</Paragraph>
    </Section>
  </Section>
</Paper>