<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0114">
  <Title>I I I I I</Title>
  <Section position="2" start_page="0" end_page="127" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper proposes a method to identify zero pronouns within a Japanese sentence and their antecedent equivalents within the corresponding English sentence from aligned sentence pairs. The method focuses on the characteristics of Japanese and English, two languages from different families in which the distribution of zero pronouns is very different. In this method, the Japanese sentence and its English translation in each aligned sentence pair are analyzed. Then, the pairs of Japanese words/phrases and their English equivalents are identified from each aligned sentence pair. Next, zero pronouns within the Japanese sentence are identified by using the syntactic and semantic structure of the Japanese sentence, and their antecedents within the English sentence are identified by using the characteristics of anaphoric and deictic expressions in English. This method was implemented using the Japanese-to-English machine translation system ALT-J/E for the analysis of the Japanese sentences and Brill's tagger for the analysis of the English sentences. According to my evaluation, for 554 zero pronouns in a sentence set for the evaluation of Japanese-to-English machine translation systems, 91.5% of the pairs of zero pronouns in the Japanese sentences and their antecedents in the English translations were automatically identified correctly.</Paragraph>
    <Paragraph position="2"/>
    <Section position="1" start_page="0" end_page="127" type="sub_section">
      <SectionTitle>
1.1 Motivation
</SectionTitle>
      <Paragraph position="0"> In natural languages, elements that can be easily deduced by the reader are frequently omitted from expressions in texts (Kuno, 1978). This phenomenon causes considerable problems in natural language processing systems. For example, in a machine translation system, the system needs to recognize those elements which are not present in the source language, but may become mandatory elements in the target language. In particular, the subject and object are often omitted in Japanese, whereas they are normally obligatory in English. Thus, in Japanese-to-English machine translation systems, it is necessary to identify case elements omitted from the original Japanese (&quot;zero pronouns&quot;) for their translation into English expressions.</Paragraph>
      <Paragraph position="1"> Several algorithms have been proposed with regard to this problem (Kameyama, 1986; Walker et al., 1990; Yoshimoto, 1988; Dousaka, 1994). When considering the application of these methods to a practical machine translation system for which the translation target area cannot be limited, it is not possible to apply them directly, both because their resolution precision is low, since they use only limited information, and because the volume of knowledge that must be prepared beforehand is so large.</Paragraph>
      <Paragraph position="2">  To overcome these kinds of problems, several methods for resolving zero pronouns have been proposed that are intended for a practical machine translation system with an unlimited translation target area (Nakaiwa and Ikehara, 1992; Nakaiwa and Ikehara, 1995; Nakaiwa and Ikehara, 1996). These methods use categorized semantic and pragmatic constraints, such as verbal semantic attributes (Nakaiwa et al., 1994) and types of modal expressions and conjunctions, as conditions for anaphora resolution of zero pronouns.</Paragraph>
      <Paragraph position="3"> With these methods, however, the resolution rules for zero pronouns must be made by hand.</Paragraph>
      <Paragraph position="4"> Making robust rules with wide coverage takes a lot of time and labor. Analysts who make these resolution rules must be familiar with the NLP system itself. Furthermore, the types of zero pronouns change depending on the types of documents to be analyzed, so resolution rules must be made for each target domain. It is very difficult to make rules for every domain, however, because of the time-consuming labor and the need for expertise. Because of these problems, a method to make resolution rules for zero pronouns effectively and efficiently is needed.</Paragraph>
      <Paragraph position="5"> Various methods have been proposed for acquiring resolution rules for an NLP system effectively and efficiently. One typical method is to extract resolution rules from a corpus by analyzing each sentence in the corpus. With regard to the automatic extraction of resolution rules for zero pronouns, several methods have been proposed (Murata and Nagao, 1997; Nasukawa, 1996). But these methods use monolingual corpora, and they have difficulty extracting resolution rules for zero pronouns whose referents are normally unexpressed in Japanese. Furthermore, rules can only be made when expressions similar to those containing the zero pronouns are found in the corpus.</Paragraph>
      <Paragraph position="6"> It seems that a bilingual corpus consisting of sentence pairs, with an original in one language and a translation, is better than a monolingual corpus for the purpose of acquiring resolution rules for zero pronouns. This is particularly so with a bilingual corpus of Japanese and English, whose language families are so different and in which the distribution of zero pronouns is also very different. This combination is more useful than bilingual corpora of similar languages.</Paragraph>
      <Paragraph position="7"> The technique for acquiring various kinds of rules, such as translation rules, grammar rules, and dictionary entries, from bilingual corpora needs to include several sub-techniques: identification of aligned sentence pairs, each consisting of a sentence in one language and its translation equivalent (sentence alignment); identification of equivalent word/phrase pairs from the aligned sentence pairs (word alignment); and extraction of rules, such as translation rules, grammar rules, and dictionary entries, from the identified aligned sentence pairs and equivalent word/phrase pairs.</Paragraph>
      <Paragraph position="8"> Several methods have been proposed with regard to aligning sentences (Brown et al., 1991; Gale and Church, 1991; Haruno and Yamazaki, 1996; Kay and Röscheisen, 1993), aligning words (Church, 1993; Kupiec, 1993; Matsumoto et al., 1993; Wu, 1995; Yamada et al., 1996) and acquiring rules from bilingual corpora (Dagan et al., 1991; Dagan and Church, 1994; Fung and Church, 1994; Tanaka, 1994; Yamada et al., 1995). From the point of view of the extraction of resolution rules for zero pronouns, a technique to identify zero pronouns in a sentence in one language and their antecedents in a translation from aligned sentence pairs is needed. But there is currently no method to identify zero pronouns and their antecedents automatically within bilingual corpora.</Paragraph>
    </Section>
  </Section>
</Paper>