<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1028"> <Title>Zero Pronoun Resolution in a Machine Translation System by using Japanese to English Verbal Semantic Attributes.</Title> <Section position="3" start_page="201" end_page="203" type="intro"> <SectionTitle> 2 Zero Pronouns as viewed from Machine Translation </SectionTitle> <Paragraph position="0"> Zero pronouns are very common in Japanese discourse, but the number of zero pronouns that actually require resolution varies according to the purpose for which the analysis results are to be used. Consider, for example, a question and answer system whose task is to reply to questions from a user who has just read a sentence. The questions, which can come from several points of view, must be anticipated, and practically all of the zero pronouns in the sentence will require resolution. In contrast, in machine translation of text, the zero pronouns requiring resolution tend to be limited, depending on the languages involved. This paper considers the task of extracting zero pronouns in a Japanese to English text machine translation system. We first examine the four basic factors important in implementing such a system.</Paragraph> <Section position="1" start_page="201" end_page="201" type="sub_section"> <SectionTitle> 2.1 The difference in conception between Japanese and English expressions </SectionTitle> <Paragraph position="0"> When extracting zero pronouns in machine translation, it must be decided whether or not each zero pronoun requires resolution. 
For example, consider the following sentences.</Paragraph> <Paragraph position="1"> (1) X-sha ha 2-gatsu-1-nichi, ha-dodhisuku-shouchi wo Company X TOP February 1 hard disc device OBJ hatsubai-suru.</Paragraph> <Paragraph position="2"> place on sale &quot;Company X will put the hard disc device on sale from February 1.&quot; φsubj φobj tsuki-400-dai seisan-suru.</Paragraph> <Paragraph position="3"> 400 units per month produce &quot;They produce 400 units of it per month.&quot; The second sentence is structured around the verb &quot;seisan-suru&quot; (produce), and its subject and object have become zero pronouns. To translate the sentence into natural English, however, it needs to be rewritten as a predicate noun sentence (a &quot;da&quot; sentence, so called after the Japanese original &quot;Gessan wa 400 dai da&quot;), which leads to (2) Gessan ha 400-dai da.</Paragraph> <Paragraph position="4"> Monthly production TOP/SUBJ 400 units is &quot;Monthly production is 400 units.&quot;</Paragraph> <Paragraph position="5"> To translate the expression in this form, referential analysis of the zero pronouns in the subject and object positions of the verb &quot;produce&quot; is no longer necessary. When translating this type of expression, the syntactic/semantic structure of the sentence to be translated is first converted, in the analysis phase, into an English-type structure within the source language (a Japanese-to-Japanese conversion). This makes it possible to select only those zero pronouns whose referents need to be resolved.</Paragraph> </Section> <Section position="2" start_page="201" end_page="201" type="sub_section"> <SectionTitle> 2.2 The difference in case frame patterns between Japanese and English </SectionTitle> <Paragraph position="0"> Some verbs have case elements that are mandatory in Japanese but optional when translated into English. 
For example, an expression such as</Paragraph> <Paragraph position="2"> in which there is no subject in Japanese, can be translated using the expression &quot;X raise Y&quot;. In such cases, it is useful to prepare, for each English verb form used as a translation, the case patterns to be used in syntactic analysis, and to designate the English case structure when analyzing the Japanese. Elements that are not mandatory cases in English are then not mandatory cases in Japanese either, so the zero pronouns that must be analyzed can be determined accurately.</Paragraph> </Section> <Section position="3" start_page="201" end_page="201" type="sub_section"> <SectionTitle> 2.3 Restrictions by Voice </SectionTitle> <Paragraph position="0"> Elements that have become zero pronouns in Japanese do not need to be resolved if the voice can be changed to give natural English. For example, * A sentence originally in the passive voice. In this case, converting the English expression to passive voice will limit the zero pronouns for which the referent must be identified.</Paragraph> <Paragraph position="1"> * Sentences containing verbs which take the passive voice in Japanese but become active in English. For example, the expression</Paragraph> <Paragraph position="3"> A OBJ B in publish-PASSIVE &quot;A is published in B.&quot; is the passive expression of &quot;φsubj publishes A in B&quot;, in which the subject has become a zero pronoun. Even though there is no subject in Japanese, it is possible to translate this into English with the expression &quot;A appears in B&quot;. In such cases, the case frame patterns to be used in syntactic analysis must be prepared in a form modified to fit the English expression. 
When analyzing the Japanese, the number of zero pronouns that must be resolved can be limited by restricting the mandatory cases of passive expressions to those that are mandatory in the English case pattern.</Paragraph> </Section> <Section position="4" start_page="201" end_page="202" type="sub_section"> <SectionTitle> 2.4 Restriction by translation structure </SectionTitle> <Paragraph position="0"> φsubj sofuto wo OS ni Kumikomu-koto de software OBJ OS into incorporate-EMBEDDED by setsuzoku-daisuu wo fuyasi-ta number of units to be connected OBJ increase-PAST &quot;They increased the number of units to be connected by incorporating the software into the OS.&quot; In this sentence, the subjects of the verbs &quot;incorporate&quot; and &quot;increase&quot; have become zero pronouns. The clause with &quot;Kumikomukoto&quot; (incorporate-EMBEDDED) is structured as an &quot;embedded sentence&quot; modifying the action &quot;koto&quot;. Translated into English, the portion &quot;koto de&quot; becomes the methodical case &quot;by incorporating software into the OS&quot; and takes the form of a gerund phrase. That is, the embedded sentence in Japanese becomes a prepositional phrase accompanied by a gerund phrase. Because Japanese and English give rise to different sentence structures, zero pronouns need to be extracted after converting the Japanese original to an English-like syntactic/semantic structure.</Paragraph> <Paragraph position="1"> In a Japanese to English machine translation system, it is important to classify zero pronouns with due consideration of the factors outlined above.</Paragraph> </Section> <Section position="5" start_page="202" end_page="203" type="sub_section"> <SectionTitle> 3 Appearance of Zero Pronouns in Newspaper Articles </SectionTitle> <Paragraph position="0"> With due consideration of the conditions presented in Section 2, we examine where troublesome zero pronouns and their referents appear in newspaper articles. Newspaper articles generally tend to use compressed forms of expression. Declinable words are therefore frequently turned into nouns by compressing their declinable suffixes. As a result, more often than not, it is impossible to determine a zero pronoun's referent merely by relying on postpositional particle information, themes or the types of empathy-loaded verbs. For example, (6) NTT ha shingata-koukanki wo dounyuu-sita.</Paragraph> <Paragraph position="1"> NTT TOP new model switchboard OBJ introduce &quot;NTT will introduce a new model switchboard.&quot; φsubj jiko-shindan-kinou wo tousai, self checking function OBJ equip &quot;The new model switchboard is equipped with a self checking function and&quot; φsubj 200-shisutemu wo secchi-suru yotei-da. 200 systems OBJ install be-planning-to &quot;NTT is planning to install 200 systems.&quot; In the first sentence, the subject is topicalized, but in the second sentence, the subjects of both the first portion and the latter portion of the sentence are zero pronouns. The referent of the first zero pronoun is &quot;shingata-koukanki&quot; (new model switchboard), which is the object of the first sentence, and the referent of the second is &quot;NTT&quot;, which is the subject of the first sentence. Thus, even when an element has been topicalized and there are no other elements that can be topicalized, it cannot be taken for granted that the topicalized element will be the referent of a zero pronoun. 
Under such circumstances, information other than whether the element has been topicalized is needed, such as further semantic restrictions. The lead paragraphs of 29 newspaper articles, totaling 102 sentences, were examined for zero pronouns and their referents, and the results are shown in Table 1. There were 88 cases of zero pronouns. According to this study, the most common case, with 45 instances (51%), was that in which an element topicalized by the postpositional particle &quot;ha&quot; in the first sentence became the referent of a zero pronoun in subject position in the second sentence.</Paragraph> <Paragraph position="2"> Furthermore, zero pronouns having referents in the first sentence totalled 76 instances (86%). In newspaper articles, the first sentence contains information that gives an outline of the entire article, and thus its case elements tend to become referents. Zero pronouns in the second and following sentences whose referents appeared in the first sentence amounted to 67 instances (74%), which strongly suggests the importance of the first sentence.</Paragraph> <Paragraph position="4"> [Table 1 column headings, as far as recoverable: Ha, Ga, Wo, Etc.; 2nd sentence and thereafter; in the same sentence; Total [Cases]]</Paragraph> <Paragraph position="6"> Table 1 Frequency of Appearance of Zero Pronouns and Their Referents (Source of Sample Sentences: Nikkei Sangyo Newspaper, Information column, lead paragraphs during February 1988. 29 articles (102 sentences), 2-8 sentences per article.</Paragraph> <Paragraph position="7"> Of the newspaper articles tested, the number of sentences containing zero pronoun(s) was 56 out of 102.) 
* &quot;Ha&quot; (pronounced &quot;Wa&quot;), &quot;Ga&quot; and &quot;Wo&quot; are postpositional particles in Japanese, indicating the theme, subject and direct object, respectively.</Paragraph> <Paragraph position="8"> Moreover, there were 12 instances (14%) in which the zero pronoun was the subject but its referent was neither the theme nor the subject. From this, it can be observed that it would be inappropriate to rely solely on selecting the referent from case elements that have been topicalized, or on determining the order of priority of candidate referents from the type of postpositional particle.</Paragraph> <Paragraph position="9"> These 12 instances were studied further, and the verbs whose case elements included the referent were found to be &quot;hatsubaisuru&quot; (sell), &quot;kaisetsusuru&quot; (establish), &quot;kaihatsusuru&quot; (develop) and other such words intended to introduce new object elements. The verbs governing the zero pronouns tend to be noun predicates, as in &quot;LAN da&quot; (&quot;That is LAN&quot;; in English, this corresponds to the expression &quot;φ be &lt;noun&gt;&quot;), or words such as &quot;belong to&quot; that indicate attributes. To resolve this type of zero pronoun, it would appear essential that verb attributes be categorized and that the zero pronoun referent be determined from the relationships between verbal semantic attributes.</Paragraph> </Section> </Section> </Paper>