<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1302"> <Title>Constraints and Defaults on Zero Pronouns in Japanese Instruction Manuals</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Zero pronouns in manual sentences </SectionTitle> <Paragraph position="0"> Consider the following Japanese sentence, which expresses an instruction.</Paragraph> <Paragraph position="1"> Here, 'TO' is a Japanese conjunctive particle that represents a causal relation, and 'ARE' expresses ability or permission. The symbol φ denotes a zero pronoun. On the other hand, the following sentence, which lacks the suffix 'ARE', has a different interpretation. The zero pronoun φ_d refers not to the hearer (the user) but to the machine, even though φ_c refers to the user, as in (1). from the translation in (1). It is due to the difference in viewpoint between Japanese and English. The difference has no effect on the selection of the zero pronoun's referent.</Paragraph> <Paragraph position="2"> Note that when only the matrix clause of (3) is used, as shown in (4), φ_e can be interpreted as either the hearer or the machine 2. (4) φ_e de-mas-u.</Paragraph> <Paragraph position="3"> φ_e-NOM go out-POL-NONPAST.</Paragraph> <Paragraph position="4"> φ_e will go out.</Paragraph> <Paragraph position="5"> These examples show that the expressions TO and ARE impose constraints on the referents of the SUBJECTS of the sentences. As described so far, in many cases linguistic expressions give us key information for resolving ambiguities such as the anaphora of zero pronouns. In the rest of this paper, we will show several pragmatic constraints that can account for the interpretations of zero subjects, including the cases described above. Dohsaka (Dohsaka, 1994) proposes a similar approach, in which several pragmatic constraints are used to determine the referents of zero pronouns. 
For example, honorific expressions and the speaker's point of view are used in his approach. While his approach treats dialogue, our targets are manual sentences. Nakaiwa et al. (Nakaiwa and Shirai, 1996) also propose a method based on semantic and pragmatic constraints. Although they report that their method estimates over 90% of zero subjects correctly, there are several difficulties, including the fact that the test corpus is identical to the corpus from which the pragmatic constraints were extracted, and the fact that there are so many rules (46 rules to estimate 175 sentences).</Paragraph> <Paragraph position="6"> As identification methods applicable to discourse in general, the centering theory (Brennan et al., 1987; Walker et al., 1990) and the property sharing theory (Kameyama, 1988) have been proposed. The important feature of these theories is that they are independent of the type of discourse. However, according to our experimental result, these kinds of theories do not seem to estimate zero subjects with high precision for manual sentences 3. The linguistic constraints specific to expressions are more accurate than theirs if the constraints are applicable.</Paragraph> <Paragraph position="7"> 70% in our experiment. 
One reason why the precision is not so good is that the structure of texts in (Japanese) manuals is slightly different from ordinary discourse structure.</Paragraph> <Paragraph position="8"> * Since the essential function of manuals is to provide users with information to make the machine operate properly, the existence of users should be considered at all times.</Paragraph> <Paragraph position="9"> * Manuals should appropriately provide information which is required by users.</Paragraph> <Paragraph position="10"> On the other hand, the following tendencies are pointed out in much of the linguistic literature.</Paragraph> <Paragraph position="11"> * The readers have the same point of view as the writer.</Paragraph> <Paragraph position="12"> * Generally, the first candidate of the point of view is the nominative.</Paragraph> <Paragraph position="13"> According to these considerations, we make the following hypothesis: Hypothesis 1 (Manuals easy to understand) * All descriptions are written from the viewpoint of users. Therefore, in general, subjects in manual sentences tend to be users.</Paragraph> <Paragraph position="14"> * Things users know tend to be omitted for readability unless they are needed. 
Therefore, the subject of a sentence whose agent is a user tends to be omitted.</Paragraph> <Paragraph position="15"> * On the other hand, things which readers do not know, like reactions to operations, prompts from machines and so on, tend to be specified explicitly.</Paragraph> <Paragraph position="16"> As parts of the ontology, we should consider at least two types of information: the properties of the objects in manuals, and the discourse situation, which is characterized by linguistic roles such as writer and reader.</Paragraph> <Paragraph position="17"> Constraint 1 (Objects) User has intention.</Paragraph> <Paragraph position="18"> Manufacturer has intention.</Paragraph> <Paragraph position="19"> Machine has no intention.</Paragraph> <Paragraph position="21"> From these constraints of the ontology, we can obtain the constraint of persons as follows.</Paragraph> <Paragraph position="22"> In the rest of this paper, we will propose several constraints and defaults based on the properties of linguistic expressions, under the hypothesis and the constraints described above. Then, we will examine them with test examples from several manuals. Note that the constraints and defaults we propose here are derived not from specific manuals but from our linguistic consideration of each linguistic expression. Therefore, we do not adopt a strict validation method such as cross-validation, which is used in machine learning, to examine them. However, in order to confirm the validity of our constraints and defaults, we have checked them against 24 manuals from various areas. 
Although we cannot explain all of our defaults and constraints here because of limited space, we will briefly show a table of all our defaults and constraints in Section 6.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Constraints and defaults based on the type of verbs </SectionTitle> <Paragraph position="0"></Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Request form </SectionTitle> <Paragraph position="0"> The speaker uses sentences in the request form or the solicitation form to prompt hearers to do the action described by the sentence. Therefore, we have the following constraint. Constraint 4 (SUBJECT of sentence in the request form) A SUBJECT of a sentence in either the request form or the solicitation form is the hearer.</Paragraph> <Paragraph position="1"> The combination of this constraint and Constraint 3 (Persons) shows that the SUBJECT is the user in such a case. In the example manuals, there are 123 sentences in the request form and all of them satisfy Constraint 4.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Modality expressions </SectionTitle> <Paragraph position="0"> Manual sentences may have a kind of modality expressing permission, ability, obligation, and so on.</Paragraph> <Paragraph position="1"> Sentences with expressions of ability or permission mean not only that it is possible for the SUBJECT to do the action, but also that the SUBJECT has his/her own choice of whether to do the action or not. Therefore, we have the following constraint. Constraint 5 (SUBJECT of sentence with ability expressions) A SUBJECT of a sentence with expressions of ability or permission must have his/her intention to make a choice about the action described by the sentence. 
This constraint and Constraint 1 (Objects) show that a SUBJECT of a sentence with expressions of ability or permission is a user, because all of the manufacturer's actions have been finished by the time the user reads the manual. In the example manuals, there are 56 sentences with ability expressions and all of them satisfy Constraint 5.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 RU form </SectionTitle> <Paragraph position="0"> In Japanese, simple operation procedures are often described as simple sentences with no subjects, whose verbs are of one of the following types: the RU form, the request form or the solicitation form. The RU form is the basic form of verbs and it denotes the non-past tense. Since the RU form has a neutral meaning, it does not impose any restriction on the SUBJECT. However, given Hypothesis 1, we expect that the zero subject tends to be a user.</Paragraph> <Paragraph position="1"> Default 1 (SUBJECT of sentence with a verb in the RU form) A SUBJECT of a sentence with a verb in the RU form is a user.</Paragraph> <Paragraph position="2"> In the example manuals, there are 214 sentences with a verb in the RU form and with no subject, and the SUBJECTS of 172 of them are users. Therefore, the precision of the default is about 80.4%.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Intransitives </SectionTitle> <Paragraph position="0"> For almost all machines that come with instruction manuals, the machine's actions are initiated by some activities of users. These activities are represented not by intransitives but by transitives. 
Therefore, we expect that a SUBJECT of a sentence with an intransitive tends to be a machine.</Paragraph> <Paragraph position="1"> Default 2 (SUBJECT of sentence with an intransitive) A SUBJECT of a sentence with an intransitive is a machine.</Paragraph> <Paragraph position="2"> In the example manuals, there are 238 sentences with intransitives, and the SUBJECTS of 211 of them are machines. Therefore, the precision of the default is about 88.7%.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.5 Passives </SectionTitle> <Paragraph position="0"> Passivization transfers the viewpoint of the speaker from the nominative to the objective by exchanging their positions. Namely, passivization is used to bring the objective of the active voice to readers' attention when the SUBJECT is not so important for readers. Since readers, or users, do not have to know what the SUBJECT is, a SUBJECT of a sentence in the passive voice is unlikely to be a user.</Paragraph> <Paragraph position="1"> Default 3 (SUBJECT of passives) A SUBJECT of a passive is a machine.</Paragraph> <Paragraph position="2"> In the example manuals, there are 48 passives and the SUBJECTS of 46 of them are machines. Therefore, the precision of the default rule is about 95.8%.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.6 Causatives </SectionTitle> <Paragraph position="0"> Since a causative expresses an event in which the SUBJECT of the causative makes someone (or something) do some action, the SUBJECT should have some intention and the initiative in controlling someone's action. Since a user has the initiative, we propose the following default.</Paragraph> <Paragraph position="1"> Default 4 (SUBJECT of causatives) A SUBJECT of a causative is a user.</Paragraph> <Paragraph position="2"> In the example manuals, there are 38 causatives and the SUBJECTS of 36 of them are users. 
Therefore, the precision of the default rule is about 94.7%.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.7 Expressions with the suffix -DESU </SectionTitle> <Paragraph position="0"> Expressions with the suffix -DESU are divided into two groups: * noun + the suffix of copula * adjective verb Each of them expresses that a SUBJECT has some property. It is unusual to describe a user's property in manuals. Therefore, we propose the following default. Default 5 (SUBJECT of sentence with the suffix -DESU) A SUBJECT of a sentence with the suffix -DESU is a machine.</Paragraph> <Paragraph position="1"> In the example manuals, there are 25 sentences with this expression, and the SUBJECTS of all of them are machines.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="10" type="metho"> <SectionTitle> 5 Constraints and Defaults based on types of Connectives </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="10" type="sub_section"> <SectionTitle> 5.1 Conditionals </SectionTitle> <Paragraph position="0"> Japanese has four conditional particles, TO, REBA, TARA and NARA, which are attached to the end of subordinate clauses as described in (1). The subordinate clause and the matrix clause conjoined by one of these particles correspond to the antecedent and the consequence, respectively. The differences among the constraints imposed by these expressions are shown in the following sentences, which are variants of sentence (3).</Paragraph> <Paragraph position="1"> out/go out.</Paragraph> <Paragraph position="2"> As with sentence (3), for native speakers of Japanese the SUBJECT of the matrix clause of (5) should be a machine. On the other hand, in the case of sentences (6) and (7), the SUBJECTS of the matrix clauses can be either users or machines. These phenomena are probably due to the nature of each conditional (Masuoka, 1993). 
Since a causal relation, which is shown by TO or REBA, expresses a general rule, the consequence cannot include the speaker's attitude, such as volition or request. Therefore, the SUBJECT of the matrix clause should be a machine. In contrast, in the case of assumptions, that is, TARA and NARA, there are no such restrictions on the SUBJECT.</Paragraph> <Paragraph position="3"> Based on these observations, Mori et al. (Mori and Nakagawa, 1995; Mori and Nakagawa, 1996) propose defaults for the SUBJECTS of sentences with these conditionals. Since whether a sentence shows the speaker's attitude depends on the volitionality of the verb, the constraint and defaults are described in terms of the volitionality of each verb. Note that the electronic dictionary IPAL provides volitionality information for each Japanese verb entry (IPA Technology center, 1987). According to the classification by IPAL, all Japanese verbs are classified into two types: volitional verbs, which usually express intentional actions, and non-volitional verbs, which express non-intentional actions. Although non-volitional verbs express only non-volitional actions (non-volitional use), some volitional verbs have not only a volitional use but also a non-volitional use.</Paragraph> <Paragraph position="4"> Default 6 (SUBJECT of sentence with TO or REBA) The matrix clause does not express the user's volitional action. Therefore, the SUBJECT of the matrix clause is a machine, if the verb of the matrix clause does not have the non-volitional use.</Paragraph> <Paragraph position="5"> Default 7 (SUBJECT of sentence with TARA or NARA) The matrix clause expresses only the user's volitional action. Therefore, the SUBJECT of the matrix clause is a user.</Paragraph> <Paragraph position="6"> The precision of the default rules for TO, REBA, TARA and NARA is 100%, 95.1%, 89.8% and 100%, respectively. 
</Paragraph> </Section> <Section position="2" start_page="10" end_page="10" type="sub_section"> <SectionTitle> 5.2 Adverbial conjunctive forms </SectionTitle> <Paragraph position="0"> Japanese verbs have two major adverbial conjunctive forms: the '-TE form' and the 'adverbial form.' Roughly speaking, a clause with a verb in one of these forms is placed in front of another clause, and together they form a coordinate relation. The following example shows coordination with the -TE form.</Paragraph> <Paragraph position="1"> φ_o pushes the button and φ_p takes out φ_q.</Paragraph> <Paragraph position="2"> According to Teramura (Teramura, 1991), these verb forms essentially express the coordination and co-occurrence of two events. For example, the most plausible interpretation of (8) is that φ_o and φ_p are identical. Thus it is expected that the two SUBJECTS of the two clauses in the coordination are identical or of the same type. Especially in manuals, the writer does not describe the user's actions in the same way as the machine's actions, because the writer takes the viewpoint of users, as supposed in Hypothesis 1.</Paragraph> <Paragraph position="3"> Therefore, we propose the following default. Default 8 (Two SUBJECTS of clauses in TE form conjunction or adverbial form conjunction) The two SUBJECTS of two clauses are identical when the two clauses are connected by the TE form conjunction or the adverbial form conjunction.</Paragraph> <Paragraph position="4"> In the example manuals, there are 83 sentences with the TE form conjunction, and 75 of them meet the default. Thus the precision of the default for sentences with the TE form conjunction is about 90.4%. Similarly, there are 99 sentences with the adverbial form conjunction, and 98 of them comply with the default. The precision of the default for the adverbial form conjunction is about 99.0%. Moreover, in the majority of the cases of TE form conjunctions, the SUBJECT is a user (85 cases). 
Therefore, we revise the default for the TE form conjunction as follows.</Paragraph> <Paragraph position="5"> Default 9 (Two SUBJECTS of clauses in TE form conjunction) Each SUBJECT of the two clauses is a user when the clauses are connected by the TE form conjunction.</Paragraph> </Section> </Section> </Paper>