File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/94/j94-2003_abstr.xml
Size: 12,221 bytes
Last Modified: 2025-10-06 13:48:16
<?xml version="1.0" standalone="yes"?> <Paper uid="J94-2003"> <Title>Japanese Discourse and the Process of Centering</Title> <Section position="2" start_page="0" end_page="196" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="194" type="sub_section"> <SectionTitle> 1.1 Centering in Japanese Discourse </SectionTitle> <Paragraph position="0"> [© 1994 Association for Computational Linguistics. Computational Linguistics, Volume 20, Number 2.] Recently there has been an increasing amount of work in computational linguistics involving the interpretation of anaphoric elements in Japanese (Yoshimoto 1988; Kuno 1989; Walker, Iida, and Cote 1990; Nakagawa 1992). These accounts are intended as components of computational systems for machine translation between Japanese and English or for natural language processing in Japanese alone. This paper has three aims: (1) to generalize a computational account of the discourse process called CENTERING (Sidner 1979; Joshi and Weinstein 1981; Grosz, Joshi, and Weinstein 1983; Grosz, Joshi, and Weinstein unpublished), (2) to apply this account to discourse processing in Japanese so that it can be used in computational systems, and (3) to provide some insights on the effect of syntactic factors in Japanese on discourse interpretation. In the computational literature, there are two foci for research on the interpretation of anaphoric elements such as pronouns. The first focuses on an inferential process driven by the underlying semantics and relations in the domain (Hobbs 1985a; Hobbs et al. 1987; Hobbs and Martin 1987). The opposite focus concentrates on the role of syntactic information, such as what was previously the topic or subject (Hobbs 1976b; Kameyama 1985; Yoshimoto 1988). We will argue for an intermediate position with respect to the interpretation of ZEROS (unexpressed arguments of the verb) in Japanese. 
Our position is that the interpretation of zeros is an inferential process, but that syntactic information provides constraints on this inferential process (Joshi and Kuhn 1979; Joshi and Weinstein 1981). We will argue that syntactic cues and semantic interpretation are mutually constraining (Prince 1981b, 1985; Hudson-D'Zmura 1988).</Paragraph> <Paragraph position="1"> The syntactic cues in Japanese discourse that we investigate are the morphological markers for grammatical TOPIC (the postposition wa) and for grammatical functions such as SUBJECT (ga), OBJECT (o), and OBJECT2 (ni). In addition, we investigate the role of speaker's EMPATHY, which is the viewpoint from which an event is described.</Paragraph> <Paragraph position="2"> This can be syntactically indicated through the use of verbal compounding, i.e., the auxiliary use of verbs such as kureta and kita.</Paragraph> <Paragraph position="3"> Besides the objection that a purely inference-based account does not consider limits on processing time, a further argument against such an account is provided by the minimal pair below. Here, the only difference is whether Ziroo is the subject or the object in the second utterance. Note that the interpretation of zeros is indicated in parentheses: [Marilyn Walker et al., Japanese Discourse] 1c means Ziroo asked Taroo the score of yesterday's game, while 2c means Taroo asked Ziroo the score of yesterday's game. On the other hand, some purely syntactic accounts require that antecedents for zeros be realized as the grammatical TOPIC, and thus cannot explain the above example because Taroo is never explicitly marked as the topic (Yoshimoto 1988).</Paragraph> <Paragraph position="4"> In the literature, ZEROS are known as zero pronouns. We adopt the assumption of earlier work that the interpretation of zeros in Japanese is analogous to the interpretation of overt pronouns in other languages (Kuroda 1965; Martin 1976; Kameyama 1985). 
Japanese also has overt pronouns, but the use of overt pronouns is rare in normal speech, and is limited even in written text. This is mainly because overt pronouns like kare ('he') and kanozyo ('she') were introduced into Japanese in order to translate gender-insistent pronouns in foreign languages (Martin 1976). In this paper, we only consider zeros in subcategorized-for argument positions. Since Japanese does not have subject-verb or object-verb agreement, there is no syntactic indication that a zero is present in an utterance other than information from subcategorization (see footnote 1). First, in Section 1.2 we describe the methodology that we applied in this investigation. In Section 2, we present the theory of centering and some illustrative examples. Then, in Section 3, we discuss particular aspects of Japanese discourse context, namely grammatical TOPIC and speaker's EMPATHY. We will show how these can easily be incorporated into a centering account of Japanese discourse processing, and give a number of examples to illustrate the predictions of the theory. In Section 4, we discuss the way in which a discourse center is instantiated.</Paragraph> <Paragraph position="5"> In Section 5 we propose a discourse rule of ZERO TOPIC ASSIGNMENT, and use the centering model to formalize constraints on when a zero may be interpreted as a ZERO TOPIC. Our account makes a distinction between two notions of TOPIC: grammatical topic and zero topic. The grammatical topic is the wa-marked entity, which is by default predicted to be the most salient discourse entity in the following discourse. However, there are cases in which it may not be, depending on whether ZERO TOPIC ASSIGNMENT applies. This analysis provides support for Shibatani's claim that the interpretation of the topic marker, wa, depends on the discourse context (Shibatani 1990). 
ZERO TOPIC ASSIGNMENT actually predicts ambiguities in Japanese discourse interpretation and provides a mechanism for deriving interpretations that previous accounts claim would be unavailable.</Paragraph> <Paragraph position="6"> We delay the review of related research until Section 6, where we can contrast it with our account. The two major previous accounts are those of Kuno (Kuno 1972, 1976b, 1987, 1989) and Kameyama (Kameyama 1985, 1986, 1988). Finally, in Section 7, we summarize our results and suggest topics for future research.</Paragraph> </Section> <Section position="2" start_page="194" end_page="195" type="sub_section"> <SectionTitle> 1.2 Methodology </SectionTitle> <Paragraph position="0"> Most of the examples in this paper are constructed as four-utterance discourses that fit one of a number of structural paradigms. In all of the paradigms, a discourse entity is introduced in the first utterance, and established by the second utterance as the CENTER, what the discourse is about. The manipulations of context occur with the third and the fourth utterances. Footnote 1: When zero pronouns should be stipulated is still a research issue. For example, Hasegawa (1984) described a zero pronoun as a phonetically null element in an argument position. However, as shown in the following example, Terazu, Yamanasi, and Inada (1980) assumed that zero pronouns are not limited in their distribution and stipulated them in adjunct positions as well (Iida 1993).</Paragraph> <Paragraph position="1"> Taroo wa Hanako no kaban o mitukemasita.</Paragraph> <Paragraph position="2"> Taroo TOP/SUBJ Hanako GEN bag OBJ found 'Taroo found Hanako's bag.'</Paragraph> <Paragraph position="3"> 0 0 tanzyoobi no purezento o iremasita.</Paragraph> <Paragraph position="4"> birthday GEN present OBJ put '(Taroo) put a birthday present (in her bag).'</Paragraph> <Paragraph position="5"> 
In each case the zero in the third utterance cospecifies the entity already established as the center in the second utterance. The fourth utterance consists of a potentially ambiguous sentence containing two zeros. The variations in context are as shown below:</Paragraph> </Section> <Section position="3" start_page="195" end_page="196" type="sub_section"> <SectionTitle> Third Utterance Fourth Utterance </SectionTitle> <Paragraph position="0">
Third Utterance SUBJECT | Third Utterance OBJECT(2) | Fourth Utterance SUBJECT | Fourth Utterance OBJECT(2) | EXAMPLES
zero | NP(o or ni) | zero | zero | 5
zero | NP(o or ni) | zero | zero, empathy | 36
NP(ga) | zero | zero | zero | 32, 34
NP(wa) | zero | zero | zero | 4, 33
NP(ga) | zero | zero | zero, empathy | 35
Thus we are manipulating factors such as whether a discourse entity is realized in subject or object position in the third utterance, whether a discourse entity realized in subject position is ga-marked or wa-marked in the third utterance, and whether a discourse entity realized in the fourth utterance in object position is marked as the locus of speaker's EMPATHY.</Paragraph> <Paragraph position="1"> We recruited about 35 native speakers by solicitation on the Internet to provide judgments for most of the examples given in this paper. These native speakers were readers of the newsgroups sci.lang.japanese and comp.research.japan. They were thus typically well-educated, bilingual engineers. Whenever an example was tested in this way, we provide the number of informants who chose each possible interpretation to the right of the example. Some examples that are included for expository reasons were not tested.</Paragraph> <Paragraph position="2"> Participation in our survey was completely voluntary, and the data were collected over three surveys. Thus the numbers of subjects varied from one survey to another, and this is reflected in the numbers accompanying our examples. 
This data collection was carried out on written examples using electronic mail in a situation in which the informants could take as long as they wanted to decide which interpretation they preferred. The instructions sent with the surveys are given in Appendix A. This paradigm clearly cannot provide information on which interpretation a subject might arrive at first and then perhaps change based on other pragmatic factors, and thus it contrasts with reaction time studies. However, the judgments given should be stable and should reflect the fact that our informants were able to use all the information in the discourse. It is a useful paradigm given that we are exploring the correlation of syntactic cues and discourse interpretation. It has been claimed that syntactic cues are only used in automatic processing and can be overridden by deeper processing. However, Hudson's results suggest that subjects may judge a discourse sequence to be nonsensical when it is incoherent according to centering (Hudson-D'Zmura 1988). Di Eugenio claims that discourse sequences in Italian that are not discourse-coherent according to centering theory produce a garden-path effect (Di Eugenio 1990). The methods we used allow us to explore the results of these interactions, although it would be beneficial for these results to be expanded upon by careful psychological experimentation.</Paragraph> <Paragraph position="3"> For most of the examples reported here, we asked subjects to choose one preferred interpretation instead of allowing them to rank interpretations. The motivation for doing this was to force differences to come out for slight preferences, on the theory that other variations would come out across subjects. 
In a few cases we allowed subjects to indicate no preference; these examples will be clearly indicated.</Paragraph> <Paragraph position="4"> In addition, we used the same gender for multiple discourse entities to prevent any tendency for judgments to be influenced by gender stereotypes. We also avoided using verbs with causal biases toward one of their arguments, and we used few cue words such as but, because, and then, which could result in a bias toward, say, a cause-effect or temporal sequence of events interpretation. We also omitted honorific markers, which are normally a part of Japanese ambiguity resolution. 2 This was done to isolate the effects of the variables that we were exploring in this study, namely topic marking, grammatical function, empathy, and realization with a zero or with a full noun phrase.</Paragraph> </Section> </Section> </Paper>