<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1085">
  <Title>Definiteness Predictions for Japanese Noun Phrases*</Title>
  <Section position="4" start_page="519" end_page="521" type="metho">
    <SectionTitle>
3 The Rule Hierarchy
</SectionTitle>
    <Paragraph position="0"> The rule hierarchy we introduce in this paper has been devised from a systematic survey of data from a Japanese corpus consisting of appointment scheduling dialogues 1. Since dialogues in this domain tend to be short, consisting of just 14 utterances on average, most definite references have to be introduced by way of accommodation rather than by referring back to the discourse context. Moreover, references to events have a particular tendency to be non-specific, i.e. stating their existence rather than explicating their identity. Non-specific references are by definition indefinite, whether the referent has been previously introduced to the context or not.</Paragraph>
    <Paragraph position="1"> Neither accommodation nor non-specific reference can be realized without linguistic indicators, since they would otherwise interfere with the context-based distinction between definite and indefinite reference within a discourse. The appointment scheduling domain is therefore ideal for a case study aimed at extracting linguistic indicators for definiteness.</Paragraph>
    <Section position="1" start_page="519" end_page="520" type="sub_section">
      <SectionTitle>
3.1 Overview
</SectionTitle>
      <Paragraph position="0"> Explicit marking for definiteness takes place on several syntactic levels, namely on the noun itself, within the noun phrase, through counting expressions, or on the sentence level. For each of these syntactic levels, a set of rules can be defined by generalizing over the linguistic indicators that are responsible for the definiteness attributes carried by the noun phrases in the corpus. Each of these rules consists of one or more preconditions, and a consequent that assigns the associated definiteness attribute to the respective noun phrase when the preconditions are met.</Paragraph>
      <Paragraph position="1"> As it turns out, none of the rules defined on the same syntactic level interfere with each other, since they either assign the same value, or their preconditions cannot possibly be met at the same time. Thus the rules can be grouped together into classes corresponding to the four syntactic levels they are defined on. There is a clear hierarchy between the four classes, with all rules of one class given priority over all rules on a lower level, as shown in figure 1. Note that even though the rule classes are defined in terms of syntactic levels, the sequence of rule classes in our hierarchy does not correspond in any way to syntactic structure.</Paragraph>
      <Paragraph position="2"> 1 In this survey, all the noun phrases from 10 dialogues were analyzed in detail, determining the regularities that led to definiteness predictions. These were then formulated into a set of rules and arranged in a hierarchical manner to rule out wrong predictions. A more detailed description of the methods used and a full list of the rules can be found in (Heine, 1997).</Paragraph>
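      <Paragraph position="3"> The rule format and the priority between rule classes can be illustrated with a minimal sketch. It is not the implementation described in (Heine, 1997); the names Rule, PRIORITY and predict, the dictionary representation of a noun phrase, and the two sample rules are hypothetical. The order of the classes is an assumption taken from the placement described in Sections 3.3 to 3.5.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical class names, listed from highest to lowest priority.
PRIORITY = ["noun", "clausal", "noun_phrase", "counting"]


@dataclass
class Rule:
    rule_class: str                              # one of PRIORITY
    preconditions: List[Callable[[dict], bool]]  # all of them must hold
    consequent: str                              # "definite" or "indefinite"

    def applies(self, np: dict) -> bool:
        return all(check(np) for check in self.preconditions)


def predict(np: dict, rules: List[Rule]) -> Optional[str]:
    """Return the value assigned by the highest-priority class that fires."""
    for rule_class in PRIORITY:
        values = {rule.consequent for rule in rules
                  if rule.rule_class == rule_class and rule.applies(np)}
        if values:
            # Rules of one class are expected never to disagree.
            if len(values) != 1:
                raise ValueError("conflicting rules within class " + rule_class)
            return values.pop()
    return None  # no rule fired; the value stays underspecified


# Two illustrative rules; the feature names are assumptions.
rules = [
    Rule("noun", [lambda np: np.get("modifier") == "demonstrative"], "definite"),
    Rule("counting", [lambda np: np.get("counter") is not None], "indefinite"),
]
print(predict({"modifier": "demonstrative", "counter": "ikken"}, rules))
```

Because the classes are tried in priority order, the noun rule in this example wins over the counting rule whenever both fire on the same noun phrase.</Paragraph>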
    </Section>
    <Section position="2" start_page="520" end_page="521" type="sub_section">
      <SectionTitle>
3.2 Noun rules
</SectionTitle>
      <Paragraph position="0"> On the noun level, the lexical properties of the noun or one of its direct modifiers can determine the reference of the noun in question.</Paragraph>
      <Paragraph position="1"> There are a number of nouns that can be marked as definite on the basis of their lexical properties alone, either because they refer to a unique referent in the universe of discourse, or because they carry some sort of indexical implications.</Paragraph>
      <Paragraph position="2"> The referent is thus described uniquely with respect to some implicitly mentioned context.</Paragraph>
      <Paragraph position="3"> For example, there exist a number of nouns that implicitly relate the referent to either the hearer or the speaker, depending on the presence or absence of honorifics 2, respectively. In the appointment scheduling domain, the most frequently used words of this class are (go)yotei (your/my schedule), (o)kangae (your/my opinion) and (go)tsugoo (for you/me).</Paragraph>
      <Paragraph position="4"> Indexical time expressions like konshuu (this week) or raigatsu (next month) refer to a specific period of time that stands in a certain relation to the time of utterance. Even though they do not necessarily have to stand with an article in the target language, the reference is still definite, as in the following example:
        (1) raishuu desu ne
            next week to be isn't it
            'That is (the) next week, isn't it?'
        The interpretation of a modified noun is typically restricted to a specific referent by the modification, thus making it definite in reference. Restrictive modifiers of this type are, for example, specifiers like demonstratives and possessives, as well as time expressions and attributive relative clauses, as shown in the following examples.</Paragraph>
      <Paragraph position="5"> (2) tooka no shuu desu
            tenth GEN week to be
            'That is the week of the tenth.'
        (3) nijuurokunichi kara hajimaru
            twentysixth from to begin
            shuu wa ikaga deshoo ka
            week TOPIC how to be QUESTION
            'How is the week beginning the 26th?'
        2 In Japanese, there are two honorific prefixes, go and o, that can be used to politely refer to things related to the hearer. However, there are no such prefixes to humbly refer to things relating to oneself.</Paragraph>
      <Paragraph position="6"> However, indefinite pronouns such as hoka (another) also fall into the category of modifiers, but explicitly assign indefinite reference to the noun they modify. These are usually used to introduce a new referent into a context already containing one or more referents of the same type.</Paragraph>
      <Paragraph position="7"> (4) hoka no hi erabashite itadaite mo
            different day choose receive also
            ii n desu ga
            good DISCREL
            'Could I ask you to choose a different day?'
        At present, there are nine rules belonging to the noun class, only one of which assigns indefinite reference whilst all others assign definite reference to the noun in question.</Paragraph>
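      <Paragraph position="8"> As an illustration only, the noun-level indicators discussed in this section could be checked as follows. The word lists contain just the items quoted above, the modifier labels and the function names anchor and noun_rule are hypothetical, and the order of the checks is an assumption; the full set of nine rules is given in (Heine, 1997) and is not reproduced here.

```python
from typing import Optional

# Items quoted in this section; a real lexicon would be larger.
INDEXICAL_NOUNS = {"yotei", "kangae", "tsugoo"}
INDEXICAL_TIMES = {"konshuu", "raishuu", "raigatsu"}
# Hypothetical labels for the modifier types named in the text.
RESTRICTIVE_MODIFIERS = {"demonstrative", "possessive",
                         "time_expression", "relative_clause"}
INDEFINITE_MODIFIERS = {"hoka"}


def anchor(noun: str) -> Optional[str]:
    """Return 'hearer' or 'speaker' for an indexical noun, otherwise None."""
    for prefix in ("go", "o"):  # the two honorific prefixes
        if noun.startswith(prefix) and noun[len(prefix):] in INDEXICAL_NOUNS:
            return "hearer"
    if noun in INDEXICAL_NOUNS:
        return "speaker"
    return None


def noun_rule(noun: str, modifier: Optional[str] = None) -> Optional[str]:
    """Return 'definite', 'indefinite', or None if no noun rule fires."""
    if modifier in INDEFINITE_MODIFIERS:
        return "indefinite"
    if modifier in RESTRICTIVE_MODIFIERS:
        return "definite"
    if anchor(noun) is not None or noun in INDEXICAL_TIMES:
        return "definite"
    return None


print(anchor("goyotei"), noun_rule("hi", "hoka"), noun_rule("raishuu"))
```

The prefix is only stripped when the remainder is a known indexical noun, so that a word such as gogo (pm) is not mistaken for an honorific form.</Paragraph>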
    </Section>
    <Section position="3" start_page="521" end_page="521" type="sub_section">
      <SectionTitle>
3.3 Clausal rules
</SectionTitle>
      <Paragraph position="0"> On the sentence level, verbs may carry strong preferences for the definiteness of one or more of their arguments, somewhat in the way of domain-specific patterns. Generally, these patterns serve to specify whether a complement to a certain verb is more likely to be definite or indefinite in a semantically unmarked interpretation. For example, in a sentence like (5), kaigi ga haitte orimasu corresponds to the pattern 'EVENT ga hairu' ('have an EVENT scheduled'), where the scheduled event denoted by EVENT is indefinite for the unmarked reading.</Paragraph>
      <Paragraph position="1"> (5) kayoobi wa gogo sanji made
            Tuesday TOPIC pm 3 o'clock until
            kaigi ga haitte orimasu node
            meeting NOM have scheduled since
            'since I have a meeting scheduled until 3 pm on Tuesday'
        On the other hand, in sentence (6), kaigi ga owarimasu is an instance of the pattern 'EVENT ga owaru' ('the EVENT will end'), where, in the unmarked reading, the event that ends is presupposed to be a specific entity, whether it is previously known or not.</Paragraph>
      <Paragraph position="2"> (6) juuniji ni kaigi ga
            12 o'clock at meeting NOM
            owarimasu node
            to end since
            'since the meeting will end at 12 o'clock'
        The object of an existential question or a negation is by default indefinite, since these sentence types usually indicate the (non)existence of the noun in question. Thus, for example, in the two sentence patterns 'x wa arimasu ka' ('Is there an x?') and 'x wa arimasen' ('There is no x.') the object instantiating x is indefinite, unless marked otherwise.</Paragraph>
      <Paragraph position="3"> In addition to these sentence patterns, there are a number of nouns that can be followed by the verb suru ('to do') to form a light verb construction. These constructions usually come without a particle and are treated as compound verbs, as for example uchiawase suru ('to arrange'). However, these nouns can also occur with the particle o, as in uchiawase o suru, introducing an ambiguity as to whether this expression should be treated as a light verb construction or as a normal verb complement structure. Since this ambiguity can best be resolved at some later point, the noun should be marked as being indefinite, irrespective of whether it will eventually be generated as a noun or a verb in the target language.
        (7) raishuu ikoo de
            next week from ... onwards
            uchiawase o shitai
            arrangement ACC want to make
            n desu ga
            DISCREL
            'I would like to make an arrangement from next week onwards'</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="521" end_page="522" type="metho">
    <SectionTitle>
3.3 Clausal rules (continued)
</SectionTitle>
    <Paragraph position="0"> To override any of these default values, the noun will have to be explicitly marked, using any of the markers on the noun level. Thus we take the clausal rules to be between the top-level noun rules and all other rules further down the hierarchy.</Paragraph>
    <Paragraph position="1"> From the appointment scheduling domain, eight sentence patterns were extracted, of which six assign indefinite reference by default and two indicate definite reference. Thus, together with the light verb constructions, there are nine rules in this class.</Paragraph>
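    <Paragraph position="2"> The clausal defaults can be pictured as a small lookup over verb patterns. The sketch below is an assumption-laden illustration rather than the system's actual rule set: it covers only the patterns quoted in this section, presupposes that verbs arrive in their dictionary form (hairu, owaru, aru, suru), and uses the hypothetical names CLAUSAL_PATTERNS and clausal_rule.

```python
from typing import Optional

# (particle, verb lemma) mapped to the default value of the marked argument.
CLAUSAL_PATTERNS = {
    ("ga", "hairu"): "indefinite",  # 'EVENT ga hairu'
    ("ga", "owaru"): "definite",    # 'EVENT ga owaru'
}


def clausal_rule(particle: Optional[str], verb: str,
                 sentence_type: str = "declarative",
                 light_verb_noun: bool = False) -> Optional[str]:
    """Return the clausal default for a noun, or None if no pattern matches."""
    # Nouns that can combine with suru are marked indefinite,
    # with or without the particle o.
    if light_verb_noun and verb == "suru":
        return "indefinite"
    # Existential questions and negations: 'x wa arimasu ka', 'x wa arimasen'.
    if verb == "aru" and sentence_type in ("question", "negation"):
        return "indefinite"
    return CLAUSAL_PATTERNS.get((particle, verb))


print(clausal_rule("ga", "hairu"), clausal_rule("ga", "owaru"))
```

A value returned here is only a default: a noun-level marker on the same noun phrase takes precedence, as described above.</Paragraph>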
    <Section position="1" start_page="522" end_page="522" type="sub_section">
      <SectionTitle>
3.4 Noun phrase rules
</SectionTitle>
      <Paragraph position="0"> The postpositional particles that complete a noun phrase in Japanese serve primarily as case markers, but can also influence the interpretation of the noun with respect to definiteness. However, the definiteness predictions triggered by the use of particles can be fairly weak and are easily overridden by other factors, thus placing the rules emerging from these patterns near the bottom of the hierarchy.</Paragraph>
      <Paragraph position="1"> The main postpositions indicating definite reference are the topicalization particle wa in its non-contrastive use 3, the boundary markers kara (from) and made (to) and the genitive marker no, especially in conjunction with hoo (side), as indicated by the following examples.</Paragraph>
      <Paragraph position="2"> (8) chotto idoo no jikan
            unfortunately transfer GEN time
            ga torenaiyoo desu ne
            NOM take not DISCREL
            'Unfortunately, there is no time for the
        All four noun phrase rules in the current framework indicate definite reference.</Paragraph>
    </Section>
    <Section position="2" start_page="522" end_page="522" type="sub_section">
      <SectionTitle>
3.5 Counting expressions
</SectionTitle>
      <Paragraph position="0"> As it turns out, there is one more level to the rule hierarchy. Even though counting expressions are semantically modifiers, they do not syntactically modify the noun itself but rather the entire noun phrase. They do not have to be adjacent to the noun phrase they modify, since they are marked by a counting suffix indicating the type of objects counted.</Paragraph>
      <Paragraph position="1"> 3 This means that definite reference is indicated by the main use of the particle wa, namely as a topic marker, stressing the discourse referent the conversation is about. There is another, contrastive use of wa, which introduces something in contrast to another discourse referent. Naturally, this use may introduce a related, albeit previously unknown and thus indefinite, referent.</Paragraph>
      <Paragraph position="2"> (10) nijuuhachinichi ga gogo ni
             twentyeighth NOM afternoon in
             kaigi ga ikken haitte orimasu
             meeting NOM one be scheduled
             'There is one/a meeting scheduled on the twentyeighth.'
        Semantically, counting expressions imply the existence of a certain number of the objects counted, in the same way that the indefinite article does. These expressions are therefore taken to be indefinite by default, but can be made definite by any of the other rules. Counting expressions thus make up a class of their own on the lowest level of the hierarchy.</Paragraph>
    </Section>
    <Section position="3" start_page="522" end_page="522" type="sub_section">
      <SectionTitle>
3.6 Underspecified values
</SectionTitle>
      <Paragraph position="0"> As might be expected from the concept of preprocessing, there will be a number of noun phrases that cannot be assigned a definiteness attribute by any of the rules described above.</Paragraph>
      <Paragraph position="1"> These will remain underspecified for definiteness until an antecedent can be found for them by the context checking mechanism, or until they are assigned a default value.</Paragraph>
      <Paragraph position="2"> By introducing a value for underspecification, it is possible to postpone the decision whether a noun phrase should be marked definite or indefinite, without losing the information that it must be marked eventually. Since default values are only introduced when a value is still underspecified after the assignment mechanism has finished, there is no need to ever change a value once it has been assigned. This means that the algorithm can work in a strictly monotone manner, terminating as soon as a value has been found.</Paragraph>
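      <Paragraph position="3"> The monotone behaviour can be sketched as follows. This is a hedged illustration, not the original algorithm: the names Definiteness and assign are hypothetical, the stages are supplied by the caller, the default is a parameter because the text does not state which default is used, and the assumption that a found antecedent yields a definite value is ours.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Definiteness(Enum):
    DEFINITE = "definite"
    INDEFINITE = "indefinite"
    UNDERSPECIFIED = "underspecified"


# A stage inspects a noun phrase and either returns a value or abstains.
Stage = Callable[[dict], Optional[Definiteness]]


def assign(np: dict, stages: Iterable[Stage],
           default: Definiteness) -> Definiteness:
    """Run the stages in order and stop at the first value that is found."""
    value = Definiteness.UNDERSPECIFIED
    for stage in stages:
        result = stage(np)
        if result is not None:
            value = result
            break  # monotone: an assigned value is never revised
    # The default is applied only if every stage abstained.
    if value is Definiteness.UNDERSPECIFIED:
        value = default
    return value


def rule_stage(np: dict) -> Optional[Definiteness]:
    # Stand-in for the rule hierarchy; reads a precomputed feature.
    return np.get("rule_value")


def context_stage(np: dict) -> Optional[Definiteness]:
    # Assumption: an antecedent in the context makes the reference definite.
    return Definiteness.DEFINITE if np.get("antecedent") else None


stages = [rule_stage, context_stage]
print(assign({"antecedent": True}, stages, Definiteness.INDEFINITE))
```

Keeping the underspecified value distinct from both outcomes is what allows each stage to abstain without blocking a later stage or the final default.</Paragraph>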
    </Section>
  </Section>
</Paper>