<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1045">
  <Title>Automatic Extraction of Aspectual Information from a Monolingual Corpus</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ACC wear-PRES
</SectionTitle>
    <Paragraph position="0"> 'Ken has been wearing that kimono since this morning.' (c). Ken-wa ano kimono-wo san-nen maeni ki-te-i-ru. Ken-TOP that kimono-ACC three-year before wear-PRES 'Ken has the experience of wearing that kimono three years ago.' Notice that the English translations use separate lexical items (put on for (a) and wear for (b) and (c)) and different aspectual configurations (the progressive for (a), the perfect progressive for (b), and another for (c)), while all the Japanese sentences contain the same verbal form ki-te-i-ru. Thus, when a system tries to translate these sentences, it must be aware of the differences among them.</Paragraph>
    <Paragraph position="1"> This paper describes an approach to extracting the aspectual information of Japanese verb phrases from a monolingual corpus. In the next section, we classify Japanese verbs into six categories by means of aspectual features, following the framework of (Bennett et al., 1990). Aspectual forms and adverbs are defined as functions which operate on verbs' aspectual features and change their values. By using the constraints on the applicability of these functions, we can identify a unique category for each verb automatically. If one can acquire the aspectual properties of verbs properly and know how the other constituents in a sentence operate on them, then the aspectual meaning of the whole sentence will be determined monotonically. Since the classification itself is difficult to evaluate objectively, we evaluate the result of the experiment by examining the meaning of -teiru, which is one of the most fundamental aspectual forms.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="353" type="metho">
    <SectionTitle>
2 Realization Process of Aspectual Meaning
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="352" type="sub_section">
      <SectionTitle/>
      <Paragraph position="0"> We consider that the whole aspectual meaning of verb phrases is determined in the following order: verbs → arguments → adverbs → aspectual forms. Adverbs and aspectual forms are defined as indicators of such cognitive processes as &quot;zooming&quot; and &quot;focusing&quot;, which operate on the time-line representation. They are similar to the notions of &quot;aspectual coercion&quot; (Moens and Steedman, 1988) or &quot;views&quot; (Gunji, 1992). We explain each in turn.</Paragraph>
      <Paragraph position="1"> Footnote 1: The term &quot;form&quot; refers to grammatical morphemes which are defined in terms of derivation. In this paper, we refer to the aspectual morphemes which follow verbs as &quot;aspectual forms&quot;, including compound verbs such as -hazimeru (begin), suffixes with epenthetic -te such as -teiru, and aspectual nominals such as -bakari (just now).</Paragraph>
    </Section>
    <Section position="2" start_page="352" end_page="352" type="sub_section">
      <SectionTitle>
2.1 Aspectual Categories of Verbs
</SectionTitle>
      <Paragraph position="0"> A number of aspectually oriented lexical-semantic representations have been proposed. We adopt and extend the feature-based framework proposed by (Bennett et al., 1990) in the spirit of (Moens and Steedman, 1988). They use three features: ±dynamic, ±telic, and ±atomic. We add two more features: ±process and ±gradual.</Paragraph>
      <Paragraph position="1"> The feature dynamicity distinguishes between states(-d) and events(+d), and atomicity distinguishes between point events(+a) and extended events(-a). The duration described by verbs is twofold: an ongoing process and a consequent state.</Paragraph>
      <Paragraph position="2"> The feature process concerns an ongoing process and distinguishes whether events described by verbs have the duration for which some actions unfold.</Paragraph>
      <Paragraph position="3"> The feature telicity distinguishes between culminative events(+t) and nonculminative events(-t). It presupposes a process. The feature graduality characterizes events in which some kind of change is included and the change gradually develops.</Paragraph>
      <Paragraph position="4"> We can classify verbs by means of different combinations of the five features. Since there are dependencies between features, only a subset of the combinatorially possible feature configurations is defined, as shown in Table 1.</Paragraph>
      <Paragraph position="5"> In Table 1, (1) stative verbs are those that are not dynamic; (2) atomic verbs are those that express an atomic event; (3) resultative verbs express a punctual event followed by a new state which holds over some interval of time; (4) process+result verbs express a complex situation consisting of a process which culminates in a new state; (5) non-gradual process verbs express only processes and not changes of state; and (6) gradual process verbs are those that have graduality. Although the verbs of categories 5 and 6 are not inherently telic, the arguments of the verbs or some kinds of adverbs can set up the endpoint of the process, as discussed later. In the Vendlerian classification (Vendler, 1957), states correspond to category 1, achievements to 2 and 3, accomplishments to 4 and 6, and activities to 5.</Paragraph>
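      <Paragraph> The category definitions above can be written down as feature bundles. The following Python sketch is illustrative only: the feature values are inferred from the prose descriptions rather than copied from Table 1, features that the text does not fix are left as None, and the names Features, CATEGORIES, VENDLER and signature are hypothetical.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    """Aspectual feature bundle; None marks a value the prose leaves open."""
    dynamic: bool
    atomic: Optional[bool] = None
    process: Optional[bool] = None
    telic: Optional[bool] = None
    gradual: Optional[bool] = None


# Category number -> (name, features). ASSUMPTION: values are read off the
# prose definitions of the six categories, not taken from Table 1 itself.
CATEGORIES = {
    1: ("stative", Features(dynamic=False)),
    2: ("atomic", Features(dynamic=True, atomic=True)),
    3: ("resultative", Features(dynamic=True, atomic=False, process=False)),
    4: ("process+result", Features(True, False, True, True)),
    5: ("non-gradual process", Features(True, False, True, False, False)),
    6: ("gradual process", Features(True, False, True, False, True)),
}

# Correspondence with the Vendlerian classes, as stated in the text.
VENDLER = {
    1: "state",
    2: "achievement",
    3: "achievement",
    4: "accomplishment",
    5: "activity",
    6: "accomplishment",
}


def signature(category: int) -> str:
    """Render the specified features of a category, e.g. '+d -a -p'."""
    _, feats = CATEGORIES[category]
    parts = []
    for letter, value in zip("daptg", feats):
        if value is not None:
            parts.append(("+" if value else "-") + letter)
    return " ".join(parts)


if __name__ == "__main__":
    for number, (name, _) in CATEGORIES.items():
        print(number, name, signature(number), VENDLER[number])
```

Under these assumptions, signature(5) yields "+d -a +p -t -g", which matches the description of non-gradual process verbs as extended, non-culminating processes without graduality.</Paragraph>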
    </Section>
    <Section position="3" start_page="352" end_page="352" type="sub_section">
      <SectionTitle>
2.2 Arguments
</SectionTitle>
      <Paragraph position="0"> Tenny points out that the internal argument of a verb can be defined as that which temporally delimits or measures out the event (Tenny, 1994).</Paragraph>
      <Paragraph position="1"> The direct internal argument can aspectually &quot;measure out the event&quot; to which the verb refers. To clarify what is meant by &quot;measuring-out&quot;, she gives examples of three kinds of measuring-out: incremental theme verbs (eat an apple, build a house, etc.), change-of-state verbs (ripen the fruit, etc.) and path objects of route verbs (climb the ladder, play a sonata, etc.).</Paragraph>
      <Paragraph position="2"> On the other hand, the indirect internal argument can provide a temporal terminus for the event described by the verb. The terminus causes the event to be delimited as in push the car to a gas station.</Paragraph>
      <Paragraph position="3"> There is only one kind of internal argument, in terms of thematic roles, that does provide an event terminus, and that is a goal.</Paragraph>
      <Paragraph position="4"> In terms of the current framework, both kinds of argument add telicity to a verb which does not inherently contain it. They play the role of framing the interval on which the focus should be brought.</Paragraph>
    </Section>
    <Section position="4" start_page="352" end_page="353" type="sub_section">
      <SectionTitle>
2.3 Adverbs
</SectionTitle>
      <Paragraph position="0"> In general, adverbs focus on the subpart of the event described by a verb and give a more detailed description. According to the discussion in (Moriyama, 1988), adverbs can be classified as follows in terms of the subpart on which they focus.</Paragraph>
      <Paragraph position="1"> Process modifiers modify verbs which have a process (+p). This class includes reduplicative onomatopoeia such as gasagasa, batabata, suisui, sesseto, butubutu, etc., which express the sound or manner of directed motion, and rate adverbs such as yukkuri (slowly), tebayaku (quickly), etc., which express the speed of motion. They focus on the ongoing process of the events described by verbs.</Paragraph>
      <Paragraph position="2"> Gradual change indicators express the progress of a change of state, such as dandan (gradually), sukosizutu (little by little), jojoni (gradually), dondon (constantly), sidaini (by degrees), etc., which modify gradual process verbs (+g) and focus on the process.</Paragraph>
      <Paragraph position="3"> Continuous adverbs are those that can modify both stative verbs (-d) and process verbs (+p), such as zutto (for a long time), itumademo (forever), etc. They express the continuance of an event or the maintenance of a state.</Paragraph>
      <Paragraph position="4"> Table 1 (example verbs, by category): 1. stative verbs: aru (be), sobieru (rise), sonzaisuru (exist); 2. atomic verbs: hirameku (flash), mikakeru (notice); 3. resultative verbs: suwaru (sit down), tatu (stand up); 4. process+result verbs: korosu (kill), kiru (put on/wear), akeru (open); 5. non-gradual process verbs: aruku (walk), iu (say), utau (sing); 6. gradual process verbs: kusaru (turn sour), takamaru (become high). Atomic adverbs make any event instantaneous, such as satto, ponto, gatatto, potarito, syunkan, etc., which express instantaneous sound emission or an instant. When these adverbs co-occur with verbs, the events are understood as instantaneous. This does not necessarily imply that the verb itself is instantaneous. Quantity regulators measure out events, as in gokiro aruku (walk 5 km), gojikan seizasita (sat straight for 5 hours), etc. These include time, distance, and any quantity of contents.</Paragraph>
      <Paragraph position="5"> End state modifiers express the consequent state of events, such as mapputatuni (into two exact halves), konagonani (into pieces), pechankoni (flat), barabarani (apart), etc. They focus on the resultant state.</Paragraph>
      <Paragraph position="6"> So far we have described adverbs which concern a single event, but some adverbs regulate multiple events which involve the iteration of a single event. By iteration, the whole process of a collective event can be taken up regardless of the inherent features of verbs.</Paragraph>
      <Paragraph position="7"> There are two kinds of Repetition adverbs: one regulates the whole quantity of the iteration of events such as san-kai(three times) or nandomo(many times) etc., and the other describes the habitual repetition of events such as itumo(always) or syottyuu(very often) etc. Both describe many events each of which involves one person's act.</Paragraph>
      <Paragraph position="8"> Finally, we shall mention Time in the past adverbs. There are cases where the form -teiru, which marks the present tense, can co-occur with temporal adverbs describing the past (see example (1c) in the introduction). In such cases it describes the experiential fact of an event. Such adverbs as katute (once), mukasi (in the past) and izen (before) determine the temporal structure of the event in relation to tense.</Paragraph>
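      <Paragraph> The adverb classes of this section can be collected into a small lookup table. The sketch below is a hypothetical illustration in Python: it contains only some of the example adverbs listed above, uses the initial letter of each class name as the label, and leaves out quantity regulators (Q) because the examples given for them are phrases rather than single adverbs. The names ADVERB_EXAMPLES, ADVERB_CLASS and adverb_label are not from the paper.

```python
# Class label -> example adverbs quoted in the text (not an exhaustive lexicon).
ADVERB_EXAMPLES = {
    "P": ["gasagasa", "batabata", "suisui", "sesseto", "butubutu",
          "yukkuri", "tebayaku"],                          # process modifiers
    "G": ["dandan", "sukosizutu", "dondon", "sidaini"],    # gradual change
    "C": ["zutto", "itumademo"],                           # continuous
    "A": ["satto", "ponto", "gatatto", "potarito", "syunkan"],  # atomic
    "E": ["mapputatuni", "konagonani", "pechankoni", "barabarani"],  # end state
    "R": ["san-kai", "nandomo", "itumo", "syottyuu"],      # repetition
    "T": ["katute", "mukasi", "izen"],                     # time in the past
}

# Invert the table: adverb -> class label.
ADVERB_CLASS = {
    adverb: label
    for label, adverbs in ADVERB_EXAMPLES.items()
    for adverb in adverbs
}


def adverb_label(adverb):
    """Return the class label of an adverb, or None if it is unclassified."""
    return ADVERB_CLASS.get(adverb)


if __name__ == "__main__":
    for word in ["yukkuri", "dandan", "zutto", "izen", "kitto"]:
        print(word, adverb_label(word))
```

An adverb outside the table, such as the toy item kitto here, simply receives no label and would not count as evidence for any class.</Paragraph>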
    </Section>
    <Section position="5" start_page="353" end_page="353" type="sub_section">
      <SectionTitle>
2.4 Aspectual Forms
</SectionTitle>
      <Paragraph position="0"> The ability of aspectual forms to follow verbs is constrained by the inherent features of verbs. We briefly describe some of the aspectual forms used in the experiment. The forms -you-to-suru (be going to) and -kakeru (be about to) take up the occurrence of events. They can follow verbs which are dynamic (+d).</Paragraph>
      <Paragraph position="1"> The form -tuzukeru (continue) can follow verbs which have duration (-a). It can take up either the ongoing process or the resultant state. The form -hajimeru (begin) can follow verbs which have a process (+p) and takes up the start time of the process. On the other hand, the forms -owaru (cease) and -oeru (finish) can follow verbs which are telic (+t) and take up the end point of the process. However, these constraints on the inherent features of verbs concern only a single event. By iteration, the whole process of a collective event can be taken up regardless of the inherent features of verbs, as mentioned above.</Paragraph>
      <Paragraph position="2"> The forms -tutuaru (be in progress), -tekuru (come into state) and -teiku (go into state) focus on the gradual process of change. -Tutuaru takes it up as a kind of state, -tekuru views it from the end state of the change, and -teiku views it from the initial state of the change. Both -tekuru and -teiku have usages other than aspect, as in mot-tekuru (bring) or mot-teiku (take).</Paragraph>
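      <Paragraph> Because each form can only follow verbs with a particular feature value, an observed verb-form combination is positive evidence for that value. A minimal Python sketch of this reading follows. The table FORM_IMPLIES restates the constraints given above; the function names are hypothetical, and -tutuaru, -tekuru and -teiku are deliberately left out because the text describes what they focus on rather than an explicit feature constraint.

```python
# Aspectual form -> (feature letter, value the verb must have to take the form).
FORM_IMPLIES = {
    "-you-to-suru": ("d", True),   # follows dynamic verbs (+d)
    "-kakeru": ("d", True),        # follows dynamic verbs (+d)
    "-tuzukeru": ("a", False),     # follows verbs with duration (-a)
    "-hajimeru": ("p", True),      # follows verbs with a process (+p)
    "-owaru": ("t", True),         # follows telic verbs (+t)
    "-oeru": ("t", True),          # follows telic verbs (+t)
}


def implied_features(observed_forms):
    """Collect the feature values implied by the forms seen after a verb."""
    features = {}
    for form in observed_forms:
        if form in FORM_IMPLIES:
            letter, value = FORM_IMPLIES[form]
            features[letter] = value
    return features


def show(features):
    """Render a feature dictionary as a string such as '+p +t'."""
    return " ".join(
        ("+" if value else "-") + letter
        for letter, value in sorted(features.items())
    )


if __name__ == "__main__":
    print(show(implied_features(["-hajimeru", "-owaru"])))
    print(show(implied_features(["-tuzukeru", "-teiru"])))
```

Forms that are absent from the table, such as -teiru in the second call, contribute nothing. The sketch also ignores iteration, which, as noted above, can license a form regardless of the inherent features of the verb.</Paragraph>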
    </Section>
  </Section>
  <Section position="5" start_page="353" end_page="357" type="metho">
    <SectionTitle>
3 Experiment
</SectionTitle>
    <Paragraph position="0"> We carried out an experiment to classify Japanese verbs into the six categories in Table 1 by means of corpus data.</Paragraph>
    <Paragraph position="1"> As shown in Figure 1, each category is defined in terms of the ability to co-occur with aspectual forms. However, discriminating the categories requires negative evidence, which by definition we cannot obtain: a corpus provides only positive evidence. Furthermore, some forms can be used regardless of the features and have usages other than aspect, as discussed in the previous section. We must establish a method which takes these facts into account.</Paragraph>
    <Paragraph position="2"> [Figure 1: tree that separates the six verb categories by feature values. Only the branch labels ±p, ±t and ±g and the leaves 2. atomic verbs, 3. resultative verbs, 4. process+result verbs, 5. non-gradual process verbs and 6. gradual process verbs are legible in the source.]</Paragraph>
    <Section position="1" start_page="353" end_page="356" type="sub_section">
      <SectionTitle>
3.1 Algorithm
</SectionTitle>
      <Paragraph position="0"> We used the EDR Japanese Corpus and the EDR Japanese Co-occurrence Dictionary (EDR, 1995) as material to extract syntactic clues in the experiment. The corpus contains 220,000 sentences from various genres of text. The results of the parsing analysis of these sentences indicate that the constituents of a sentence have a dependency structure. That is, the constituents stand in a governing-dependent relation. It is these constituents that form the head phrases of the Japanese Co-occurrence Dictionary, which describes collocational information in the form of binary relations. Each item in the Japanese Co-occurrence Dictionary consists of a governing word, a dependent word, the relator between the words, and supplementary co-occurrence information, which is composed of the frequency of the co-occurrence relation and a portion of the actual example sentence from which the co-occurrence relation was taken.</Paragraph>
      <Paragraph position="1"> The algorithm used for classifying verbs is shown in Figure 2.</Paragraph>
      <Paragraph position="2"> STEP:1 Pick out the items of which the governing and dependent words are a verb and an adverb from the EDR Co-occurrence Dictionary and store them with their frequency in an array called PAIRS (cf. Table 2).</Paragraph>
      <Paragraph position="3"> STEP:2 For each adverb in PAIRS, give an adverb class label (the initial letter of the class name) on the basis of the discussion in sec. 2.3 and store them in an array called ADVERBS (cf. Table 3 and Table 4).</Paragraph>
      <Paragraph position="4"> STEP:3 For each verb in PAIRS, add up the frequency of the co-occurrence with the adverbs contained in the array ADVERBS. If the sum is greater than 4, store the verb in a list called VERBS.</Paragraph>
      <Paragraph position="5"> STEP:4 For each sentence in the corpus, find a verb and, if it is contained in VERBS, then: STEP:4-1 If the form following the verb is contained in the predefined list (Table 5), make the entry FORMS[i, j] positive (where i is the position of the verb in the list VERBS and j is the position of the form in Table 5; see Table 6), provided that the verb is not modified by repetition adverbs (R). When the form is -tekuru or -teiku, record it only if the verb is modified by gradual change indicators (G). STEP:4-2 If the verb is modified by an adverb contained in the array ADVERBS, refer to the adverb class label and add 1 to MODIFIED[i, k] (where i is the position of the verb in the list VERBS and k is the position of the adverb class label in Table 4; when the adverb is a continuous one (C), distinguish the cases where the verb is followed by -teiru (C1) from the other cases (C2); see Table 7), provided that the verb is followed neither by negative forms such as -nai or -zu, nor by forms which change the voice such as -reru (the passivizer) or -seru (the causativizer), since they affect the aspectual properties of the attached verb.</Paragraph>
      <Paragraph position="6"> STEP:5 For each verb in VERBS: STEP:5-1 Narrow down the candidates by means of the array FORMS (on the basis of the possible categories shown in Table 5).</Paragraph>
      <Paragraph position="7"> STEP:5-2 In the case where the category of the verb cannot be uniquely identified in STEP:5-1, i.e., other than the category 6, determine it by means of the array MODIFIED as follows:</Paragraph>
      <Paragraph position="9"> (if the verb is modified by gradual change indicators (G)); (if modified by process modifiers (P) and not by end state modifiers (E)); (if modified by both process modifiers (P) and end state modifiers (E)); (if modified by end state modifiers (E) and not by process modifiers (P)); (if modified by only atomic adverbs (A)); (if modified by continuous adverbs without being followed [The category assigned by each condition, and the end of the last condition, are not legible in the source.]</Paragraph>
      <Paragraph position="11"/>
      <Paragraph position="13"> Steps 1, 2 and 3 determine the target verbs. There are 431 verbs that are modified by the classified adverbs more than 4 times. In step 2, we classify adverbs on the basis of the discussion in the previous section. Although the classification was done by hand, it is much easier than that of verbs, since adverbs are fewer than verbs in number (2,563 vs. 12,766 in the corpus) and have higher &quot;iconicity&quot; (the isomorphism between form and meaning) than verbs. This classification of adverbs is used not only for determining the aspectual categories of verbs but also for examining the meaning of -teiru, as mentioned later.</Paragraph>
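      <Paragraph> Steps 1 to 3 amount to a frequency filter over verb-adverb pairs. The following Python sketch illustrates that filter on toy data. The tuple layout of the pairs, the function name select_target_verbs and the sample frequencies are assumptions made for the example; they do not reflect the actual record format of the EDR Co-occurrence Dictionary.

```python
from collections import Counter


def select_target_verbs(pairs, adverb_classes, threshold=4):
    """Return verbs whose summed frequency with classified adverbs exceeds
    the threshold (step 3 uses 'greater than 4').

    pairs: iterable of (verb, adverb, frequency) tuples (hypothetical layout).
    adverb_classes: mapping from adverb to its class label (step 2).
    """
    totals = Counter()
    for verb, adverb, frequency in pairs:
        if adverb in adverb_classes:      # ignore unclassified adverbs
            totals[verb] += frequency
    return sorted(verb for verb, total in totals.items() if total > threshold)


if __name__ == "__main__":
    # Toy data: the frequencies are invented for illustration only.
    pairs = [
        ("aruku", "yukkuri", 3),
        ("aruku", "zutto", 2),
        ("kusaru", "dandan", 4),
        ("kusaru", "kitto", 9),   # kitto is unclassified, so it is ignored
    ]
    adverb_classes = {"yukkuri": "P", "zutto": "C", "dandan": "G"}
    print(select_target_verbs(pairs, adverb_classes))
```

In this toy run aruku reaches a total of 5 and is kept, while kusaru reaches only 4 with classified adverbs and is dropped, since the threshold is strict.</Paragraph>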
      <Paragraph position="14"> Step 4 registers the co-occurring forms and adverbs for each verb. Using these data, we identify the aspectual categories of verbs in step 5. Since the categories cannot be uniquely identified by aspectual forms alone, we also use adverbs, which can modify only a restricted set of verbs, as shown in Table 8.</Paragraph>
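      <Paragraph> The adverb-based decision of step 5-2 can be sketched as an ordered sequence of tests over the counts collected in MODIFIED. In the Python sketch below the conditions follow the order in which they appear in Figure 2, but the category returned by each branch is an assumption: it is inferred from the adverb class descriptions in Section 2.3, since the category column is not available here. The function name and the dictionary representation of one row of MODIFIED are likewise hypothetical.

```python
def category_from_adverbs(modified):
    """Guess a verb category from adverb class counts.

    modified: mapping from adverb class label ('G', 'P', 'E', 'A', 'C1',
    'C2', ...) to the number of times the verb was modified by that class.
    Returns a category number from 1 to 6, or None if no condition applies.
    ASSUMPTION: the branch-to-category pairing is inferred, not quoted.
    """
    present = {label for label, count in modified.items() if count > 0}
    if "G" in present:                        # gradual change indicators
        return 6                              # gradual process
    if "P" in present and "E" not in present:
        return 5                              # non-gradual process
    if "P" in present and "E" in present:
        return 4                              # process+result
    if "E" in present:                        # end state but no process
        return 3                              # resultative
    if present == {"A"}:                      # only atomic adverbs
        return 2                              # atomic
    if "C2" in present:                       # continuous, without -teiru
        return 1                              # stative
    return None


if __name__ == "__main__":
    print(category_from_adverbs({"P": 3, "E": 1}))
    print(category_from_adverbs({"G": 2, "P": 1}))
    print(category_from_adverbs({"A": 2}))
```

The tests are ordered so that the most specific evidence is checked first; a verb with no counted adverb class falls through and is left undecided.</Paragraph>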
    </Section>
    <Section position="2" start_page="356" end_page="357" type="sub_section">
      <SectionTitle>
3.2 Evaluation and Discussion
</SectionTitle>
      <Paragraph position="0"> Out of 431 target verbs, we could uniquely identify categories for 375 verbs. Of the remaining 56 verbs, 37 were assigned in step 5-2 a category which was not included in the set of categories output by step 5-1. This seems to be due to the failure to detect expressions of repetition; for these verbs, we therefore chose the category determined in step 5-2. Table 9 shows the results.</Paragraph>
      <Paragraph position="1"> We confirmed that more than 80% of the verbs are correctly classified. However, this is a subjective judgement. To evaluate the results of the classification more objectively, we focus on one evaluation metric, namely the automatic examination of the meaning of -teiru, which can represent several distinct senses as described in the introduction.</Paragraph>
      <Paragraph position="2"> The form -teiru indicates a &quot;zoom in&quot; operation: it is a function that takes an event as its input and returns a type of state, which refers to an unbounded region, i.e., a part of the time-line with no distinct boundaries. Figure 3 shows the time-line representation for each aspectual category of verbs. Aspectual distinctions correspond to how parts of the time-line are delineated.</Paragraph>
      <Paragraph position="3"> [Figure 3: time-line representation for the aspectual categories of verbs. The intervals legible in the source are (1) and (2) for 1. stative verbs, (3) for 2. atomic verbs, (4) and (5) for 3. resultative verbs, (6), (7) and (8) for 4. process+result verbs, and (9) and (10) for 5. non-gradual process verbs; the row for 6. gradual process verbs is not legible.]</Paragraph>
      <Paragraph position="6"> In Figure 3, thick line segments signify regions, dashed line segments signify unbounded ends of regions, and large open dots signify points in time (boundaries or punctate events).</Paragraph>
      <Paragraph position="8"> [Table fragment; row labels are not legible in the source.] akkasuru (get worse), tuyomaru (get strong), takamaru (become raised), sinkoukasuru (get more acute), seityousuru (grow up), kappatukasuru (make active), ...</Paragraph>
      <Paragraph position="9"> kuwawaru (join), tutomeru (be employed), tomonau (accompany), tazuneru (visit), rainitisuru (come to Japan), uwamawaru (be more than), hokoru (boast), ...</Paragraph>
      <Paragraph position="10">  Since -teiru cannot include a time instant at which a state changes drastically, it must denote one of the intervals depicted below the lines. The interval (1) in Figure 3 designates a state which is a part of the state described by a lexical stative verb. It means a state holding before the speaker's eyes.</Paragraph>
      <Paragraph position="11"> Since (Kindaichi, 1976), it has been held that the form -teiru has three distinct senses: 'a simple state', 'a progressive state' and 'a consequent state'. (1) corresponds to a simple state, (4) and (7) to a consequent state, and (6), (9) and (11) to a progressive state. Though not represented in Figure 3, a consequent state can be taken up with the verbs of categories 5 and 6 if the endpoints of the processes are set up by explicit expressions.</Paragraph>
      <Paragraph position="12"> Kudo (Kudo, 1982) has pointed out that both progressive and consequent states have an inherent meaning and a derivative meaning, and has sorted them out as follows.</Paragraph>
      <Paragraph position="13">  (i) inherent meaning of 'a progressive state': an ongoing process; (ii) derivative meaning of 'a progressive state': an iteration; (iii) inherent meaning of 'a consequent state': a resultative state; (iv) derivative meaning of 'a consequent state': an experiential state; (v) otherwise: a simple state. Meaning (ii) is the above-mentioned process of a collective event: &quot;a line as a set of points&quot;, so to speak. Meaning (iv) is a state where a speaker has an experience of the event described by a verb, and corresponds to the intervals (2), (3), (5), (8), (10) and (12) in Figure 3. These derivative meanings are conditioned syntactically or contextually; that is, they are stipulated as derivative by explicit linguistic expressions such as adverbials, and are not tied to the inherent features of verbs: they can appear with most verbs regardless of their aspectual categories. We carried out an experiment to examine the meaning of -teiru automatically by means of the classifications of verbs and adverbs obtained in the previous experiment. Table 10 shows the determination process of the meaning of -teiru. We checked the cases in Table 10 downward from the top.</Paragraph>
      <Paragraph position="14"> Table 11 shows the results obtained by running the process of Table 10 on 200 sentences containing -teiru, randomly selected from the EDR Japanese Corpus.</Paragraph>
      <Paragraph position="15"> The precision on the whole is 71%. Note that the sense (i) 'an ongoing process' has high recall but low precision, while (iii) 'a resultative state' and (iv) 'an experiential state' show the opposite. This is due to the fact that the test sentences contain many &quot;speech-act&quot; verbs such as syuchousuru (insist), setumeisuru (explain), hyoumeisuru (declare), etc. They are classified as 5. non-gradual process verbs, and by case 8 in Table 10, the senses of -teiru following them are determined as (i) 'an ongoing process'. However, they take a quotative to-case that marks the content of the statement, and this measures out the event described by the verbs. Therefore the resultative or experiential readings are preferred. The other errors are caused by polysemous verbs such as kakaru (hang/lie/fall...) or ataru (hit/strike/be exposed/shine...). Their aspectual properties are changed by the complements they take. The analysis of how complements influence the aspectual properties of their governing verbs is beyond the scope of this paper. It seems to be a matter of pragmatic world knowledge rather than sense-semantics (but see (Verkuyl, 1993) for English).</Paragraph>
      <Paragraph position="16"> [Table 10, partially recovered; the conditions of cases 1 to 4 are not legible in the source.] Conditions: (5) ... modifiers (P) or gradual change indicators (G); (6) the endpoint of the process is explicitly set up (the verb is modified by end state modifiers (E) or quantity regulators (Q), or it takes a goal argument, i.e., ni (to)-case, etc.); (7) the process cannot be taken up (the verb is modified by atomic adverbs (A) or sudeni (already), etc.); (8) the category of the verb is 5. non-gradual process or 6. gradual process verbs: (i) an ongoing process; (9) the category of the verb is 4. process+result verbs: ambiguous, (i) or (iii). Meanings listed before case 8, in the order in which they survive: (iv) an experiential state, (v) a simple state, (iii) a resultative state, (i) an ongoing process, (iii) a resultative state, (iii) a resultative state.</Paragraph>
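      <Paragraph> Checking the cases of Table 10 from the top down is a first-match rule cascade. The Python sketch below covers only cases 5 to 9, whose conditions can be read in this section; cases 1 to 4 are omitted. The pairing of cases 5, 6 and 7 with senses (i), (iii) and (iii) is an assumption based on the order of the surviving meaning column, while case 8 is confirmed by the discussion of speech-act verbs. The function name, its parameters and the returned strings are hypothetical.

```python
ONGOING = "(i) an ongoing process"
RESULTATIVE = "(iii) a resultative state"
AMBIGUOUS = "ambiguous: (i) or (iii)"


def teiru_sense(category, adverb_labels=(), has_goal_argument=False):
    """Return the sense of -teiru for one verb occurrence, or None.

    category: verb category number (1 to 6) from the classification.
    adverb_labels: class labels of the adverbs modifying the verb.
    has_goal_argument: True if the verb takes a goal (ni-case) argument.
    Cases 1 to 4 of Table 10 are not modelled here.
    """
    labels = set(adverb_labels)
    # Case 5: process modifiers (P) or gradual change indicators (G).
    if labels.intersection({"P", "G"}):
        return ONGOING
    # Case 6: the endpoint of the process is explicitly set up.
    if labels.intersection({"E", "Q"}) or has_goal_argument:
        return RESULTATIVE
    # Case 7: the process cannot be taken up (atomic adverbs; words such
    # as sudeni would need a separate check, which is left out here).
    if "A" in labels:
        return RESULTATIVE
    # Case 8: non-gradual or gradual process verbs.
    if category in (5, 6):
        return ONGOING
    # Case 9: process+result verbs stay ambiguous.
    if category == 4:
        return AMBIGUOUS
    return None


if __name__ == "__main__":
    print(teiru_sense(5))
    print(teiru_sense(5, ["E"]))
    print(teiru_sense(4))
```

Because the first matching case wins, an explicit endpoint (case 6) overrides the default reading of a process verb (case 8), which is the behaviour the error analysis above asks for when a verb's event is measured out.</Paragraph>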
    </Section>
  </Section>
</Paper>