<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2233">
  <Title>Feasibility Study for Ellipsis Resolution in Dialogues</Title>
  <Section position="3" start_page="0" end_page="1428" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In machine translation systems, ellipses must be resolved when the source language does not express the subject or other grammatical cases and the target language must express them. Ellipsis resolution is also troublesome in information extraction and other natural language processing fields.</Paragraph>
    <Paragraph position="1"> Several approaches have been proposed to resolve ellipses, which consist of endophoric (intrasentential or anaphoric) ellipses and exophoric (or extrasentential) ellipses. One of the major theory-based approaches to endophoric ellipsis uses centering theory. However, its application to complex sentences has not been established, because most studies have only investigated its effectiveness on successive simple sentences.</Paragraph>
    <Paragraph position="2"> Several studies of this problem have been made using the empirical approach. Among them, Murata and Nagao (1997) proposed a scoring approach where each constraint is manually scored with an estimation of possibility, and the resolution is conducted by totaling the points each candidate receives. On the other hand, Nakaiwa and Shirai (1996) proposed a resolving algorithm for Japanese exophoric ellipses of written texts, utilizing semantic and pragmatic constraints. They claimed that 100% of the ellipses with exophoric referents could be resolved, but the experiment was a closed test with only a few samples. These approaches always require some effort to decide the scoring or the preference of provided constraints.</Paragraph>
    <Paragraph position="3"> Aone and Bennett (1995) applied a machine-learning technique to anaphora resolution in written texts. They attempted endophoric ellipsis resolution as a part of anaphora resolution, with approximately 40% recall and 74% precision at best from 200 test samples. However, they were not concerned with exophoric ellipsis.</Paragraph>
    <Paragraph position="4"> In contrast, we applied a machine-learning approach to ellipsis resolution (Yamamoto et al., 1997). In this previous work we resolved agent case ellipses in dialogue on a limited topic and achieved approximately 90% accuracy. However, this does not sufficiently establish the effectiveness of the decision tree, and the feasibility of this technique in resolving ellipses for each surface case is also unclear.</Paragraph>
    <Paragraph position="5"> We propose a method to resolve the ellipses that appear in Japanese dialogues. This method resolves not only the subject ellipsis, but also the object and other grammatical cases. In this approach, a machine-learning algorithm is used to build a decision tree by selecting the necessary attributes, and the decision tree is used as the actual ellipsis resolver.</Paragraph>
    <Paragraph position="6"> Another purpose of this paper is to discuss how effective the machine-learning approach is for the problem of ellipsis resolution. In the following sections, we discuss topic dependency in decision trees and compare the resolution effectiveness for each grammatical case. The problem of data size relative to decision-tree training is also discussed.</Paragraph>
    <Paragraph position="7"> In this paper, we assume that the detection of ellipses is performed by another module, such as a parser. We only considered ellipses that are commonly and clearly identified.</Paragraph>
    <Paragraph position="8"> 2 When to Resolve Ellipsis in MT? As described above, our major application for ellipsis resolution is machine translation. In an MT process, there are several possible timings for ellipsis resolution: when analyzing the source language, when generating the target language, or at the same time as the translation process. Among these candidates, most previous works on Japanese chose the source-language approach. For instance, Nakaiwa and Shirai (1996) attempted to resolve Japanese ellipsis in the source-language analysis of J-to-E MT, despite utilizing target-dependent resolution candidates.</Paragraph>
    <Paragraph position="9"> We originally thought that ellipsis resolution in MT was a generation problem, namely a target-driven problem that uses source-language information for help where necessary. This is because the problem is output-dependent and relies on the demands of the target language. In J-to-Korean or J-to-Chinese MT, most or all of the ellipses that must be resolved in J-to-E need not be resolved.</Paragraph>
    <Paragraph position="10"> However, we adopted the source-language policy in this paper, because we must consider a multi-lingual MT system, TDMT (Furuse et al., 1995), that deals with both J-to-E and J-to-German MT. English and German grammar are not generally believed to be similar.</Paragraph>
  </Section>
  <Section position="4" start_page="1428" end_page="1429" type="metho">
    <SectionTitle>
3 Ellipsis Resolution by Machine Learning
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1428" end_page="1428" type="sub_section">
      <Paragraph position="0"> Since huge text corpora have become widely available, the machine-learning approach has been utilized for some problems in natural language processing. The most popular touchstone in this field is the verbal case frame or the translation rules (Tanaka, 1994). Machine-learning algorithms have also been applied to some discourse processing problems, for example, discourse segment boundaries or discourse cue words (Walker and Moore, 1997). This section describes a method to apply decision-tree learning, which is one of the machine-learning approaches, to ellipsis resolution.</Paragraph>
      <Paragraph position="1"> Table 1: Ellipsis tags
Tag | Meaning
(1sg) | first person, singular
(1pl) | first person, plural
(2sg) | second person, singular
(2pl) | second person, plural
(g) | person(s) in general
(a) | anaphoric</Paragraph>
    </Section>
    <Section position="2" start_page="1428" end_page="1428" type="sub_section">
      <SectionTitle>
3.1 Ellipsis Tagging
</SectionTitle>
      <Paragraph position="0"> In order to train and evaluate our ellipsis resolver, we tagged a dialogue corpus with ellipsis types. The ellipsis types used to tag the corpus are shown in Table 1. Each ellipsis marker is tagged at the predicate. We made a distinction between first or second person and person(s) in general. Note that 'person(s) in general' refers to either an unidentified or an unspecified person or persons. In Far-Eastern languages such as Japanese, Korean, and Chinese, there is no grammatically obligatory case such as the subject in English. It is thus necessary to distinguish such ellipses.</Paragraph>
      <Paragraph position="1"> We also introduced a tag '(a)', which means that the ellipsis is anaphoric, i.e., that we need to refer back to an antecedent in the dialogue.</Paragraph>
      <Paragraph position="2"> In this paper we are not concerned with resolving the antecedent that such ellipses refer to, because it is necessary to have another module to deal with the context for resolving such endophoric ellipses, and the main target of this paper is the exophoric ellipses.</Paragraph>
    </Section>
    <Section position="3" start_page="1428" end_page="1429" type="sub_section">
      <SectionTitle>
3.2 Learning Method
</SectionTitle>
      <Paragraph position="0"> We used the C4.5 algorithm by Quinlan (1993), which is a well-known automatic classifier that produces a binary decision tree. Although it may be necessary to prune decision trees, no pruning is performed throughout this experiment, since we want to concentrate the discussion on the feasibility of machine learning.</Paragraph>
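      <Paragraph> As an illustration of how a C4.5-style learner chooses the attribute to test at a node, the following minimal sketch computes the gain ratio of binary attributes on toy data. It is a simplification and not the implementation used in the experiments: the attribute names, the four samples, and their tags are hypothetical, and the additional heuristics of the real C4.5 program are omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(samples, labels, attribute):
    """Information gain of splitting on `attribute`, divided by the split information."""
    total = len(labels)
    partitions = {}
    for sample, label in zip(samples, labels):
        partitions.setdefault(sample[attribute], []).append(label)
    # Expected entropy of the labels after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes that fragment the data.
    split_info = entropy([sample[attribute] for sample in samples])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: two binary attributes and an ellipsis tag per sample.
samples = [
    {"speaker_is_clerk": True, "has_ka": False},
    {"speaker_is_clerk": True, "has_ka": True},
    {"speaker_is_clerk": False, "has_ka": False},
    {"speaker_is_clerk": False, "has_ka": True},
]
labels = ["1sg", "2sg", "1sg", "2sg"]

# The attribute with the highest gain ratio would be tested at the root.
best = max(samples[0], key=lambda attr: gain_ratio(samples, labels, attr))
print(best)
```

In this toy set the interrogative particle separates the two tags perfectly, so it would be placed at the root; with real data the choice depends on the training dialogues.
</Paragraph>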
      <Paragraph position="1"> As shown in the experiment by Aone and Bennett (1995), which examined the effects of pruning on the decision tree, no conclusion is expected beyond a trade-off between recall and precision. We leave the details of decision-tree learning research to that field.</Paragraph>
    </Section>
    <Section position="4" start_page="1429" end_page="1429" type="sub_section">
      <SectionTitle>
Table 2: Training attributes
</SectionTitle>
      <Paragraph position="0"> Attributes | Num.
Content words (predicate) | 100
Content words (case frame) | 100
Func. words (case particle) | 9
Func. words (conj. particle) | 21
Func. words (auxiliary verb) | 132
Func. words (other) | 4
Exophoric information | 1
Total | 367</Paragraph>
    </Section>
    <Section position="5" start_page="1429" end_page="1429" type="sub_section">
      <SectionTitle>
3.3 Training Attributes
</SectionTitle>
      <Paragraph position="0"> The training attributes that we prepared for Japanese ellipsis resolution are listed in Table 2. The training attributes in the table are classified into the following three groups:  Functional words which express tense, modality, etc.</Paragraph>
      <Paragraph position="1"> One possible approach uses only topic-independent information to resolve ellipses that appear in dialogues. However, we took the position that topic-dependent and topic-independent information carry different knowledge. Thus, approaches utilizing only topic-independent knowledge must have a performance limit for developing an ellipsis resolution system. It is practical to seek an automatically trainable system that utilizes both types of knowledge.</Paragraph>
      <Paragraph position="2"> The effective use of exophoric information, i.e., from the actual world, may perform well for resolving an ellipsis. Exophoric information consists of a lot of elements, such as the time, the place, the speaker, and the listener of the utterance. However, it is difficult to become aware of some of them, and some are rather difficult to prescribe. Thus we utilize one element, the speaker's social role, i.e., whether the speaker is the customer or the clerk. The reason for this is that it must be an influential attribute, and it is easy to detect in the actual world. Many of us would accept a real system such as a spoken-language translation system that detects speech with independent microphones.</Paragraph>
      <Paragraph position="3"> It is generally agreed that the attributes needed to resolve ellipses differ from case to case. Thus, although the attributes should ideally be prepared on a case-by-case basis, we trained the resolvers with the same attributes for every case.</Paragraph>
      <Paragraph position="4"> Because we must deal with the noisy input that appears in real applications, the training attributes, other than the speaker's social role, are tested on a morphological basis. We give each attribute its positional information, i.e., the search space of morphemes relative to the target predicate. Positional information can be one of five kinds: before, at the latest, here, next, and afterward. For example, a case particle is given the position 'before', the search position of a prefix 'o-' or 'go-' is the 'latest', and an auxiliary verb is 'after' the predicate. The attributes of predicates and their semantic categories are placed in 'here'.</Paragraph>
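      <Paragraph> The positional search spaces described above can be sketched as follows. The exact extent of each position is not spelled out in the text, so the interpretation here is an assumption: 'latest' is read as the single morpheme immediately before the predicate and 'next' as the single morpheme immediately after it. The function names and the romanised example utterance are hypothetical.

```python
def search_space(morphemes, predicate_index, position):
    """Return the morphemes an attribute test may inspect for a position label."""
    if position == "before":
        return morphemes[:predicate_index]
    if position == "latest":
        # Assumed: only the morpheme immediately preceding the predicate.
        return morphemes[max(predicate_index - 1, 0):predicate_index]
    if position == "here":
        return morphemes[predicate_index:predicate_index + 1]
    if position == "next":
        # Assumed: only the morpheme immediately following the predicate.
        return morphemes[predicate_index + 1:predicate_index + 2]
    if position == "after":
        return morphemes[predicate_index + 1:]
    raise ValueError(f"unknown position label: {position}")


def attribute_fires(morphemes, predicate_index, position, target):
    """True if `target` occurs inside the search space of `position`."""
    return target in search_space(morphemes, predicate_index, position)


# Hypothetical morpheme sequence for an utterance whose predicate is "negai".
morphemes = ["heya", "wo", "o-", "negai", "shi", "masu"]
predicate_index = 3

features = {
    "before:wo": attribute_fires(morphemes, predicate_index, "before", "wo"),
    "latest:o-": attribute_fires(morphemes, predicate_index, "latest", "o-"),
    "here:negai": attribute_fires(morphemes, predicate_index, "here", "negai"),
    "next:masu": attribute_fires(morphemes, predicate_index, "next", "masu"),
    "after:masu": attribute_fires(morphemes, predicate_index, "after", "masu"),
}
print(features)
```

Each entry of the resulting dictionary corresponds to one binary attribute that a decision-tree learner could test.
</Paragraph>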
      <Paragraph position="5"> For predicate semantics, we utilized the top two layers of Kadokawa Ruigo Shin-Jiten, a three-layered hierarchical Japanese thesaurus.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1429" end_page="1431" type="metho">
    <SectionTitle>
4 Discussion
</SectionTitle>
    <Paragraph position="0"> In this section we discuss the feasibility of the ellipsis resolver via a decision tree in detail from three points of view: the amount of training data, the topic dependency, and the case difference. The first two are discussed for the 'ga(v.)' case (see subsection 4.3).</Paragraph>
    <Paragraph position="1"> We used the F-measure to evaluate the performance of ellipsis resolution. The F-measure is calculated from recall and precision:</Paragraph>
    <Paragraph position="2"> F = \frac{2 P R}{P + R}</Paragraph>
    <Paragraph position="3"> where P is precision and R is recall. In this paper, the F-measure is expressed as a percentage (%).</Paragraph>
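    <Paragraph> As a worked example of this metric, the following sketch computes the F-measure as a percentage for one ellipsis type. It assumes the standard balanced form F = 2PR / (P + R); the function name and the counts in the example are hypothetical and are not results from the experiments.

```python
def f_measure(correct, proposed, expected):
    """Balanced F-measure as a percentage.

    correct:  ellipses the resolver tagged with this type and got right
    proposed: ellipses the resolver tagged with this type
    expected: ellipses that truly carry this type
    """
    if correct == 0 or proposed == 0 or expected == 0:
        return 0.0
    precision = correct / proposed
    recall = correct / expected
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 correct out of 50 proposed, with 80 expected.
print(round(f_measure(40, 50, 80), 1))
```

With these counts precision is 0.8 and recall is 0.5, so the harmonic mean lies closer to the lower of the two values.
</Paragraph>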
    <Section position="1" start_page="1430" end_page="1430" type="sub_section">
      <SectionTitle>
4.1 Amount of Training Data
</SectionTitle>
      <Paragraph position="0"> We trained decision trees with a varied number of training dialogues, namely 25, 50, 100, 200 and 400 dialogues, where each set includes the smaller sets. The experiment was done with 100 test dialogues (1685 subject ellipses), none of which were included in the training dialogues.</Paragraph>
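      <Paragraph> One simple way to obtain training sets in which every larger set contains the smaller ones is to shuffle the dialogues once and take prefixes. The sketch below illustrates this under assumptions: the pool of 1000 dialogue identifiers, the seed, and the function name are hypothetical, and the text does not state how the sets were actually drawn.

```python
import random


def nested_training_sets(dialogue_ids, sizes, seed=0):
    """Shuffle once, then take prefixes so that each set contains every smaller set."""
    rng = random.Random(seed)
    shuffled = list(dialogue_ids)
    rng.shuffle(shuffled)
    return {size: shuffled[:size] for size in sorted(sizes)}


# Hypothetical pool of dialogue identifiers.
training_sets = nested_training_sets(range(1000), [25, 50, 100, 200, 400])
print({size: len(ids) for size, ids in training_sets.items()})
```

Because the sets are nested, any difference in performance between two sizes comes from the added dialogues rather than from a different random sample.
</Paragraph>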
      <Paragraph position="1"> Table 3 indicates the training size and performance calculated by F-measure. This illustrates that the performance improves as the training size increases in all types of ellipses. Although it is not shown in the table, we note that the results in both recall and precision improve continuously as well as those in F-measure.</Paragraph>
      <Paragraph position="2"> The performance of all ellipsis types by training size is also plotted in Figure 1 on a semi-logarithmic scale. It is interesting to see from the figure that the rate of improvement gradually decelerates and that some of the ellipsis types seem to have practically stopped improving at around 400 training dialogues (6806 samples). Aone and Bennett (1995) claimed that the overall anaphora resolution performance seems to have reached a plateau at around 250 training examples. Our result, however, indicates that 10^4 to 10^5 training samples would be enough to train the trees in this task.</Paragraph>
      <Paragraph position="3"> The chart further suggests that the performance limit of our approach would be 80% to 85%, because each ellipsis type seems to approach a similar value, in particular the types with many training samples, (1sg) and (2sg).</Paragraph>
      <Paragraph position="4"> Greater performance improvement is expected by conducting more training in (2pl) and (g).</Paragraph>
    </Section>
    <Section position="2" start_page="1430" end_page="1431" type="sub_section">
      <SectionTitle>
4.2 Topic Dependencies
</SectionTitle>
      <Paragraph position="0"> It would be fully satisfactory if resolution knowledge could be built from topic-independent information alone. However, is that practical? We discuss this question through a few experiments.
We used the ATR travel arrangement corpus (Furuse et al., 1994). The corpus contains dialogues exchanged between two people and covers various travel-arrangement topics such as immigration, sightseeing, shopping, and ticket ordering. A dialogue consists of 10 to 30 exchanges. We classified the dialogues of the corpus into four topic categories:
H1: Hotel room reservation, modification and cancellation
H2: Hotel service inquiry and troubleshooting
HR: Other hotel arrangements, such as hotel selection and an explanation of hotel facilities
R: Other travel arrangements
Fifty dialogues were chosen randomly from the corpus for each of the topic categories H1, H2, R, and the overall topic T (= H1 + H2 + HR + R) as training dialogues. We again used 100 unseen dialogues as test samples, the same ones used in the training-size experiment.</Paragraph>
      <Paragraph position="1"> Table 4 shows the topic dependency of each topic category in terms of the F-measure. For instance, the first figure in the 'T/' row (73.4) denotes that the F-measure is 73.4% on topic H1 test samples when training is conducted on T, i.e., all topics. Note that the second row of the table indicates the proportion of each topic in the test samples (and thus in the corpus).</Paragraph>
      <Paragraph position="3"> Table 4 (row 'T - Hn/'): 73.7 61.9 59.5 63.9 64.8
The results illustrate that very high accuracy is obtained when the training topic and the test topic coincide. This implies that, when the resolution topic is foreseeable or restricted, dialogues of unnecessary topics should not be used for training, in order to obtain higher performance. Among the four topic subcategories, topic R shows the highest accuracy (69.9%) in total performance. The reason is not that topic R has something important to train on, but that topic R contains the most test dialogues chosen at random.</Paragraph>
      <Paragraph position="4"> The table also illustrates that a resolver trained in various kinds of topics ('T/') demonstrates higher resolving accuracy against the testing data set. It performs with better than average accuracy in every topic compared to one which is trained in a biased topic. By looking at some examples it may be possible to build an all-around ellipsis resolver, but topic-dependent features are necessary for better performance.</Paragraph>
      <Paragraph position="5"> The 'T - Hn/' resolver shows the lowest performance (59.5%) against '/Hn' test set. This result is more evidence supporting the importance of topic-dependent features.</Paragraph>
    </Section>
    <Section position="3" start_page="1431" end_page="1431" type="sub_section">
      <SectionTitle>
4.3 Difference in Surface Case
</SectionTitle>
      <Paragraph position="0"> We previously applied a machine-learned resolver to agent case ellipses (Yamamoto et al., 1997). In this paper, we discuss whether this technique is applicable to surface cases.</Paragraph>
      <Paragraph position="1"> We examined the feasibility of a machine-learned ellipsis resolver for three principal surface cases in Japanese, 'ga', 'wo', and 'ni'.</Paragraph>
      <Paragraph position="2"> Roughly speaking, they express the subject, the direct object, and the indirect object of a sentence respectively. We divided the 'ga' case samples into two groups, depending on whether the predicate of the sentence containing the 'ga' case ellipsis is a verb or an adjective. In other words, this distinction roughly corresponds to whether the English sentence is a general-verb sentence or a be-verb sentence. Henceforth, we call them 'ga(v.)' and 'ga(adj.)' respectively.</Paragraph>
      <Paragraph position="3"> The training attributes provided are the same in all surface cases. They are listed in Table 2. In the experiment, 300 training dialogues and 100 unseen test dialogues were used. The results are shown in Table 5. The table illustrates that the ga(adj.) resolver has a similar overall performance to the ga(v.) resolver, whereas the two show distinct tendencies in each ellipsis type. The ga(adj.) case resolver produces unsatisfactory results for (1sg) and (2sg) ellipses, since insufficient samples appeared in the training set.</Paragraph>
      <Paragraph position="4"> In the 'wo' case, more than 90% of the samples are tagged with (a), thus they are easily recognized as anaphoric. Although it may be difficult to decide the antecedents of the anaphoric ellipses by using the information in Table 2, the results show that it is possible to simply recognize them. After recognizing that an ellipsis is anaphoric, it is possible to resolve it in other contextual processing modules, such as centering. It is important to note that a satisfactory performance is obtained for the 'ni' case (mostly indirect object). One reason for this could be that many indirect objects refer to exophoric persons, and thus an approach utilizing a decision tree that makes a selection from fixed decision candidates is suitable for 'ni' resolution.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1431" end_page="10000" type="metho">
    <SectionTitle>
5 Inside a Decision Tree
</SectionTitle>
    <Paragraph position="0"> A decision tree is a convenient resolver for some kinds of problems, but we should not regard it as a black-box tool. It tells us what attributes are important, whether or not the attributes are  sufficient, and sometimes more. In this section, we investigate decision trees and discuss them in detail.</Paragraph>
    <Section position="1" start_page="10000" end_page="10000" type="sub_section">
      <SectionTitle>
5.1 Tree Shape
</SectionTitle>
      <Paragraph position="0"> The relation between the number of training samples and the number of nodes in a decision tree is shown logarithmically in Figure 2. It is clear from the chart that the two factors of the 'ga(v.)' case are logarithmically linear. This is because no pruning is conducted in building a decision tree. We also see that a more compact tree is built in the order of 'wo', 'ni', 'ga(adj.)' and 'ga(v.)'. This implies that the 'wo' case is the easiest of the four cases for characterizing the individuality among the ellipsis types.</Paragraph>
      <Paragraph position="1"> Table 6 shows node depth and the maximum width in the decision trees we have built. By studying Table 5 and Table 6, we can see that the shallower the decision tree is, the better the resolver performs. One explanation for this may be that a deeper (and maybe bigger) decision tree fails to characterize each ellipsis type well, and thus it performs worse.</Paragraph>
    </Section>
    <Section position="2" start_page="10000" end_page="10000" type="sub_section">
      <SectionTitle>
5.2 Attribute Coverage
</SectionTitle>
      <Paragraph position="0"> We define a factor 'coverage' for each attribute.</Paragraph>
      <Paragraph position="1"> Attribute coverage is the ratio of the samples that use the attribute to reach their decision to all the samples used to build the decision tree. If an attribute is used at the top node of a decision tree, the attribute coverage is 100% by definition, because all samples use it (first) to reach their decision.</Paragraph>
      <Paragraph position="2"> From this, we can learn the participation of each attribute, i.e., each attribute's importance.</Paragraph>
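      <Paragraph> The definition of attribute coverage can be sketched on a small hand-built tree. The tree shape, the attribute names, and the four samples below are hypothetical and serve only to illustrate the computation; a sample counts toward an attribute if that attribute is tested anywhere on the sample's path from the root to its leaf.

```python
from collections import Counter

# Hypothetical binary decision tree: internal nodes test one attribute,
# leaves hold an ellipsis tag.
tree = {
    "attribute": "-ka",
    "yes": {"leaf": "2sg"},
    "no": {
        "attribute": "speaker_is_clerk",
        "yes": {"leaf": "1sg"},
        "no": {
            "attribute": "-tekudasaru",
            "yes": {"leaf": "2sg"},
            "no": {"leaf": "1sg"},
        },
    },
}


def attributes_on_path(node, sample):
    """Collect the attributes a sample is tested on while descending to its leaf."""
    used = set()
    while "leaf" not in node:
        used.add(node["attribute"])
        node = node["yes"] if sample.get(node["attribute"], False) else node["no"]
    return used


def attribute_coverage(tree, samples):
    """Percentage of the training samples whose decision path tests each attribute."""
    counts = Counter()
    for sample in samples:
        counts.update(attributes_on_path(tree, sample))
    return {attr: 100.0 * n / len(samples) for attr, n in counts.items()}


# Hypothetical training samples; attributes that are absent count as False.
samples = [
    {"-ka": True},
    {"-ka": True, "speaker_is_clerk": True},
    {"speaker_is_clerk": True},
    {"-tekudasaru": True},
]
coverage = attribute_coverage(tree, samples)
print(coverage)
```

The root attribute reaches 100% because every sample is tested on it, while attributes deeper in the tree are seen only by the samples routed to them.
</Paragraph>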
      <Paragraph position="3"> Some typical attribute-coverages are expressed in Table 7. Note that 'ga(25)' denotes the results of 'ga(v.)' with 25-dialogue training.</Paragraph>
      <Paragraph position="4"> A glance at the table reveals that the coverage is not constant as the number of training dialogues increases. Here we build a hypothesis from the table that more general attributes are preferred with an increase in training size.</Paragraph>
      <Paragraph position="5"> The table illustrates that the coverage of topic-independent attributes increases with a rise in training size, such as '-tekudasaru' or '-teitadaku' (both auxiliary verbs which express the hearer's action toward the speaker with the speaker's respect). The table shows in contrast that the coverage of topic-dependent attributes decreases, such as ':before 72' (a category in which words concerned with intention are included before the predicate mentioned) or ':before 94'. There are also some topic-independent words such as '-ka' (a particle that expresses that the sentence is interrogative) or ':before ~1/~3 '3 which are still important regardless of the training size.</Paragraph>
      <Paragraph position="6"> This indicates the advantages of a machine-learning approach, because difficulties always arise in differentiating these words in manual approaches.</Paragraph>
      <Paragraph position="7"> Table 8 also contrasts typical coverage in surface cases. It illustrates that there is a distinct difference between 'ga(v.)' and 'ga(adj.)'. The resolver of the 'ga(adj.)' case looks at other cases, such as '-de', or the contents of another case, ':before 16/34', whereas the 'ga(v.)' case resolver checks some predicates and influential functional words. Coverage of each attribute in the 'ni' case has similar tendencies to those in the 'ga(v.)' case, except for a few attributes.</Paragraph>
    </Section>
  </Section>
</Paper>