<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1085">
  <Title>Definiteness Predictions for Japanese Noun Phrases</Title>
  <Section position="6" start_page="522" end_page="524" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="522" end_page="523" type="sub_section">
      <SectionTitle>
4.1 Performance of the algorithm
</SectionTitle>
      <Paragraph position="0"> The performance of our framework is best described in terms of recall and precision, where recall refers to the proportion of all relevant noun phrases that have been assigned a correct definiteness attribute, whilst precision expresses the percentage of correct assignments among all attributes assigned.</Paragraph>
      <Paragraph position="1"> The hierarchy was designed as a pre-process to context checking, extracting all values that can be assigned on linguistic grounds alone, but leaving all others underspecified. It is therefore to be expected that its coverage, i.e. the percentage of noun phrases assigned a value by the hierarchy, is relatively low. However, since we propose that the decision algorithm should be monotone, it is vitally important for the precision to be as near to 100% as possible. Any wrong assignments at any stage of the process will inevitably lead to incorrect translation results. To evaluate the hierarchy, we tested the performance of our rule base on 20 unseen dialogues from the corpus. All noun phrases in the dialogues were first annotated with their definiteness attributes, followed by the list of rules with matching preconditions. As a second step, the rules applicable to each noun phrase were ordered according to their class, and the prediction of the one highest in the hierarchy was compared with the annotated value.</Paragraph>
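The two-step evaluation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data layout, the class rank numbers, the class name "other" and the toy annotations are all hypothetical, and a smaller rank number is assumed to mean a higher position in the hierarchy.

```python
from collections import defaultdict


def evaluate(noun_phrases):
    """Tally, per rule class, how often the top-ranked matching rule is right.

    Each noun phrase is a dict with a gold 'annotation' and a list of
    'matches'; each match is (class_rank, rule_class, predicted_value).
    This layout is an illustrative assumption.
    """
    tally = defaultdict(lambda: {"correct": 0, "incorrect": 0})
    unassigned = 0
    for phrase in noun_phrases:
        if not phrase["matches"]:
            unassigned += 1  # left underspecified for context checking
            continue
        # Assumption: the smallest rank is the highest class in the hierarchy.
        _, rule_class, predicted = min(phrase["matches"], key=lambda m: m[0])
        outcome = "correct" if predicted == phrase["annotation"] else "incorrect"
        tally[rule_class][outcome] += 1
    return dict(tally), unassigned


# Toy data, invented for illustration only.
toy = [
    {"annotation": "definite",
     "matches": [(2, "other", "indefinite"), (1, "noun", "definite")]},
    {"annotation": "indefinite",
     "matches": [(2, "other", "indefinite")]},
    {"annotation": "definite", "matches": []},
]
tally, unassigned = evaluate(toy)
print(tally, unassigned)
```

Only the prediction of the highest-ranked applicable rule is scored, so a lower-ranked rule that disagrees with it (as in the first toy phrase) does not affect the tally.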
      <Paragraph position="2"> In the test data, there are 346 noun phrases that need assignment of definiteness attributes (see footnote 4). Table 1 shows the number of noun phrase occurrences covered by each rule class, i.e. the number of times a noun phrase was assigned a definiteness attribute by any of the rules from that class. This value was then further divided into the number of correct and incorrect assignments made. From this, the precision was calculated by dividing the number of values correctly assigned by the total number of values assigned. Overall, with a precision of 98.9%, the aim of high accuracy has been achieved.</Paragraph>
      <Paragraph position="3"> Dividing the number of correct assignments by the number of noun phrases that need assignment, we get a recall of 78.6%. Thus, within the appointment scheduling domain, the hierarchy already assigns a value to 79.5% of all relevant noun phrases, leaving just 20.5% for the computationally expensive context checking.</Paragraph>
      <Paragraph position="4"> Footnote 4: Additionally, there are 388 time expressions (i.e. dates, times, weekdays and times of day) that under certain conditions also need an article during generation. However, these were excluded from the statistics, since nearly all of them were found to be trivially definite, which would somewhat artificially push the recall of the rules in the hierarchy up to 88.8%.</Paragraph>
      <Paragraph position="5"> Of the 71 noun phrases left underspecified, 40 have definite reference, suggesting 'definite' as the default value if the hierarchy were to be used as the sole means of assigning definiteness attributes. This means that a system integrating this algorithm with an efficient context checking mechanism should have a recall of at least 90%, since this is what can already be achieved by using a default value.</Paragraph>
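The figures in this subsection can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text, with one exception that is an assumption: Table 1 is not reproduced here, so the count of 272 correct assignments is inferred as the value consistent with the reported precision and recall.

```python
total = 346          # noun phrases needing a definiteness attribute (from the text)
underspecified = 71  # left unassigned by the hierarchy (from the text)
assigned = total - underspecified  # 275 noun phrases received a value

# Assumption: 272 correct assignments. This count is inferred from the
# reported precision and recall, not taken from Table 1.
correct = 272

coverage = assigned / total     # share of noun phrases given any value
precision = correct / assigned  # correct assignments among all assignments
recall = correct / total        # correct assignments among all relevant noun phrases

# Using 'definite' as the default for the rest: 40 of the 71 are definite.
default_hits = 40
recall_with_default = (correct + default_hits) / total

print(f"coverage            {coverage:.1%}")             # 79.5%
print(f"precision           {precision:.1%}")            # 98.9%
print(f"recall              {recall:.1%}")               # 78.6%
print(f"recall with default {recall_with_default:.1%}")  # 90.2%
```

Under this reading, the 79.5% figure is the coverage of the hierarchy and the 78.6% figure is its recall; the two differ only by the few incorrect assignments.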
    </Section>
    <Section position="2" start_page="523" end_page="524" type="sub_section">
      <SectionTitle>
4.2 Comparison to previous approaches
</SectionTitle>
      <Paragraph position="0"> The performance of our framework has been found to be better than both of the heuristic rule-based approaches introduced in section 2, even before context checking. However, our framework was defined and tested on the restricted domain of appointment scheduling.</Paragraph>
      <Paragraph position="1"> Most of the really difficult cases for article selection, such as generics, do not occur in this domain, whilst both (Murata and Nagao, 1993) and (Bond et al., 1995) build their theories around the problem of identifying these.</Paragraph>
      <Paragraph position="2"> There are no statistics on the performance of their systems on a corpus that does not contain any generics.</Paragraph>
      <Paragraph position="3"> The transfer-based approach of (Siegel, 1996) also covers data from the appointment scheduling domain, using both linguistic and contextual information for assigning definiteness. However, her results still cannot be compared with our approach, since we do not have any figures on how high the recall of our algorithm is with context checking in place. In addition, the performance data given for our hierarchy was derived from unseen data rather than the data that were used to draw up the rules, as in Siegel's case.</Paragraph>
      <Paragraph position="4"> Even though no direct comparison is possible because of the different test methods and data sets used, we have been able to show that an approach using a monotone rule hierarchy that can be easily integrated with a context checking mechanism leads to very good results.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="524" end_page="524" type="evalu">
    <SectionTitle>
5 Implementation
</SectionTitle>
    <Paragraph position="0"> The current framework has been designed as part of the dialogue and discourse processing component of the Verbmobil machine translation system, a large-scale research project in the area of spontaneous speech dialogue translation between German, English and Japanese (Wahlster, 1997). Within the modular system architecture, the dialogue and discourse processing is situated between the components for semantic construction (Gambäck et al., 1996) and semantic-based transfer (Dorna and Emele, 1996). It uses context knowledge to resolve semantic representations possibly underspecified with respect to syntactic or semantic ambiguities.</Paragraph>
    <Paragraph position="1"> At this stage, all the information needed for definiteness assignment is easily accessible, enabling the rules in our hierarchy to be implemented one-to-one as simple implications. Since all information is accessible at all times, the application of the rules can be ordered according to the hierarchy. Only if none of the rules in the hierarchy is applicable will the context checking process be started. If an antecedent can be found for the relevant noun phrase, it will be assigned definite reference; otherwise it is taken to be indefinite.</Paragraph>
    <Paragraph position="2"> The algorithm will terminate as soon as a value has been assigned, thus ensuring monotonicity and efficiency, as 45% of all noun phrases are already assigned a value by one of the noun rules at the top of the hierarchy.</Paragraph>
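The control flow described in this section can be sketched as below. This is a minimal illustration under stated assumptions rather than the Verbmobil implementation: the noun phrase representation, the two example rules and the antecedent check are all hypothetical, and the rules are assumed to arrive already sorted by their position in the hierarchy.

```python
from typing import Callable, Optional, Sequence

# A rule inspects a noun phrase and returns a value, or None if it does not apply.
Rule = Callable[[dict], Optional[str]]


def assign_definiteness(
    noun_phrase: dict,
    ordered_rules: Sequence[Rule],
    has_antecedent: Callable[[dict], bool],
) -> str:
    """Apply rules in hierarchy order; fall back to context checking."""
    for rule in ordered_rules:
        value = rule(noun_phrase)
        if value is not None:
            return value  # stop at the first assignment; it is never revised
    # No rule applied: context checking decides.
    return "definite" if has_antecedent(noun_phrase) else "indefinite"


# Hypothetical rules, invented for illustration only.
def demonstrative_rule(noun_phrase: dict) -> Optional[str]:
    return "definite" if noun_phrase.get("demonstrative") else None


def numeral_rule(noun_phrase: dict) -> Optional[str]:
    return "indefinite" if noun_phrase.get("numeral") else None


# Hypothetical context check: a head noun already mentioned in the dialogue.
mentioned = {"meeting"}


def seen_before(noun_phrase: dict) -> bool:
    return noun_phrase["head"] in mentioned


rules = [demonstrative_rule, numeral_rule]
results = [
    assign_definiteness({"head": "room", "demonstrative": True}, rules, seen_before),
    assign_definiteness({"head": "slot", "numeral": True}, rules, seen_before),
    assign_definiteness({"head": "meeting"}, rules, seen_before),
    assign_definiteness({"head": "proposal"}, rules, seen_before),
]
print(results)  # ['definite', 'indefinite', 'definite', 'indefinite']
```

Returning on the first assignment is what keeps the sketch monotone: a value, once set, is never overwritten by a later rule or by the context check.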
  </Section>
</Paper>