<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1028">
  <Title>CORPUS-BASED ACQUISITION OF RELATIVE PRONOUN DISAMBIGUATION HEURISTICS</Title>
  <Section position="10" start_page="220" end_page="220" type="evalu">
    <SectionTitle>
4 RESULTS
</SectionTitle>
    <Paragraph position="0"> As described above, we used 100 texts (approximately 7% of the corpus) containing 176 instances of the relative pronoun &quot;who&quot; for training. Six of those instances were discarded because the UMass/MUC-3 syntactic analyzer failed to include the desired antecedent in its constituent representation, making it impossible for the human supervisor to specify the location of the antecedent. After training, we tested the resulting disambiguation hierarchy on 71 novel instances extracted from an additional 50 texts in the corpus. Using the selection heuristics described above, the correct antecedent was found for 92% of the test instances. Of the 6 errors, 3 involved probes with antecedent combinations never seen in any of the training cases. This usually indicates that the semantic and syntactic structure of the novel clause differs significantly from those in the disambiguation hierarchy. This was, in fact, the case for 2 of these 3 errors. The third error involved a complex conjunction and appositive combination. In this case, the retrieved antecedent specified 3 of the 4 required constituents.</Paragraph>
    <Paragraph position="1"> If we discount the errors involving unknown antecedents, our algorithm correctly classifies 94% of the novel instances (3 errors). In comparison, the original UMass/MUC-3 system that relied on hand-coded heuristics for relative pronoun disambiguation finds the correct antecedent 87% of the time (9 errors). However, a simple heuristic that chooses the most recent phrase as the antecedent succeeds 86% of the time. (For the training sets, this heuristic works only 75% of the time.) In cases where the antecedent was not the most recent phrase, UMass/MUC-3 errs 67% of the time. Our automated algorithm errs 47% of the time.</Paragraph>
    <Paragraph position="2"> It is interesting that of the 3 errors that did not specify previously unseen antecedents, one was caused by parsing blunders. The remaining 2 errors involved relative pronoun antecedents that are difficult even for people to specify: 1) &quot;... 9 rebels died at the hands of members of the civilian militia, who resisted the attacks&quot; and 2) &quot;... the government expelled a group of foreign drug traffickers who had established themselves in northern Chile&quot;. Our algorithm chose &quot;the civilian militia&quot; and &quot;foreign drug traffickers&quot; as the antecedents of &quot;who&quot; instead of the preferred antecedents &quot;members of the civilian militia&quot; and &quot;group of foreign drug traffickers.&quot;</Paragraph>
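    <Paragraph position="3"> The headline accuracy figures in this section can be checked with simple arithmetic. The following is a minimal sketch, not part of the original system: the function name accuracy_percent and the constant names are hypothetical, and it assumes that the 9 errors of the hand-coded UMass/MUC-3 heuristics were counted on the same 71 test instances, which the text implies but does not state outright.

```python
def accuracy_percent(total, errors):
    """Return the share of correctly resolved instances as a whole-number percentage."""
    # Hypothetical helper for illustration; guards against impossible counts.
    if total == 0 or errors not in range(total + 1):
        raise ValueError("need a positive total and an error count between 0 and total")
    correct = total - errors
    return round(100 * correct / total)


TEST_INSTANCES = 71      # novel instances of "who" in the test texts
LEARNED_ERRORS = 6       # errors made by the learned disambiguation hierarchy
HAND_CODED_ERRORS = 9    # errors of the hand-coded heuristics (same test set assumed)

print(accuracy_percent(TEST_INSTANCES, LEARNED_ERRORS))     # prints 92
print(accuracy_percent(TEST_INSTANCES, HAND_CODED_ERRORS))  # prints 87
```

    Both values agree with the 92% and 87% reported above. The 94% figure is not recomputed here because the text does not state which instances remain in the denominator once the unseen-antecedent errors are discounted.</Paragraph>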
  </Section>
</Paper>