<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1053">
  <Title>Exploring Various Knowledge in Relation Extraction</Title>
  <Section position="6" start_page="430" end_page="432" type="evalu">
    <SectionTitle>
5 Experimentation
</SectionTitle>
    <Paragraph position="0"> This paper uses the ACE corpus provided by LDC to train and evaluate our feature-based relation extraction system. The ACE corpus is gathered from various newspapers, newswire and broadcasts. In this paper, we only model explicit relations, because of the poor inter-annotator agreement in the annotation of implicit relations and their limited number.</Paragraph>
    <Section position="1" start_page="430" end_page="430" type="sub_section">
      <SectionTitle>
5.1 Experimental Setting
</SectionTitle>
      <Paragraph position="0"> We use the official ACE corpus from LDC. The training set consists of 674 annotated text documents (~300k words) and 9683 instances of relations. During development, 155 of the 674 documents in the training set are set aside for fine-tuning the system. The testing set is held out only for final evaluation; it consists of 97 documents (~50k words) and 1386 instances of relations. Table 1 lists the types and subtypes of relations for the ACE Relation Detection and Characterization (RDC) task, along with their frequency of occurrence in the ACE training set. (Words that already have the semantic classes &amp;quot;Parent&amp;quot;, &amp;quot;GrandParent&amp;quot;, &amp;quot;Spouse&amp;quot; and &amp;quot;Sibling&amp;quot; are automatically assigned those same classes without change; the remaining words, which do not have the above four classes, are classified manually.) It shows that the</Paragraph>
      <Paragraph position="1"> ACE corpus suffers from a small amount of annotated data for a few subtypes, such as the subtype &amp;quot;Founder&amp;quot; under the type &amp;quot;ROLE&amp;quot;. It also shows that the ACE RDC task defines some subtypes, such as &amp;quot;Based-In&amp;quot;, &amp;quot;Located&amp;quot; and &amp;quot;Residence&amp;quot; under the type &amp;quot;AT&amp;quot;, that are difficult even for human experts to differentiate.</Paragraph>
      <Paragraph position="2">  In this paper, we explicitly model the argument order of the two mentions involved. For example, when comparing mentions m1 and m2, we distinguish between m1-ROLE.Citizen-Of-m2 and m2-ROLE.Citizen-Of-m1. Note that only 6 of these 24 relation subtypes are symmetric: &amp;quot;Relative-Location&amp;quot;, &amp;quot;Associate&amp;quot;, &amp;quot;Other-Relative&amp;quot;, &amp;quot;Other-Professional&amp;quot;, &amp;quot;Sibling&amp;quot;, and &amp;quot;Spouse&amp;quot;. In this way, we model relation extraction as a multi-class classification problem with 43 classes, two for each relation subtype (except the above 6 symmetric subtypes) and a &amp;quot;NONE&amp;quot; class for the case where the two mentions are not related.</Paragraph>
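      <Paragraph> The 43-class scheme above can be made concrete with a short sketch. This is an illustrative reconstruction, not the authors' implementation; only the six symmetric subtype names come from the paper, and the remaining subtype names are placeholders.

```python
# Illustrative sketch of the 43-class label scheme.
# Only the six symmetric subtype names below come from the paper;
# the asymmetric subtype names used later are placeholders.
SYMMETRIC = {"Relative-Location", "Associate", "Other-Relative",
             "Other-Professional", "Sibling", "Spouse"}

def build_label_set(subtypes, symmetric=SYMMETRIC):
    """Two directed labels per asymmetric subtype (m1-S-m2 and m2-S-m1),
    one label per symmetric subtype, plus NONE for unrelated pairs."""
    labels = {"NONE"}
    for s in subtypes:
        if s in symmetric:
            labels.add(s)
        else:
            labels.add(f"m1-{s}-m2")
            labels.add(f"m2-{s}-m1")
    return labels

# With the 24 ACE subtypes (18 asymmetric, 6 symmetric):
# 18 * 2 + 6 + 1 = 43 classes.
subtypes = sorted(SYMMETRIC) + [f"Asym-{i}" for i in range(18)]
print(len(build_label_set(subtypes)))  # 43
```

The count works out because each asymmetric subtype contributes two directed classes, each symmetric subtype one, and the &amp;quot;NONE&amp;quot; class covers unrelated mention pairs.</Paragraph>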
    </Section>
    <Section position="2" start_page="430" end_page="431" type="sub_section">
      <SectionTitle>
5.2 Experimental Results
</SectionTitle>
      <Paragraph position="0"> In this paper, we only measure the performance of relation extraction on &amp;quot;true&amp;quot; mentions with &amp;quot;true&amp;quot; chaining of coreference (i.e. as annotated by the corpus annotators) in the ACE corpus. Table 2 measures the performance of our relation extraction system over the 43 ACE relation subtypes on the testing set. It shows that our system achieves its best performance of 63.1%/49.5%/55.5 in precision/recall/F-measure when combining diverse lexical, syntactic and semantic features. Table 2 also measures the contributions of different features by gradually increasing the feature set. It shows that:  * Using word features only achieves the performance of 69.2%/23.7%/35.3 in precision/recall/F-measure.  * Entity type features are very useful and improve the F-measure by 8.1, largely due to the recall increase.</Paragraph>
      <Paragraph position="1"> * The usefulness of mention-level features is quite limited: they improve the F-measure by only 0.8, due to the recall increase.</Paragraph>
      <Paragraph position="2"> * Incorporating the overlap features rebalances precision and recall: it increases the F-measure by 3.6, with a large precision decrease offset by a larger recall increase.</Paragraph>
      <Paragraph position="3"> * Chunking features are very useful. They increase the precision/recall/F-measure by 4.1%/5.6%/5.2 respectively.</Paragraph>
    </Section>
    <Section position="3" start_page="431" end_page="432" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> * To our surprise, incorporating the dependency tree and parse tree features improves the F-measure by only 0.6 and 0.4 respectively. This may be due to the fact that most relations in the ACE corpus are quite local: Table 3 shows that in about 70% of relations the two mentions are embedded in each other or separated by at most one word. While such short-distance relations dominate and can be resolved by the simple features above, the dependency tree and parse tree features can only take effect on the remaining, much less frequent long-distance relations. Moreover, full parsing is always prone to long-distance errors, even though the Collins parser used in our system represents the state of the art in full parsing.</Paragraph>
      <Paragraph position="1"> * Incorporating semantic resources such as the country name list and the personal relative trigger word list further increases the F-measure by 1.5, largely due to the differentiation of the relation subtype &amp;quot;ROLE.Citizen-Of&amp;quot; from &amp;quot;ROLE.Residence&amp;quot; by distinguishing country GPEs from other GPEs. The effect of the personal relative trigger words is very limited, due to the small number of testing instances of the personal social relation subtypes.</Paragraph>
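      <Paragraph> The cumulative feature-set evaluation behind Table 2 can be sketched as follows. This is a hypothetical reconstruction: the feature group names mirror the ablation order above, but the scorer passed in is a placeholder, not the system described in this paper.

```python
# Hypothetical sketch of the Table 2 ablation: feature groups are enabled
# cumulatively, and the classifier is re-trained and re-scored each time.
# `train_and_score` stands in for the real feature-based system.
FEATURE_GROUPS = ["words", "entity_type", "mention_level", "overlap",
                  "chunking", "dependency_tree", "parse_tree", "semantic"]

def f_measure(p, r):
    # Balanced F-measure: harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0

def ablation(train, test, train_and_score):
    """Return (active groups, P, R, F) after each cumulative addition."""
    results, active = [], []
    for group in FEATURE_GROUPS:
        active.append(group)
        p, r = train_and_score(train, test, tuple(active))
        results.append((tuple(active), p, r, f_measure(p, r)))
    return results

# Dummy scorer reproducing the first row of Table 2 (word features only):
rows = ablation([], [], lambda tr, te, feats: (69.2, 23.7))
print(round(rows[0][3], 1))  # 35.3
```

Each row of such a table isolates one feature group's marginal contribution, which is how the per-group F-measure gains quoted above are obtained.</Paragraph>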
      <Paragraph position="2"> Table 4 separately measures the performance of different relation types and major subtypes. It also indicates the number of testing instances, the number of correctly classified instances and the number of wrongly classified instances for each type or subtype. It is not surprising that the performance on the relation type &amp;quot;NEAR&amp;quot; is low because it occurs rarely in both the training and testing data.</Paragraph>
      <Paragraph position="3"> Others, such as &amp;quot;PART.Subsidiary&amp;quot; and &amp;quot;SOCIAL.Other-Professional&amp;quot;, also suffer from their low occurrence counts. Table 4 also shows that our system performs best on the subtypes &amp;quot;SOCIAL.Parent&amp;quot; and &amp;quot;ROLE.Citizen-Of&amp;quot;. This is largely due to the incorporation of the two semantic resources, i.e. the country name list and the personal relative trigger word list. Table 4 also indicates low performance on the relation type &amp;quot;AT&amp;quot;, although it occurs frequently in both the training and testing data. This suggests the difficulty of detecting and classifying the relation type &amp;quot;AT&amp;quot; and its subtypes.</Paragraph>
      <Paragraph position="4"> Table 5 separates the performance of relation detection from the overall performance on the testing set. It shows that our system achieves 84.8%/66.7%/74.7 in precision/recall/F-measure on relation detection, and overall performance of 77.2%/60.7%/68.0 and 63.1%/49.5%/55.5 in precision/recall/F-measure on the 5 ACE relation types and the 43 relation subtypes respectively. Table 5 also compares our system with the best-reported systems on the ACE corpus.</Paragraph>
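      <Paragraph> As a consistency check (not part of the original paper), the reported precision/recall/F-measure triples agree with the standard balanced F-measure F = 2PR/(P+R):

```python
# Each reported F-measure matches the harmonic mean of the reported
# precision and recall, to one decimal place.
def f1(p, r):
    return 2 * p * r / (p + r)

triples = [(84.8, 66.7, 74.7),   # relation detection
           (77.2, 60.7, 68.0),   # 5 relation types
           (63.1, 49.5, 55.5)]   # 43 relation subtypes
for p, r, f in triples:
    assert round(f1(p, r), 1) == f
print("all reported P/R/F triples are internally consistent")
```
</Paragraph>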
      <Paragraph position="5"> It shows that our system achieves better performance by ~3 in F-measure, largely due to its gain in recall. It also shows that feature-based methods dramatically outperform kernel methods. This suggests that feature-based methods can effectively combine different features from a variety of sources (e.g. WordNet and gazetteers) that can be brought to bear on relation extraction, while the tree kernels developed in Culotta et al. (2004) are yet to be shown effective on the ACE RDC task.</Paragraph>
      <Paragraph position="6"> Finally, Table 6 shows the distribution of errors. It shows that 73% (627/864) of the errors result from relation detection and 27% (237/864) from relation characterization, among which 17.8% (154/864) are misclassifications across relation types and 9.6% (83/864) are misclassifications of relation subtypes within the same relation type. This suggests that relation detection is critical for relation extraction.</Paragraph>
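      <Paragraph> The arithmetic behind this error breakdown can be verified directly (an added check, not from the original paper):

```python
# Table 6 error breakdown: detection errors vs characterization errors,
# with characterization split into cross-type and within-type mistakes.
total = 864
detection, cross_type, within_type = 627, 154, 83
characterization = cross_type + within_type  # 237

assert detection + characterization == total
assert round(100 * detection / total) == 73          # ~73% detection
assert round(100 * characterization / total) == 27   # ~27% characterization
assert round(100 * cross_type / total, 1) == 17.8
assert round(100 * within_type / total, 1) == 9.6
print("error counts and percentages are consistent")
```
</Paragraph>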
    </Section>
  </Section>
</Paper>