File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/w06-2208_evalu.xml
Size: 6,500 bytes
Last Modified: 2025-10-06 13:59:55
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2208"> <Title>Expanding the Recall of Relation Extraction by Bootstrapping</Title> <Section position="7" start_page="58" end_page="61" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle>
<Paragraph position="0"> The focus of this paper is the comparison between bootstrapping strategies for extraction, i.e., string pattern learning (SPL) and less restrictive pattern learning (LRPL), which uses a Relation NER. Therefore, we first compare these two bootstrapping methods with the baseline system. We also compare the Relation NER with a generic NER, which is trained on a pre-existing hand-annotated corpus.</Paragraph>
<Section position="1" start_page="58" end_page="60" type="sub_section"> <SectionTitle> 4.1 Relation Extraction Task </SectionTitle> <Paragraph position="0"> We compare SPL and LRPL with the baseline system on five relations: Acquisition, Merger, CeoOf, MayorOf, and InventorOf. We downloaded between roughly 100,000 and 220,000 sentences for each of these relations from the Web; each sentence contained a relation label (e.g., &quot;acquisition&quot;, &quot;acquired&quot;, &quot;acquiring&quot; or &quot;merger&quot;, &quot;merged&quot;, &quot;merging&quot;). We used as seeds all the tuples that co-occur with baseline patterns at least twice. The number of seeds ranges from 33 (Acquisition) to 289 (CeoOf).</Paragraph>
<Paragraph position="1"> For consistency, SPL employs the same assessment methods as LRPL. It uses the EM algorithm of Section 3.2.2 and merges the tuples extracted by the baseline system. In the EM algorithm, the match score S(p, t) between a learned pattern p and a tuple t is set to a constant α_match.</Paragraph>
<Paragraph position="2"> LRPL uses the MinorThird (Cohen, 2004) implementation of CRF for the Relation NER. The features used in the experiments are the lower-cased word, the capitalization pattern, the part-of-speech tags of the current token and the two tokens on either side, and the previous state (tag), following (Minkov et al., 2005; Rosenfeld et al., 2005); a sketch of this feature template is given after this paragraph. The parameters used for SPL and LRPL are set experimentally. Figures 2-6 show the recall-precision curves. We use the number of correct extractions as a surrogate for recall, since computing actual recall would require extensive manual inspection of the large data sets. Compared to the baseline system, both bootstrapping methods increase the number of correct extractions for almost all the relations at around 80% precision. For the MayorOf relation, LRPL achieves a 250% increase in recall at 0.87 precision, while SPL's precision is lower than the baseline system's. This is because SPL cannot distinguish correct tuples from error tuples that co-occur with a short, strict pattern but have a wrong entity type value; SPL extracted a number of such error tuples for this relation. The improvement for the Acquisition and Merger relations is small for both methods; the rules learned for Merger and Acquisition made erroneous extractions of mergers of geo-political entities and acquisitions of data, ball players, languages, or diseases. For the InventorOf relation, LRPL does not work well. This is because 'Invention' is not a proper noun phrase but an ordinary noun phrase, which can contain not only nouns but also particles, determiners, and adjectives, in addition to non-capitalized nouns. Our Relation NER was unable to detect regularities in the capitalization pattern and word length of invention phrases.</Paragraph>
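The feature template above translates directly into a small per-token feature function. The following is a minimal sketch under our own assumptions (a sentence represented as a list of (word, POS-tag) pairs, with illustrative function and feature names); it is not the actual MinorThird CRF configuration used in the experiments.

# Sketch of the per-token feature template: lower-cased word, capitalization
# pattern, and part-of-speech tags of the current and +/-2 surrounding tokens.
# The previous state (tag) is left to the CRF's transition features.

def capitalization_pattern(word):
    """Map a word to a coarse shape, e.g. 'Google' becomes 'Xx', 'IBM' becomes 'X'."""
    shape = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)

def token_features(sentence, i, window=2):
    """Features for token i in a sentence given as a list of (word, pos_tag) pairs."""
    word, _ = sentence[i]
    feats = {
        "word.lower": word.lower(),
        "word.shape": capitalization_pattern(word),
    }
    lo = max(0, i - window)
    hi = min(len(sentence), i + window + 1)
    for j in range(lo, hi):
        feats["pos[%+d]" % (j - i)] = sentence[j][1]
    return feats

During CRF training, such per-position features are combined with the previous tag through the model's transition features.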
<Paragraph position="3"> At around 60% precision, SPL achieves higher recall for the CeoOf and MayorOf relations, in contrast [...] syntactic variety for describing them. Therefore, the learned string patterns are generic enough to extract many candidate tuples.</Paragraph> </Section>
<Section position="2" start_page="60" end_page="61" type="sub_section"> <SectionTitle> 4.2 Entity Recognition Task </SectionTitle> <Paragraph position="0"> Generic types such as person, organization, and location cover many useful relations. One might expect that an NER trained for these generic types can be used for different relations without modification, instead of creating a Relation NER.</Paragraph>
<Paragraph position="1"> To show the effectiveness of the Relation NER, we compare it with a generic NER trained on a pre-existing hand-annotated corpus for generic types; we used the MUC7 train, dry-run test, and formal-test documents (Table 2) (Chinchor, 1997). Table 2: The number of entities and unique entities in the MUC7 corpus. The number of documents is 225.</Paragraph>
<Paragraph position="2"> We also incorporate the following additional knowledge into the CRF's features, following (Minkov et al., 2005; Rosenfeld et al., 2005): first and last names, city names, corporate designators, company words (such as &quot;technology&quot;), and small lists of person titles (such as &quot;mr.&quot;) and capitalized common words (such as &quot;Monday&quot;). The base features for both methods are the same as the ones described in Section 4.1.</Paragraph>
<Paragraph position="3"> The ideal entity recognizer for relation extraction recognizes only entities that have an argument type of a particular relation. Therefore, a generic test set such as the MUC7 Named Entity Recognition Task cannot be used for our evaluation. We randomly selected 200 test sentences from our dataset that contained a pair of correct entities for the CeoOf or MayorOf relations and were not used as training data for the Relation NER. We measured the accuracy as follows.</Paragraph>
<Paragraph position="4"> E_true is a set of true entities that have an argument type of the target relation. E_extracted is a set of entities extracted as an argument.</Paragraph>
<Paragraph position="5"> Because the Relation NER is trained for argument types (such as 'Mayor') and the generic NER is trained for generic types (such as person), this calculation favors the Relation NER. For a fair comparison, we also use the following measure.</Paragraph>
<Paragraph position="6"> In this measure, the set of true entities consists of the entities that have a generic type.</Paragraph>
<Paragraph position="7"> Table 3 shows that the Relation NER consistently works better than the generic NER, even when the additional knowledge substantially improves its recall. This suggests that training a Relation NER for each particular relation during bootstrapping is a better approach than using an NER trained for generic types.</Paragraph> </Section> </Section> </Paper>