<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2086">
  <Title>URES : an Unsupervised Web Relation Extraction System</Title>
  <Section position="6" start_page="671" end_page="672" type="evalu">
    <SectionTitle>
4 Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"> In order to evaluate URES, we used five predi- null the order of its attributes does not matter. Acquisition is antisymmetric, and the other three are tested as bound in the first attribute. For the bound predicates, we are only interested in the instances with particular prespecified values of the first attribute.</Paragraph>
    <Paragraph position="1"> We test both modes of operation - using shallow parser and using TEG. In the shallow parser mode, the Invention attribute of the InventorOf predicate is of type CommonNP, and all other attributes are of type ProperName. In the TEG mode, the &amp;quot;Company&amp;quot; attributes are of type Organization, the &amp;quot;Name&amp;quot; attributes are of type Person, the &amp;quot;City&amp;quot; attribute is of type Location, and the &amp;quot;Invention&amp;quot; attribute is of type NounPhrase. null We evaluate our system by running it over a large set of sentences, counting the number of extracted instances, and manually checking a random sample of the instances to estimate precision. In order to be able to compare our results with KnowItAll-produced results, we used the set of sentences collected by the KnowItAll's crawler as if they were produced by the Sentence Gatherer.</Paragraph>
    <Paragraph position="2"> The set of sentences for the Acquisition and Merger predicates contained around 900,000 sentences each. For the other three predicates, each of the sentences contained one of the 100 predefined values for the first attribute. The values (100 companies for CEO_Of, 100 cities for MayorOf, and 100 inventors for InventorOf) are entities collected by KnowItAll, half of them are frequent entities (&gt;100,000 hits), and another half are rare (&lt;10,000 hits).</Paragraph>
    <Paragraph position="3"> In all of the experiments, we use ten top predicate instances extracted by KnowItAll for the relation seeds needed by the Pattern Learner. The results of our experiments are summarized in the Table 2. The table displays the number of extracted instances and estimated precision for three different URES setups, and for the KnowItAll manually built patterns. Three results are shown for each setup and each relation - extractions supported by at least one, at least two, and at least three different sentences, respectively.</Paragraph>
    <Paragraph position="4"> Several conclusions can be drawn from the results. First, both modes of URES significantly outperform KnowItAll in recall (number of extractions), while maintaining the same level of precision or improving it. This demonstrates utility of our pattern learning component. Second, it is immediately apparent, that using only anchored patterns significantly improves precision of NP Tagger-based URES, though at a high cost in recall. The NP tagger-based URES with anchored patterns performs somewhat worse than  TEG-based URES on all predicates except InventorOf, as expected. For the InventorOf, TEG performs worse, because of overly simplistic implementation of the NounPhrase concept inside the TEG grammar - it is defined as a sequence of zero or more adjectives followed by a sequence of nouns. Such definition often leads to only part of a correct invention name being extracted. null</Paragraph>
  </Section>
class="xml-element"></Paper>