<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2086">
  <Title>URES : an Unsupervised Web Relation Extraction System</Title>
  <Section position="4" start_page="667" end_page="667" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Information Extraction (IE), a sub-field of NLP, aims to help people sift through large volumes of documents by automatically identifying and tagging key entities, facts, and events mentioned in the text.</Paragraph>
    <Paragraph position="1"> Over the years, much effort has been invested in developing accurate and efficient IE systems.</Paragraph>
    <Paragraph position="2"> Some of the systems are rule-based (Fisher, Soderland et al. 1995; Soderland 1999), some are statistical (Bikel, Miller et al. 1997; Collins and Miller 1998; Manning and Schutze 1999; Miller, Schwartz et al. 1999), and some are based on inductive logic (Zelle and Mooney 1996; Califf and Mooney 1998).</Paragraph>
    <Paragraph position="3"> Recent IE research with bootstrap learning (Brin 1998; Riloff and Jones 1999; Phillips and Riloff 2002; Thelen and Riloff 2002) or learning from documents tagged as relevant (Riloff 1996; Sudo, Sekine et al. 2001) has decreased, but not eliminated, the need for hand-tagged training data.</Paragraph>
    <Paragraph position="4"> Snowball (Agichtein and Gravano 2000) is an unsupervised system for learning relations from document collections. The system takes as input a set of seed examples for each relation and uses a clustering technique to learn patterns from the seed examples. It does, however, rely on a full-fledged Named Entity Recognition system. Snowball achieved fairly low precision (30-50%) on relations such as merger and acquisition on the same dataset used in our experiments.</Paragraph>
    <Paragraph position="5"> The KnowItAll system is a direct predecessor of URES. It was developed at the University of Washington by Oren Etzioni and colleagues (Etzioni, Cafarella et al. 2005). KnowItAll is an autonomous, domain-independent system that extracts facts from the Web. The primary focus of the system is on extracting entities (unary predicates). The input to KnowItAll is a set of entity classes to be extracted, such as &quot;city&quot;, &quot;scientist&quot;, &quot;movie&quot;, etc., and the output is a list of entities extracted from the Web. KnowItAll uses a set of manually built generic rules, which are instantiated with the target predicate names, producing queries, patterns, and discriminator phrases. The queries are passed to a search engine, and the suggested pages are downloaded and processed with the patterns. Every time a pattern is matched, an extraction is generated and evaluated using Web statistics: the number of search engine hits for the extraction alone and for the extraction together with the discriminator phrases.</Paragraph>
    <Paragraph position="6"> KnowItAll also has a pattern learning (PL) module that is able to learn patterns for extracting entities. However, it is unsuitable for learning patterns for relations. Hence, for extracting relations, KnowItAll currently uses only the generic hand-written patterns.</Paragraph>
  </Section>
</Paper>