<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2086">
  <Title>URES: an Unsupervised Web Relation Extraction System</Title>
  <Section position="3" start_page="0" end_page="667" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The most common preprocessing technique for text mining is information extraction (IE), defined as the task of extracting knowledge from textual documents. In general, IE is divided into two main types of extraction tasks: entity tagging and relation extraction.</Paragraph>
    <Paragraph position="1"> The main approaches used by most information extraction systems are the knowledge engineering approach and the machine learning approach. Knowledge engineering systems (mostly rule-based) were traditionally the top performers in most IE benchmarks, such as MUC (Chinchor, Hirschman et al. 1994), ACE and the KDD CUP (Yeh and Hirschman 2002).</Paragraph>
    <Paragraph position="2"> Recently, though, machine learning systems have become state of the art, especially for simpler tagging problems such as named entity recognition (Bikel, Miller et al. 1997) or field extraction (McCallum, Freitag et al. 2000). The general idea is that a domain expert labels the target concepts in a set of documents. The system then learns a model of the extraction task, which can be applied to new documents automatically. Both of these approaches require massive human effort, which prevents information extraction from becoming more widely applicable. To minimize the manual effort involved in building information extraction systems, we have designed and developed URES</Paragraph>
    <Section position="1" start_page="0" end_page="667" type="sub_section">
      <SectionTitle>
(Unsupervised Relation Extraction System)
</SectionTitle>
      <Paragraph position="0"> which learns a set of patterns to extract relations from the Web in a totally unsupervised way. The system takes as input the names of the target relations, the types of their arguments, and a small set of seed instances of the relations. It then uses a large set of unlabeled documents downloaded from the Web to build extraction patterns. URES patterns currently have two modes of operation. One is based on a generic shallow parser that is able to extract noun phrases and their heads. The other builds patterns for use by TEG (Rosenfeld, Feldman et al. 2004). TEG is a hybrid rule-based and statistical IE system. It uses a labeled training corpus to complement and enhance the performance of a relatively small set of manually built extraction rules. When it is used with URES, the relation extraction rules and training data are not built manually but are created automatically from the URES-learned patterns. However, URES does not build rules and training data for entity extraction. For those, we use the grammar and training data that we developed separately.</Paragraph>
      <Paragraph position="1"> It is important to note that URES is not a classic IE system. Its purpose is to extract as many different instances of the given relations as possible while maintaining high precision. Since the goal is to extract instances and not mentions, we are quite willing to miss a particular sentence containing an instance of a target relation, provided the instance can be found elsewhere. In contrast, classical IE systems extract mentions of entities and relations from the input documents. This difference in goals leads to different ways of measuring the performance of the systems.</Paragraph>
      <Paragraph position="2"> The rest of the paper is organized as follows. In Section 2 we present related work. In Section 3 we outline the general design principles and architecture of URES, and then describe its components in detail, giving examples of the input and output of each component. In Section 4 we present our experimental evaluation and then wrap up with conclusions and suggestions for future work.</Paragraph>
    </Section>
  </Section>
</Paper>