File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/p05-1061_intro.xml

Size: 4,507 bytes

Last Modified: 2025-10-06 14:03:02

<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1061">
  <Title>Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE</Title>
  <Section position="2" start_page="0" end_page="491" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most research on text information extraction (IE) has focused on accurate tagging of named entities.</Paragraph>
    <Paragraph position="1"> Successful early named-entity taggers were based on finite-state generative models (Bikel et al., 1999).</Paragraph>
    <Paragraph position="2"> More recently, discriminatively trained models have been shown to be more accurate than generative models (McCallum et al., 2000; Lafferty et al., 2001; Kudo and Matsumoto, 2001). Both kinds of models have been developed for tagging entities such as people, places, and organizations in news material. However, the rapid development of bioinformatics has recently generated interest in the extraction of biological entities such as genes (Collier et al., 2000) and genomic variations (McDonald et al., 2004b) from biomedical literature.</Paragraph>
    <Paragraph position="3"> The next logical step for IE is to begin to develop methods for extracting meaningful relations involving named entities. Such relations would be extremely useful in applications like question answering, automatic database generation, and intelligent document searching and indexing. Though not as well studied as entity extraction, relation extraction has still seen a significant amount of work. We discuss some previous approaches at greater length in Section 2.</Paragraph>
    <Paragraph position="4"> Most relation extraction systems focus on the specific problem of extracting binary relations, such as the employee-of relation or the protein-protein interaction relation. Very little work has been done on recognizing and extracting more complex relations. We define a complex relation as any $n$-ary relation among $n$ typed entities. The relation is defined by the schema $(t_1, \ldots, t_n)$, where $t_i \in T$ are entity types. An instance (or tuple) in the relation is a list of entities $(e_1, \ldots, e_n)$ such that either $\mathrm{type}(e_i) = t_i$ or $e_i = \bot$, indicating that the $i$th element of the tuple is missing.</Paragraph>
    <Paragraph position="5"> For example, assume that the entity types are $T = \{\text{person}, \text{job}, \text{company}\}$ and that we are interested in the ternary relation with schema (person, job, company) that relates a person to their job at a particular company. For the sentence &quot;John Smith is the CEO at Inc. Corp.&quot;, the system would ideally extract the tuple (John Smith, CEO, Inc. Corp.).</Paragraph>
    <Paragraph position="6"> However, for the sentence &quot;Every day John Smith goes to his office at Inc. Corp.&quot;, the system would extract (John Smith, $\bot$, Inc. Corp.), since there is no mention of a job title. Hence, the goal of complex relation extraction is to identify all instances of the relation of interest in a piece of text, including incomplete instances.</Paragraph>
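    <Paragraph> The tuple definition above can be made concrete with a small validity check. This is a minimal Python sketch, not code from the paper. It assumes that entities are plain strings, that a hypothetical `type_of` dictionary supplies each entity's known type (mirroring the assumption that entity types are known a priori), and that Python's `None` stands in for the missing-argument symbol $\bot$.

```python
from typing import Optional, Sequence

# Hypothetical convention: None plays the role of the missing-argument symbol.
MISSING = None


def is_instance(
    schema: Sequence[str],
    entities: Sequence[Optional[str]],
    type_of: dict[str, str],
) -> bool:
    """Return True if `entities` is a valid tuple for the relation `schema`.

    Position i must hold either an entity whose type equals schema[i]
    or MISSING.
    """
    if len(entities) != len(schema):
        return False
    return all(
        e is MISSING or type_of.get(e) == t
        for e, t in zip(entities, schema)
    )


def missing_positions(entities: Sequence[Optional[str]]) -> list[int]:
    """Indices of the tuple elements that are missing."""
    return [i for i, e in enumerate(entities) if e is MISSING]


if __name__ == "__main__":
    schema = ("person", "job", "company")
    type_of = {"John Smith": "person", "CEO": "job", "Inc. Corp.": "company"}

    full = ("John Smith", "CEO", "Inc. Corp.")
    partial = ("John Smith", MISSING, "Inc. Corp.")
    swapped = ("CEO", "John Smith", "Inc. Corp.")

    print(is_instance(schema, full, type_of))     # True
    print(is_instance(schema, partial, type_of))  # True
    print(is_instance(schema, swapped, type_of))  # False
    print(missing_positions(partial))             # [1]
```

Both example tuples from the text pass the check, including the incomplete one, while a tuple whose entities sit in the wrong slots does not.</Paragraph>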
    <Paragraph position="7"> We present here several simple methods for extracting complex relations. All the methods start by recognizing pairs of entity mentions, that is, binary relation instances, that appear to be arguments of the relation of interest. Those pairs can be seen as the edges of a graph with entity mentions as nodes. The algorithms then try to reconstruct complex relations by making tuples from selected maximal cliques in the graph. The methods are general and can be applied to any complex relation fitting the above definition. We also assume throughout the paper that the entities and their types are known a priori in the text. This is a fair assumption given the high accuracy of current state-of-the-art named-entity extractors.</Paragraph>
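    <Paragraph> The graph-and-clique reconstruction described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation. The related pairs are taken as given; in practice they would come from a binary classifier, which is not shown. The schema types are assumed distinct, and pairs are assumed to link only mentions of different types, so every clique holds at most one mention per slot. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting, and the `min_size` rule is a hypothetical stand-in for whatever clique-selection criterion is actually used.

```python
from typing import Iterable, Optional


def build_graph(
    type_of: dict[str, str], related_pairs: Iterable[tuple[str, str]]
) -> dict[str, set[str]]:
    """Undirected graph: one node per entity mention, one edge per related pair."""
    adj: dict[str, set[str]] = {m: set() for m in type_of}
    for a, b in related_pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def maximal_cliques(adj: dict[str, set[str]]) -> list[set[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p.union(x), key=lambda u: len(p.intersection(adj[u])))
        for v in list(p.difference(adj[pivot])):
            expand(r.union({v}), p.intersection(adj[v]), x.intersection(adj[v]))
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


def clique_to_tuple(
    clique: set[str], schema: tuple[str, ...], type_of: dict[str, str]
) -> tuple[Optional[str], ...]:
    """Place each mention in the slot for its type; empty slots become None."""
    slots: dict[str, Optional[str]] = {t: None for t in schema}
    for m in clique:
        slots[type_of[m]] = m
    return tuple(slots[t] for t in schema)


def extract_tuples(
    schema: tuple[str, ...],
    type_of: dict[str, str],
    related_pairs: Iterable[tuple[str, str]],
    min_size: int = 2,
) -> list[tuple[Optional[str], ...]]:
    """Turn every maximal clique with at least `min_size` mentions
    into a (possibly incomplete) relation tuple."""
    adj = build_graph(type_of, related_pairs)
    tuples = [
        clique_to_tuple(c, schema, type_of)
        for c in maximal_cliques(adj)
        if len(c) >= min_size
    ]
    # Sort for deterministic output; None sorts as the empty string.
    return sorted(tuples, key=lambda tup: [e or "" for e in tup])


if __name__ == "__main__":
    schema = ("person", "job", "company")
    type_of = {"John Smith": "person", "CEO": "job",
               "Inc. Corp.": "company", "Mary Jones": "person"}
    # Hypothetical output of a pairwise classifier.
    pairs = [("John Smith", "CEO"), ("John Smith", "Inc. Corp."),
             ("CEO", "Inc. Corp."), ("Mary Jones", "Inc. Corp.")]
    for t in extract_tuples(schema, type_of, pairs):
        print(t)
```

Because mentions of the same type are never linked, the hypothetical mention "Mary Jones" ends up in a two-mention clique with the company and yields a tuple with a missing job, much like the second example sentence.</Paragraph>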
    <Paragraph position="8"> A primary advantage of factoring complex relations into binary relations is that it allows the use of standard classification algorithms to decide whether particular pairs of entity mentions are related. In addition, the factoring makes training data less sparse and reduces the computational cost of extraction.</Paragraph>
    <Paragraph position="9"> We will discuss these benefits further in Section 4.</Paragraph>
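    <Paragraph> One way to see the computational side of the factoring is to count candidates. The following Python sketch is a hypothetical illustration, not the analysis given in Section 4. It assumes each mention's type is known and that only mentions of different schema types are paired. It enumerates the candidate pairs a binary classifier would have to label and compares their number with the number of candidate $n$-ary tuples, where every slot holds one mention of its type or is left missing.

```python
from itertools import combinations
from math import prod


def candidate_pairs(mentions, schema):
    """Pairs of mentions whose types fill two different slots of the schema.

    `mentions` is a list of (mention, type) pairs. Each returned pair is one
    input to a binary "are these two mentions related?" classifier.
    """
    slot_types = set(schema)
    typed = [(m, t) for m, t in mentions if t in slot_types]
    return [
        (a, b)
        for (a, ta), (b, tb) in combinations(typed, 2)
        if ta != tb
    ]


def count_candidate_tuples(mentions, schema):
    """Number of n-ary candidates when each slot holds one mention of its type
    or is missing (including the all-missing tuple): the product over slots
    of (number of mentions of that type + 1)."""
    return prod(sum(1 for _, mt in mentions if mt == t) + 1 for t in schema)


if __name__ == "__main__":
    schema = ("person", "job", "company")
    # Three hypothetical mentions of each type in one document.
    mentions = [(f"{t}{i}", t) for t in schema for i in range(3)]
    print(len(candidate_pairs(mentions, schema)))    # 27
    print(count_candidate_tuples(mentions, schema))  # 64
```

The pair count grows quadratically with the number of mentions, while the tuple count grows exponentially with the number of slots. Adding a hypothetical fourth slot with three more mentions raises the figures to 54 pairs and 256 tuples.</Paragraph>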
    <Paragraph position="10"> We evaluated the methods on a large set of annotated biomedical documents, extracting relations involving genomic variations, and demonstrated a considerable improvement over a reasonable baseline.</Paragraph>
  </Section>
</Paper>