<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1105">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Comparison of Similarity Models for the Relation Discovery Task</Title>
  <Section position="4" start_page="25" end_page="25" type="intro">
    <SectionTitle>
5 presents results and statistical analysis.
2 The Relation Discovery Task
</SectionTitle>
    <Paragraph position="0"> Conventionally, relation extraction is considered to be part of information extraction and has been approached through supervised learning or rule engineering (e.g., Blaschke and Valencia, 2002; Bunescu and Mooney, 2005). However, traditional approaches have several shortcomings. [Footnote 1: The relation discovery task is minimally supervised in the sense that it relies on having certain resources such as named entity recognition. The focus of the current paper is the unsupervised task of clustering relations.]</Paragraph>
    <Paragraph position="1"> First and foremost, they are generally based on pre-defined templates of what types of relations exist in the data and thus only capture information whose importance was anticipated by the template designers. This poses reliability problems when making predictions on new data in the same domain, as the training data will be from a certain epoch in the past. Due to language change and topical variation, it is likely that, as time passes, the new data will deviate more and more from the trained models. Additionally, there are cost problems associated with the conventional supervised approach when updating templates or transferring to a new domain, both of which require substantial effort in re-engineering rules or re-annotating training data.</Paragraph>
    <Paragraph position="2"> The goal of the relation discovery task is to identify the existence of associations between entities, to identify the kinds of relations that occur in a corpus and to annotate particular associations with relation types. These goals correspond to the three main steps in a generalised algorithm (Hasegawa et al., 2004):
      1. Identify co-occurring pairs of named entities
      2. Group entity pairs using the textual context
      3. Label each cluster of entity pairs
    The first step is the relation identification task.</Paragraph>
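As an illustration of the first step listed above, the following is a minimal sketch, not the procedure used in this paper (which takes the gold standard ACE relations as given). It assumes a hypothetical input format in which each sentence is a list of tokens plus a list of entity spans, and it assumes that the words between two mentions serve as their textual context. The function name cooccurring_pairs is invented for this example.

```python
from itertools import combinations


def cooccurring_pairs(sentences):
    """Pair up named entities that occur in the same sentence.

    `sentences` is a hypothetical format: a list of (tokens, entities)
    tuples, where each entity is a (start, end, text) token span with an
    exclusive end index.
    """
    pairs = []
    for tokens, entities in sentences:
        # Sort by start offset so the left mention always comes first.
        ordered = sorted(entities)
        for left, right in combinations(ordered, 2):
            # Assumption: the intervening words act as the textual context.
            context = tokens[left[1]:right[0]]
            pairs.append((left[2], right[2], context))
    return pairs


example = [(
    ["Google", "acquired", "YouTube", "in", "2006"],
    [(0, 1, "Google"), (2, 3, "YouTube")],
)]
pairs = cooccurring_pairs(example)
```

Each returned triple holds the two entity strings and the context tokens that a later grouping step could compare.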
    <Paragraph position="3"> In the current work, this is assumed to have been done already. We use the gold standard relations in the ACE data in order to isolate the performance of the second step. The second step is a clustering task and as such it is necessary to compute similarity between the co-occurring pairs of named entities (relations). In order to do this, a model of relation similarity is required, which is the focus of the current work.</Paragraph>
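To make the role of a relation similarity model in the second step concrete, here is a minimal sketch under stated assumptions. It represents each entity pair by a bag of context words, compares pairs with cosine similarity, and groups them with a simple greedy threshold rule. Both the bag-of-words representation and the greedy grouping are stand-ins chosen for brevity, not the similarity models or the clustering method compared in this paper. The names cosine and group_by_similarity and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(context_a, context_b):
    """Cosine similarity between two bags of context words."""
    va, vb = Counter(context_a), Counter(context_b)
    dot = sum(count * vb[word] for word, count in va.items() if word in vb)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def group_by_similarity(contexts, threshold=0.5):
    """Greedy grouping: join the first cluster that has a similar member."""
    clusters = []  # each cluster is a list of indices into `contexts`
    for i, ctx in enumerate(contexts):
        for cluster in clusters:
            if any(cosine(ctx, contexts[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([i])
    return clusters


contexts = [
    ["acquired"],
    ["was", "acquired", "by"],
    ["born", "in"],
    ["was", "born", "in"],
]
clusters = group_by_similarity(contexts)
```

Swapping in a different similarity function changes only the call inside group_by_similarity, which is why the similarity model can be evaluated separately from the other steps.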
    <Paragraph position="4"> We also assume that it is possible to perform the third step (see footnote 2). The evaluation we present here considers only the quality of the clustering and does not attempt to assess the labelling task.</Paragraph>
  </Section>
</Paper>