<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1002">
  <Title>Cross-Document Coreference on a Large Scale Corpus</Title>
  <Section position="4" start_page="0" end_page="0" type="relat">
    <SectionTitle>
3. Related Work
</SectionTitle>
    <Paragraph position="0"> TIPSTER Phase III first identified cross-document coreference as an area for research, since it is a central tool for producing summaries from multiple documents and for information fusion (Bagga and Baldwin, 1998). The Sixth Message Understanding Conference (MUC-6) considered cross-document coreference as a potential task but did not include it because it was judged too difficult (Bagga and Baldwin, 1998).</Paragraph>
    <Paragraph position="1"> ISOQuest's NetOwl and IBM's Textract attempted to determine whether multiple named entities refer to the same entity, but neither could distinguish different entities that share the same name. Entity Detection and Tracking addresses the same task as cross-document coreference.</Paragraph>
    <Paragraph position="2"> Much of the work in this study builds on Bagga and Baldwin (1998), who presented a successful cross-document coreference algorithm that uses the vector space model to resolve ambiguities between people who share the same name. We have implemented a simplified version of their algorithm that achieves roughly equivalent accuracy, but we will show that it does not work as well when applied to a substantially larger corpus of documents.</Paragraph>
    <Paragraph position="3"> Recently there has been significant work in the information extraction community on a problem known as Entity Detection and Tracking, within the Automatic Content Extraction (ACE) evaluations (NIST, 2003).</Paragraph>
    <Paragraph position="4"> That work includes a sub-task, referred to as either Entity Tracking or Entity Mention Detection, whose goal is to link all mentions of the same entity across multiple documents. This sub-task is a small, optional part of the complete ACE evaluation, and its results do not appear to have been published.</Paragraph>
  </Section>
</Paper>