<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1025">
  <Title>Exploiting Semantic Role Labeling, WordNet and Wikipedia for Coreference Resolution</Title>
  <Section position="2" start_page="0" end_page="192" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent years have seen a surge of work devoted to the development of machine-learning-based coreference resolution systems (Soon et al., 2001; Ng &amp; Cardie, 2002; Yang et al., 2003; Luo et al., 2004, inter alia). While machine learning has proved to yield performance fully competitive with rule-based systems, current coreference resolution systems mostly rely on rather shallow features, such as the distance between the coreferent expressions, string matching, and linguistic form. The literature has, however, emphasized from the very beginning the relevance of world knowledge and inference for coreference resolution (Charniak, 1973).</Paragraph>
    <Paragraph position="1"> This paper explores whether coreference resolution can benefit from semantic knowledge sources.</Paragraph>
    <Paragraph position="2"> More specifically, we ask whether a machine-learning-based approach to coreference resolution can be improved by such information, and which phenomena are affected by it. We investigate the use of the WordNet and Wikipedia taxonomies for extracting semantic similarity and relatedness measures, as well as semantic parsing information in the form of semantic role labeling (Gildea &amp; Jurafsky, 2002, SRL henceforth).</Paragraph>
    <Paragraph position="3"> We believe that the lack of semantics in the current systems leads to a performance bottleneck.</Paragraph>
    <Paragraph position="4"> In order to correctly identify the discourse entities referred to in a text, it seems essential to reason over the lexical semantic relations, as well as over the event representations embedded in the text. As an example, consider a fragment from the Automatic Content Extraction (ACE) 2003 data.</Paragraph>
    <Paragraph position="5"> (1) But frequent visitors say that given the sheer weight of the country's totalitarian ideology and generations of mass indoctrination, changing this country's course will be something akin to turning a huge ship at sea. Opening North Korea up, even modestly, and exposing people to the idea that Westerners - and South Koreans - are not devils, alone represents an extraordinary change. [...] as his people begin to get a clearer idea of the deprivation they have suffered, especially relative to their neighbors. &amp;quot;This is a society that has been focused most of all on stability, [...]&amp;quot;.</Paragraph>
    <Paragraph position="6"> In order to correctly resolve the anaphoric expressions highlighted in bold, some kind of lexical semantic and encyclopedic knowledge seems to be required. This includes the knowledge that North Korea is a country, that countries consist of people, and that countries are societies. The resolution requires an encyclopedic (i.e. Wikipedia) look-up and reasoning on the content relatedness holding between the different expressions (i.e. as a path measure along the links of the WordNet and Wikipedia taxonomies).</Paragraph>
    <Paragraph position="7"> Event representations also seem to be important for coreference resolution, as shown below: (2) A state commission of inquiry into the sinking of the Kursk will convene in Moscow on Wednesday, the Interfax news agency reported. It said that the diving operation will be completed by the end of next week.</Paragraph>
    <Paragraph position="8"> In this example, knowing that the Interfax news agency is the AGENT of the predicate report, and that It is the AGENT of say, could trigger the (semantic-parallelism-based) inference required to correctly link the two expressions, instead of anchoring the pronoun to Moscow. SRL provides the semantic relationships that constituents bear to predicates, thus allowing us to include such document-level event-descriptive information in the relations holding between referring expressions (REs).</Paragraph>
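One way to turn this parallelism into a learner feature is a boolean check on whether two mentions fill the same role in their respective predicates. The frame dictionary and function below are a hypothetical sketch of such a feature, not the paper's actual SRL representation:

```python
# Hypothetical SRL output for example (2): each predicate maps its
# argument mentions to role labels. Structure is illustrative only.
srl_frames = {
    "report": {"the Interfax news agency": "AGENT"},
    "say": {"It": "AGENT"},
}

def same_role(frames, mention_a, pred_a, mention_b, pred_b):
    """Semantic-parallelism feature: True iff both mentions fill the
    same labelled role in their respective predicates."""
    role_a = frames.get(pred_a, {}).get(mention_a)
    role_b = frames.get(pred_b, {}).get(mention_b)
    return role_a is not None and role_a == role_b
```

For the pair (It, the Interfax news agency) the feature fires because both are AGENTs, whereas for (It, Moscow) it does not, since Moscow fills no role in either frame; a learner can exploit exactly this asymmetry.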
    <Paragraph position="9"> Instead of exploring different kinds of data representations, task definitions or machine learning techniques (Ng &amp; Cardie, 2002; Yang et al., 2003; Luo et al., 2004), we focus on a few promising semantic features, which we evaluate in a controlled environment. In this way we attempt to overcome the performance plateau in coreference resolution observed by Kehler et al. (2004).</Paragraph>
  </Section>
</Paper>