<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1028">
  <Title>CORPUS-BASED ACQUISITION OF RELATIVE PRONOUN DISAMBIGUATION HEURISTICS</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> State-of-the-art natural language processing (NLP) systems typically rely on heuristics to resolve many classes of ambiguities, e.g., prepositional phrase attachment, part of speech disambiguation, word sense disambiguation, conjunction, pronoun resolution, and concept activation. However, the manual encoding of these heuristics, either as part of a formal grammar or as a set of disambiguation rules, is difficult because successful heuristics demand the assimilation of complex syntactic and semantic knowledge. Consider, for example, the problem of prepositional phrase attachment. A number of purely structural solutions have been proposed, including the theories of Minimal Attachment (Frazier, 1978) and Right Association (Kimball, 1973). While these models may suggest the existence of strong syntactic preferences in effect during sentence understanding, other studies provide clear evidence that purely syntactic heuristics for prepositional phrase attachment will not work (see Whittemore, Ferrara, &amp; Brunner, 1990; Taraban &amp; McClelland, 1988).</Paragraph>
    <Paragraph position="1"> However, computational linguists have found the manual encoding of disambiguation rules (especially those that merge syntactic and semantic constraints) to be difficult, time-consuming, and prone to error. In addition, hand-coded heuristics are often incomplete and perform poorly in new domains with specialized vocabularies or in a different genre of text.</Paragraph>
    <Paragraph position="2"> In this paper, we focus on a single ambiguity in sentence processing: locating the antecedents of relative pronouns. We present an implemented corpus-based approach for the automatic acquisition of disambiguation heuristics for that task. The technique uses an existing hierarchical clustering system to determine the antecedent of a relative pronoun given a description of the clause that precedes it. It requires only minimal syntactic parsing capabilities and a very general semantic feature set for describing nouns.</Paragraph>
    <Paragraph position="3"> Unlike other corpus-based techniques, the approach needs only a small number of training examples, which makes it practical even for small to medium-sized on-line corpora. For the task of relative pronoun disambiguation, the automated approach matches the performance of hand-coded rules and makes it possible to compile heuristics tuned to a new corpus with little human intervention. Moreover, we believe that the technique may provide a general approach for the automated acquisition of disambiguation heuristics for additional problems in natural language processing.</Paragraph>
    <Paragraph position="4"> In the next section, we briefly describe the task of relative pronoun disambiguation. Sections 3 and 4 give the details of the acquisition algorithm and evaluate its performance. Problems with the approach and extensions required for use with large corpora of unrestricted text are discussed in Section 5.</Paragraph>
  </Section>
</Paper>