<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1008">
  <Title>Using decision trees to select the grammatical relation of a noun phrase</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Natural language generation involves a number of processes ranging from planning the content to be expressed through making encoding decisions involving syntax, the lexicon and morphology. The present study concerns decisions made about the form and distribution of each &quot;mention&quot; of a discourse entity: should reference be made with a lexical NP, a pronominal NP or a zero anaphor (i.e. an elided mention)? Should a given mention be expressed as the subject of its clause or in some other grammatical relation? If all works well, a natural language generation system may end up proposing a number of possible well-formed expressions of the same propositional content. Although these possible formulations would all be judged to be valid sentences of the target language, it is not the case that they are all equally likely to occur.</Paragraph>
    <Paragraph position="1"> Research in the area of Preferred Argument Structure (Corston 1996, Du Bois 1987) has established that in discourse in many languages, including English, NPs are distributed across grammatical relations in statistically significant ways. For example, transitive clauses tend not to contain lexical NPs in both subject and object positions and subjects of transitives tend not to be lexical NPs nor to be discourse-new.</Paragraph>
    <Paragraph position="2"> Unfortunately, the models used in PAS have involved only simple chi-squared tests to identify statistically significant patterns in the distribution of NPs with respect to pairs of features (e.g. part of speech and grammatical relation). A further problem from the point of view of computational discourse analysis is that many of the features used in empirical studies are not observable in texts using state-of-the-art natural language processing. Such non-observable features include animacy, the information status of a referent, and the identification of the gender of a referent based on world knowledge.</Paragraph>
    <Paragraph position="3"> In the present study, we treat the task of determining the appropriate distribution of mentions in text as a machine learning classification problem: what is the probability that a mention will have a certain grammatical relation given a rich set of linguistic features? In particular, how accurately can we select appropriate grammatical relations using only superficial linguistic features?</Paragraph>
  </Section>
</Paper>