<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1612">
  <Title>Learning Information Status of Discourse Entities</Title>
  <Section position="10" start_page="100" end_page="100" type="concl">
    <SectionTitle>
7 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have presented a model for the automatic assignment of information status in English. On the three-way classification into old, mediated, and new that reflects the corpus annotation tags, the learnt tree outperforms a hand-crafted algorithm and achieves an accuracy of 79.5%, with high precision and recall for old entities, high recall for mediated entities, and a fair precision, but very poor recall, for new ones. When we collapsed mediated and new entities into one category only opposing this to old ones, the classifier performed with an accuracy of 93.1%, with high f-scores for both classes. Binning mediated and old entities together did not produce interesting results, mainly due to the highly skewed distribution of the resulting corpus towards old entities. This suggests that mediated entities are more similar to new than to old ones, and might provide interesting feedback for the theoretical assumptions underlying the annotation. Future work will examine specific cases and investigate how such insights can be used to make the theoretical framework more accurate.</Paragraph>
    <Paragraph position="1"> As the first experiments run on English to learn information status, we wanted to concentrate on the task itself and avoid noise introduced by automatic processing. More realistic settings for integrating an information status model in a largescaleNLPsystemwouldimplyobtainingsyntactic null information via parsing rather than directly from the treebank. Future experiments will assess the impact of automatic preprocessing of the data.</Paragraph>
    <Paragraph position="2"> Results are very promising but there is room for improvement. First, the syntactic category &amp;quot;other&amp;quot; isfartoolarge, andfinerdistinctionsmustbemade by means of better extraction rules from the trees.</Paragraph>
    <Paragraph position="3"> Second, and most importantly, we believe that usingmorefeatureswillbethemaintriggerofhigher null accuracy. In particular, we plan to use additional lexical and relational features derived from knowledge sources such as WordNet (Fellbaum, 1998) and FrameNet (Baker et al., 1998) which should be especially helpful in distinguishing mediated from new entities, the most difficult decision to make. For example, an entity that is linked in WordNet (within a given depth) and/or FrameNet to a previously introduced one is more likely to be mediated than new.</Paragraph>
    <Paragraph position="4"> Additionally, we will attempt to exploit dialogue turns, since knowing which speaker said what is clearly very valuable information. In a similar vein, we will experiment with distance measures, in terms of turns, sentences, or even time, for determining when an introduced entity might stop to be available.</Paragraph>
    <Paragraph position="5"> We also plan to run experiments on the automatic classification of old and mediated subtypes (the finer-grained classification) that is included in the corpus but that we did not consider for the present study (see Section 2.1). The major benefit of this would be a contribution to the resolution of bridging anaphora.</Paragraph>
  </Section>
class="xml-element"></Paper>