<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1037">
  <Title>Topic Identification in Discourse</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Although only speakers and writers, not texts, have topics (Brown and Yule, 1983, p. 68), natural language researchers often want to identify a topic or a set of possible topics from a discourse for further applications, such as anaphora resolution and information retrieval. This paper adopts a corpus-based approach to processing discourse information. We postulate that: (1) Topic is coherent and has strong relationships with the events in the discourse.</Paragraph>
    <Paragraph position="1"> Now, consider the following example quoted from the Lancaster-Oslo/Bergen (LOB) Corpus (Johansson, 1986). The topics in this example are &quot;problem&quot; and &quot;dislocation&quot;. The two words are more strongly related to the verbs (&quot;explain&quot;, &quot;fell&quot;, &quot;placing&quot; and &quot;suppose&quot;) and nouns (&quot;theories&quot;, &quot;explanations&quot;, &quot;roll&quot;, &quot;codex&quot;, &quot;disorder&quot;, &quot;order&quot;, &quot;disturbance&quot; and &quot;upheaval&quot;).</Paragraph>
    <Paragraph position="2"> There is a whole group of theories which attempt to explain the problems of the Fourth Gospel by explanations based on assumed textual dislocations. The present state of the Gospel is the result of an accident-prone history. The original was written on a roll, or codex, which fell into disorder or was accidentally damaged. An editor, who was not the author, made what he could of the chaos by placing the fragments, or sheets, or pages, in order. Most of those who expound a theory of textual dislocation take it for granted that the Gospel was written entirely by one author before the disturbance took place but a few leave it open to suppose that the original book had been revised even before the upheaval.</Paragraph>
    <Paragraph position="3"> We also postulate that: (2) The noun-verb relationship is a predicate-argument relationship at the sentence level, and the noun-noun relationship is an association at the discourse level.</Paragraph>
    <Paragraph position="4"> Postulate (2) can also be observed in the above example. These relationships may be represented implicitly by collocational semantics.</Paragraph>
    <Paragraph position="5"> Collocation has been applied successfully to many applications (Church et al., 1989), e.g., lexicography (Church and Hanks, 1990), information retrieval (Salton, 1986a), and text input (Yamashina and Obashi, 1988). This paper explores its feasibility for topic identification.</Paragraph>
    <Paragraph position="6"> This paper is organized as follows. Section 2 presents a corpus-based language model and discusses how to train this model. Section 3 touches on topic identification in discourse. Section 4 shows a series of experiments based on the proposed model and discusses the results. Section 5 gives brief remarks.</Paragraph>
  </Section>
</Paper>