
<?xml version="1.0" standalone="yes"?>
<Paper uid="J02-4004">
  <Title>Efficiently Computed Lexical Chains as an Intermediate Representation for Automatic Text Summarization</Title>
  <Section position="5" start_page="495" end_page="496" type="relat">
    <SectionTitle>
5. Discussion and Future Work
</SectionTitle>
    <Paragraph position="0"> Some problems that cause our algorithm difficulty, specifically proper nouns and anaphora resolution, need to be addressed. Proper nouns (names of people, organizations, companies, etc.) occur frequently in naturally occurring text, but since we have no semantic information about them, we can only perform frequency counts on them. Anaphora resolution, especially in certain domains, is a bigger issue; we anticipate much better results once anaphora resolution is added to the system.</Paragraph>
    <Paragraph position="1"> Other issues that may affect our results stem from WordNet's coverage and the semantic information it captures. Clearly, no semantically annotated lexicon can be complete. Proper nouns and domain-specific terms, as well as a number of other words likely to appear in a document, are not found in the WordNet database; for such terms, the system defaults to word frequency counts. Semantic distance in the &quot;is a&quot; graph, a known problem in WordNet, does not affect our implementation, since we do not use this information. It is important to note that although our system uses WordNet, nothing in the algorithm is specific to WordNet per se, and any other appropriate lexicon could be &quot;plugged in&quot; and used.</Paragraph>
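The lookup-with-fallback policy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stand-in lexicon dictionary here substitutes for WordNet (or any other pluggable semantic lexicon), and the function name is hypothetical.

```python
from collections import Counter

# Stand-in for a semantic lexicon such as WordNet (assumption: any resource
# mapping a term to candidate senses could be plugged in instead).
LEXICON = {
    "machine": ["device", "automaton"],
    "person": ["human", "individual"],
}

def partition_terms(tokens, lexicon=LEXICON):
    """Terms found in the lexicon are passed to the chaining algorithm;
    unknown terms (e.g. proper nouns, domain-specific words) are scored
    by raw frequency counts instead, mirroring the fallback described."""
    chainable = [t for t in tokens if t in lexicon]
    fallback = Counter(t for t in tokens if t not in lexicon)
    return chainable, fallback
```

Keeping the lexicon behind a plain mapping interface is one way to realize the "plugged in" property noted above: swapping lexicons changes only the data, not the algorithm.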
    <Paragraph position="2"> Issues regarding generation of a summary based on lexical chains need to be addressed and are the subject of our current work. Recent research has begun to look at the difficult problem of generating a summary text from an intermediate representation. Hybrid approaches, such as extracting phrases instead of sentences and recombining these phrases into salient text, have been proposed (Barzilay, McKeown, and Elhadad 1999). Other recent work treats summarization as a process of revision, in which the source text is revised until a summary of the desired length is achieved (Mani, Gates, and Bloedorn 1999). Additionally, some research has explored cutting and pasting segments of text from the full document to generate a summary (Jing and McKeown 2000). We intend to use lexical chains as part of the input to a more classical text generation algorithm, producing new text that captures the concepts from the extracted chains. The lexical chains identify the noun (or argument) concepts for the summary. We are examining ways to identify predicates and are concentrating on situations in which strong lexical chains intersect in the text.</Paragraph>
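The notion of locating intersections of strong lexical chains can be sketched as below. This is an illustrative assumption about what "intersect in the text" might mean operationally (sentences touched by members of two or more strong chains); the chains, threshold, and tokenization are placeholders, not the paper's actual data or method.

```python
def chain_intersections(sentences, chains):
    """Return indices of sentences in which members of at least two
    lexical chains co-occur. Each chain is modeled as a set of words;
    strength filtering is assumed to have happened upstream."""
    hits = []
    for i, sent in enumerate(sentences):
        words = set(sent.lower().split())
        touched = sum(1 for chain in chains if words.intersection(chain))
        if touched >= 2:
            hits.append(i)
    return hits
```

Such intersection points are plausible anchors for predicate identification, since a sentence linking two argument concepts is likely to express a relation between them.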
  </Section>
</Paper>