<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4027"> <Title>Summarizing Email Threads</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Previous and Related Work </SectionTitle> <Paragraph position="0"> Muresan et al. (2001) describe work on summarizing individual email messages using machine learning approaches to learn rules for salient noun phrase extraction.</Paragraph> <Paragraph position="1"> In contrast, our work aims at summarizing whole threads and at capturing the interactive nature of email.</Paragraph> <Paragraph position="2"> Nenkova and Bagga (2003) present work on generating extractive summaries of threads in archived discussions. A sentence from the root message and from each response to the root is extracted using ad-hoc algorithms crafted by hand. This approach works best when the subject of the root email best describes the &quot;issue&quot; of the thread, and when the root email does not discuss more than one issue. In our work, we do not make any assumptions about the nature of the email, and learn sentence extraction strategies using machine learning.</Paragraph> <Paragraph position="3"> Newman and Blitzer (2003) also address the problem of summarizing archived discussion lists. They cluster messages into topic groups, and then extract summaries for each cluster. The summary of a cluster is extracted using a scoring metric based on sentence position, lexical similarity of a sentence to the cluster centroid, and a feature based on quotation, among others. While the approach is quite different from ours (due to the underlying clustering algorithm and the absence of machine learning to select features), the use of email-specific features, in particular the feature related to quoted material, is similar.</Paragraph> <Paragraph position="4"> Lam et al. (2002) present work on email summarization by exploiting the thread structure of email conversations and common features such as named entities and dates. 
They summarize only a single message, though the content of the message to be summarized is &quot;expanded&quot; using the content from its ancestor messages. The expanded message is passed to a document summarizer which is used as a black box to generate summaries. Our work, in contrast, aims at summarizing the whole thread, and we are precisely interested in changing the summarization algorithm itself, not in using a black box summarizer.</Paragraph> <Paragraph position="5"> In addition, there has been some work on summarizing meetings. As discussed in Section 1, email is different in important respects from (multi-party) dialog. However, some important aspects are related. Zechner (2002), for example, presents a meeting summarization system which uses the MMR algorithm to find sentences that are most similar to the segment and most dissimilar to each other. The similarity weights in the MMR algorithm are modified using three features, including whether a sentence belongs to a question-answer pair. The use of the question-answer pair detection is an interesting proposal that is also applicable to our work. However, overall most of the issues tackled by Zechner (2002) are not relevant to email summarization.</Paragraph> </Section> </Paper>