<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1008"> <Title>Task-focused Summarization of Email</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Data </SectionTitle> <Paragraph position="0"> We collected a corpus of 15,741 email messages. The messages were divided into training, development test and blind test. The training set contained 106,700 sentences in message bodies from 14,535 messages. To avoid overtraining to individual writing styles, we limited the number of messages from a given sender to 50. To ensure that our evaluations are indicative of performance on messages from previously unencountered senders, we selected messages from 3,098 senders, assigning all messages from a given sender to either the training or the test sets. Three human annotators labeled the message body sentences, selecting one tag from the following set: Salutation, Chit-chat (i.e., social discussion unrelated to the main purpose of the message), Task, Meeting (i.e., a proposal to meet), Promise, Farewell, various components of an email signature (Sig_Name, Sig_Title, Sig_Affiliation, Sig_Location, Sig_Phone, Sig_Email, Sig_URL, Sig_Other), and the default category &quot;None of the above&quot;. The set of tags can be considered a set of application-specific speech acts analogous to the rather particular tags used in the Verbmobil project, such as &quot;Suggest_exclude_date&quot; and uncommon in our corpus. Most senders were using Microsoft Outlook, which places the insertion point for new content at the top of the message.</Paragraph> <Paragraph position="1"> &quot;Motivate_appointment&quot; (Warnke et al., 1997; Mast et al., 1996) or the form-based tags of Stolcke et al. (1998).</Paragraph> <Paragraph position="2"> All three annotators independently labeled sentences in a separate set of 146 messages not included in the training, development or blind test sets. We measured inter-annotator agreement for the assignment of tags to sentences in the message bodies using Cohen's Kappa. Annotator 1 and annotator 2 measured 85.8%; annotator 1 and annotator 3 measured 82.6%; annotator 2 and annotator 3 measured 82.3%. We consider this level of inter-annotator agreement good for a novel set of application-specific tags.</Paragraph> <Paragraph position="3"> The development test and blind test sets of messages were tagged by all three annotators, and the majority tag for each sentence was taken. If any sentence did not have a majority tag, the entire message was discarded, leaving a total of 507 messages in the development test set and 699 messages in the blind test set.</Paragraph> <Paragraph position="4"> The set of tags was intended for a series of related experiments concerning linguistic processing of email. For example, greetings and chit-chat could be omitted from messages displayed on cell phones, or the components of an email signature could be extracted and stored in a contact database. In the current paper we focus exclusively on the identification of tasks.</Paragraph> <Paragraph position="5"> Annotators were instructed to mark a sentence as containing a task if it looked like an appropriate item to add to an on-going &quot;to do&quot; list. By this criterion, simple factual questions would not usually be annotated as tasks; merely responding with an answer fulfills any obligation. 
<Paragraph position="7"> Annotators were instructed to consider the context of an entire message when deciding whether formulaic endings to email such as "Let me know if you have any questions" were to be interpreted as mere social convention or as actual requests for review and comment. The following are examples of actual sentences annotated as tasks in our data:</Paragraph>
<Paragraph position="8"> Since Max uses a pseudo-random number generator, you could possibly generate the same sequence of numbers to select the same cases.</Paragraph>