<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1041"> <Title>Information Extraction from Voicemail Transcripts</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When you're away from the phone and someone takes a message for you, at the very least you'd expect to be told who called and whether they left a number for you to call back. If the same call is picked up by a voicemail system, even such basic information as the caller's name and phone number may not be directly available, forcing the recipient, in the worst case, to listen to the entire message. By contrast, information about the sender of an email message has always been explicitly represented in the message headers, starting with early standardization attempts (Bhushan et al., 1973) and including the current standard (Crocker, 1982), which is now two decades old.</Paragraph> <Paragraph position="1"> Applications that aim to present voicemail messages through an email-like interface (take as an example the idea of a &quot;uniform inbox&quot; presentation of email, voicemail, and other kinds of messages) must deal with the problem of how to obtain information analogous to what would be contained in email headers.</Paragraph> <Paragraph position="2"> Here we will discuss one way of addressing this problem, treating it exclusively as the task of extracting relevant information from voicemail transcripts. In practice, for example in the context of a sophisticated voicemail front-end (Hirschberg et al., 2001) that is tightly integrated with an organization-wide voicemail system and private branch exchange (PBX), additional sources of information may be available: the voicemail system or the PBX might provide information about the station from which a call originated, and speaker identification can be used to match a caller's voice against models of known callers (Rosenberg et al., 2001). 
Restricting our attention to voicemail transcripts means that our focus and goals are similar to those of Huang et al. (2001), but the features and techniques we use are very different.</Paragraph> <Paragraph position="3"> While the present task may seem broadly similar to named entity extraction from broadcast news (Gotoh and Renals, 2000), it is quite distinct: first, we are interested in only a small subset of the named entities; and second, the structure of the voicemail transcripts in our corpus is very different from that of broadcast news, and certain aspects of this structure can be exploited for extracting caller names.</Paragraph> <Paragraph position="4"> Huang et al. (2001) discuss three approaches: hand-crafted rules; grammatical inference of subsequential transducers; and log-linear classifiers with bigram and trigram features used as taggers (Ratnaparkhi, 1996). While the log-linear classifiers are reported to yield the best overall performance, the hand-crafted rules resulted in higher recall. Our phone number extractor is based on a two-phase procedure: a small hand-crafted component proposes candidate phrases, and a classifier then retains the desirable candidates. This allows recall and precision to be optimized more or less independently, somewhat as in the PNrule classifier learner (Agarwal and Joshi, 2001; Joshi et al., 2001). We shall see that the hand-crafted rules achieve very good recall, just as Huang et al. (2001) observed, and that the pruning phase successfully eliminates most undesirable candidates without reducing recall much. Overall, our method performs better than a log-linear model with trigram features.</Paragraph> <Paragraph position="5"> Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 320-327. Association for Computational Linguistics.</Paragraph> <Paragraph position="6"> The success of the method proposed here is also due to the use of a rich set of features for candidate classification. For example, the majority of phone numbers in voicemail messages have four, seven, or ten digits, whereas nine digits would indicate a social security number. In our two-phase approach it is straightforward for the second-phase classifier to take the length of a candidate phone number into account. By contrast, standard named entity taggers that use trigram features do not exploit this information, since their features cover only a short window of adjacent tokens, and doing so would entail significant changes to the underlying models and parameter estimation procedures.</Paragraph> <Paragraph position="7"> The rest of this paper is organized as follows. A brief overview of the data we used in Section 2 is followed by a discussion of methods for extracting two kinds of caller information in Section 3. Methods for extracting telephone numbers are discussed in Section 4, and Section 5 summarizes and concludes.</Paragraph> </Section> </Paper>