<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1040">
  <Title>Dialogue Management in Vector-Based Call Routing</Title>
  <Section position="4" start_page="0" end_page="256" type="metho">
    <SectionTitle>
3 Corpus Analysis
</SectionTitle>
    <Paragraph position="0"> To examine human-human dialogue behavior in call routing, we analyzed a set of 4497 transcribed telephone calls involving customers interacting with human operators, looking at both the semantics of caller requests and  dialogue actions for response generation. The call center provides financial services in hundreds of categories in the general areas of banking, credit cards, loans, insurance and investments; we concentrated on the 23 destinations for which we had at least 10 calls in the corpus.</Paragraph>
    <Section position="1" start_page="256" end_page="256" type="sub_section">
      <SectionTitle>
3.1 Semantics of Caller Requests
</SectionTitle>
      <Paragraph position="0"> The operator provides an open-ended prompt of &amp;quot;How may I direct your call?&amp;quot; We classified user responses into three categories. First, callers may explicitly provide a destination name, either by itself or embedded in a complete sentence, such as &amp;quot;may I have consumer lending?&amp;quot; Second, callers may describe the activity they would like to perform. Such requests may be unambiguous, such as &amp;quot;I'd like my checking account balance&amp;quot;, or ambiguous, such as &amp;quot;car loans please&amp;quot;, which in our call center can be resolved to either consumer lending, which handles new car loans, or to loan services, which handles existing car loans. Third, a caller can provide an indirect request, in which they describe their goal in a roundabout way, often including irrelevant information. This often occurs when the caller either is unfamiliar with the call center hierarchy or does not have a concrete idea of how to achieve the goal, as in &amp;quot;ah I'm calling 'cuz ah a friend gave me this number and ah she told me ah with this number I can buy some cars or whatever but she didn't know how to explain it to me so I just called you you know to get that information.&amp;quot; Table 1 shows the distribution of caller requests in our corpus with respect to these semantic types. Our analysis shows that in the vast majority of calls, the request was based on destination name or activity. Since there is a fairly small number (dozens to hundreds) of activities being handled by each destination, requests based on name and activity are expected to be more predictable and thus more suitable for handling by an automatic call router.</Paragraph>
      <Paragraph position="1"> Thus, our goal is to automatically route those calls based on name and activity, while leaving the indirect or inappropriate requests to human call operators.</Paragraph>
    </Section>
    <Section position="2" start_page="256" end_page="256" type="sub_section">
      <SectionTitle>
3.2 Dialogue Actions for Response Generation
</SectionTitle>
      <Paragraph position="0"> We also analyzed the operator's responses to caller requests to determine the dialogue actions needed for response generation in our automatic call router. We found that in the call routing task, the call operator either notifies the customer of the routing destination or asks a disambiguating query. 1 1 In cases where the operator generates an acknowledgment, such as uh-huh, midway through the caller's request, we analyzed the next operator utterance.</Paragraph>
      <Paragraph position="1">  Table 2 shows the frequency that each dialogue action should be employed based strictly on the presence of ambiguity in the caller requests in our corpus. We further analyzed those calls considered ambiguous within our call center and noted that 75% of such ambiguous requests involve underspecified noun phrases, such as requesting car loans without specifying whether it is an existing or new car loan. The remaining 25% of the ambiguous requests involve underspecified verb phrases, such as asking to transfer funds without specifying the types of accounts to and from which the transfer will occur, or missing verb phrases, such as asking for direct deposit without specifying whether the caller wants to set up or change an existing direct deposit.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="256" end_page="259" type="metho">
    <SectionTitle>
4 Dialogue Management in Call Routing
</SectionTitle>
    <Paragraph position="0"> Our call router consists of two components: the routing module and the disambiguation module. The routing module takes a caller request and determines a set of destinations to which the call can reasonably be routed.</Paragraph>
    <Paragraph position="1"> If there is exactly one such destination, the call is routed there and the customer notified; if there are multiple destinations, the disambiguation module is invoked in an attempt to formulate a query; and if there is no appropriate destination or if a reasonable disambiguation query cannot be generated, the call is routed to an operator. Figure 1 shows a diagram outlining this process.</Paragraph>
    <Section position="1" start_page="256" end_page="258" type="sub_section">
      <SectionTitle>
4.1 The Routing Module
</SectionTitle>
      <Paragraph position="0"> Our approach is novel in its application of information retrieval techniques to select candidate destinations for a call. We treat call routing as an instance of document routing, where a collection of judged documents is used for training and the task is to judge the relevance of a set of test documents (Schütze et al., 1995). More specifically, each destination in our call center is represented as a collection of documents (transcriptions of calls routed to that destination), and given a caller request, we judge the relevance of the request to each destination.</Paragraph>
      <Paragraph position="1">  Document Construction Our training corpus consists of 3753 calls, each of which is hand-routed to one of 23 destinations. 2 Our first step is to create one (virtual) document per destination, which contains the text of the callers' contributions to all calls routed to that destination. Morphological Filtering We filter each (virtual) document through the morphological processor of the Bell Labs' Text-to-Speech synthesizer (Sproat, 1997) to extract the root form of each word in the corpus. Next, the root forms of caller utterances are filtered through two lists, the ignore list and the stop list, in order to build a better n-gram model. The ignore list consists of noise words, such as uh and um, which sometimes get in the way of proper n-gram extraction, as in &amp;quot;I'd like to speak to someone about a car uh loan&amp;quot;. With noise word filtering, we can properly extract the bigram &amp;quot;car, loan&amp;quot;. The stop list enumerates words that do not discriminate between destinations, such as the, be, and afternoon. We modified the standard stop list distributed with the SMART information retrieval system (Salton, 1971) to include domain specific terms and proper names that occurred in the training corpus. Note that when a stop word is filtered out of the caller utterance, a place-holder is inserted to prevent the words preceding and following the stop word from forming n-grams. For instance, after filtering the stop words out of &amp;quot;I want to check on an account&amp;quot;, the utterance becomes &amp;quot;&lt;sw&gt; &lt;sw&gt; &lt;sw&gt; check &lt;sw&gt; &lt;sw&gt; account&amp;quot;. Without the placeholders, we would extract the bigram &amp;quot;check, account&amp;quot;, just as if the caller had used the term checking account.</Paragraph>
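The noise-word and stop-word filtering just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the ignore and stop lists here are tiny invented stand-ins, and the placeholder is written [sw] in place of the paper's &lt;sw&gt; token.

```python
IGNORE = {"uh", "um"}                           # noise words: dropped entirely
STOP = {"i", "want", "to", "on", "an", "the"}   # stop words: replaced by a placeholder

def filter_tokens(tokens):
    out = []
    for tok in tokens:
        if tok in IGNORE:
            continue        # noise words vanish, so "car uh loan" yields the bigram "car loan"
        out.append("[sw]" if tok in STOP else tok)
    return out

def bigrams(tokens):
    # the placeholder blocks bigrams from spanning a removed stop word
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if "[sw]" not in (a, b)]

filtered = filter_tokens("i want to check on an account".split())
# filtered == ['[sw]', '[sw]', '[sw]', 'check', '[sw]', '[sw]', 'account'],
# so no spurious ("check", "account") bigram is extracted:
assert bigrams(filtered) == []

# whereas dropping the noise word recovers the intended bigram:
assert bigrams(filter_tokens("car uh loan".split())) == [("car", "loan")]
```

Without the placeholder insertion, the stop-word-filtered utterance would collapse to "check account" and produce exactly the misleading bigram the paper warns about.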
      <Paragraph position="2"> Term Extraction We extract the n-gram terms that occur more frequently than a pre-determined threshold and do not contain any stop words. Our current system uses unigrams that occurred at least twice and bigrams and trigrams that occurred at least three times in the corpus.</Paragraph>
      <Paragraph position="3"> No 4-grams occurred three times.</Paragraph>
      <Paragraph position="4"> Term-Document Matrix Once the set of relevant terms is determined, we construct an m x n term-document frequency matrix A whose rows represent the m terms, whose columns represent the n destinations, and where an entry A_{t,d} is the frequency with which term t occurs in calls to destination d.</Paragraph>
      <Paragraph position="5"> It is often advantageous to weight the raw counts to fine tune the contribution of each term to routing.</Paragraph>
      <Paragraph position="6"> We begin by normalizing the row vectors representing terms by making them each of unit length. Thus we divide each row A_t in the original matrix by its length, (sum_{1&lt;=e&lt;=n} A_{t,e}^2)^{1/2}. 2 These 3753 calls are a subset of the corpus of 4497 calls used in our corpus analysis. We excluded those ambiguous calls that were not resolved by the operator.</Paragraph>
      <Paragraph position="7"> Our second weighting is based on the notion that a term that only occurs in a few documents is more important in discriminating among documents than a term that occurs in nearly every document. We use the inverse document frequency (IDF) weighting scheme (Sparck Jones, 1972), whereby a term is weighted inversely to the number of documents in which it occurs, by means of IDF(t) = log_2(n/d(t)), where t is a term, n is the total number of documents in the corpus, and d(t) is the number of documents containing the term t. Thus we obtain a weighted matrix B, whose elements are given by B_{t,d} = A_{t,d} x IDF(t) / (sum_{1&lt;=e&lt;=n} A_{t,e}^2)^{1/2}.</Paragraph>
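The weighting scheme above can be worked through on a toy example. This is a minimal numeric sketch with an invented 3-term x 2-destination count matrix, not the paper's data.

```python
import math

A = [[4, 0],   # term 0 occurs only in destination 0
     [1, 1],   # term 1 occurs in both destinations
     [0, 2]]   # term 2 occurs only in destination 1
m, n = len(A), len(A[0])

def doc_freq(t):
    # d(t): number of destinations whose calls contain term t
    return sum(1 for d in range(n) if A[t][d] > 0)

def idf(t):
    # IDF(t) = log2(n / d(t))
    return math.log2(n / doc_freq(t))

def row_length(t):
    # ||A_t|| = (sum_e A_{t,e}^2)^(1/2)
    return math.sqrt(sum(A[t][e] ** 2 for e in range(n)))

# B_{t,d} = A_{t,d} * IDF(t) / ||A_t||
B = [[A[t][d] * idf(t) / row_length(t) for d in range(n)] for t in range(m)]

# A term occurring in every destination gets IDF = log2(2/2) = 0,
# so it contributes nothing to discrimination:
assert B[1] == [0.0, 0.0]
# A destination-specific term keeps full unit-normalized weight: 4 * 1 / 4 = 1
assert math.isclose(B[0][0], 1.0)
```

The combined effect is exactly what the text argues for: length normalization keeps frequent terms from dominating, and IDF zeroes out terms that fail to discriminate between destinations.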
      <Paragraph position="8"> Vector Representation To reduce the dimensionality of our vector representations for terms and documents, we applied the singular value decomposition (Deerwester et al., 1990) to the m x n matrix B of weighted term-document frequencies. Specifically, we take B = USV^T, where U is an m x r orthonormal matrix (where r is the rank of B), V is an n x r orthonormal matrix, and S is an r x r diagonal matrix such that S_{1,1} &gt;= S_{2,2} &gt;= ... &gt;= S_{r,r} &gt;= 0.</Paragraph>
      <Paragraph position="9"> We can think of each row in U as an r-dimensional vector that represents a term, whereas each row in V is an r-dimensional vector representing a document. With appropriate scaling of the axes by the singular values on the diagonal of S, we can compare documents to documents and terms to terms using their corresponding points in this new r-dimensional space (Deerwester et al., 1990). For instance, to employ the dot product of two vectors as a measure of their similarity as is common in information retrieval (Salton, 1971), we have the matrix B^T B whose elements contain the dot products of document vectors. Because S is diagonal and U is orthonormal, B^T B = VS^2 V^T = VS(VS)^T. Thus, element i, j in B^T B, representing the dot product between document vectors i and j, can be computed by taking the dot product between the ith and jth rows of the matrix VS. In other words, we can consider rows in the matrix VS as vectors representing documents for the purpose of document/document comparison. An element of the original matrix B_{i,j}, representing the degree of association between the ith term and the jth document, can be recovered by multiplying the ith term vector by the jth scaled document vector, namely B_{i,j} = U_i ((VS)_j)^T.</Paragraph>
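The two identities in this paragraph are easy to verify numerically. A sketch assuming NumPy, with a random matrix standing in for the weighted term-document matrix B:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((6, 4))                  # 6 "terms" x 4 "destinations"

U, s, Vt = np.linalg.svd(B, full_matrices=False)   # B = U S V^T
S = np.diag(s)
V = Vt.T

# rows of U are term vectors; rows of VS are scaled document vectors
doc_vecs = V @ S

# B^T B = (VS)(VS)^T: document/document dot products come straight
# from the scaled document vectors
assert np.allclose(B.T @ B, doc_vecs @ doc_vecs.T)

# and an entry B[i, j] is recovered as U_i . (VS)_j
assert np.allclose(B, U @ doc_vecs.T)
```

In practice the paper truncates to the top r dimensions for dimensionality reduction; the identities above hold exactly only for the full decomposition and approximately after truncation.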
      <Paragraph position="10">  Given the vector representations of terms and documents (destinations) in r-dimensional space, how do we determine to which destination a new call should be routed? Our process for vector-based call routing consists of the following four steps: Term Extraction Given a transcription of the caller's utterance (either from a keyboard interface or from the output of a speech recognizer), the first step is to extract the relevant n-gram terms from the utterance. For instance, term extraction on the request &amp;quot;I want to check the balance in my savings account&amp;quot; would result in  one bigram term, &amp;quot;saving, account&amp;quot;, and two unigrams, &amp;quot;check&amp;quot; and &amp;quot;balance&amp;quot;.</Paragraph>
      <Paragraph position="11">  Pseudo-Document Generation Having extracted the relevant terms from a caller request, we can represent the request as an m-dimensional vector Q where each component Q_i represents the number of times that the ith term occurred in the caller's request. We then create an r-dimensional pseudo-document vector D = QU, following the standard methodology of vector-based information retrieval (see (Deerwester et al., 1990)). Note that D is simply the sum of the term vectors U_i for all terms occurring in the caller's request, weighted by their frequency of occurrence in the request, and is scaled properly for document/document comparison.</Paragraph>
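The observation that D = QU is just a frequency-weighted sum of term vectors can be shown in a few lines. The term vectors below are invented for illustration (NumPy assumed):

```python
import numpy as np

# rows of U are r-dimensional term vectors (here r = 2, m = 3 terms,
# e.g. "check", "balance", "saving, account")
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

# Q counts term occurrences in the request; each term occurred once
Q = np.array([1.0, 1.0, 1.0])

D = Q @ U                       # r-dimensional pseudo-document vector

# D is exactly the sum of the term vectors for the terms in the request
assert np.allclose(D, U[0] + U[1] + U[2])
```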
      <Paragraph position="12"> Scoring Once the vector D for the pseudo-document is determined, we compare it with the document vectors by computing the cosine between D and each scaled document vector in VS. Next, we transform the cosine score for each destination using a sigmoid function specifically fitted for that destination to obtain a confidence score that represents the router's confidence that the call should be routed to that destination.</Paragraph>
      <Paragraph position="13"> The reason for the mapping from cosine scores to confidence scores is that the absolute degree of similarity between a request and a destination, as given by the cosine value between their vector representations, does not translate directly into the likelihood of correct routing. Instead, some destinations may require a higher cosine value, i.e., a closer degree of similarity, than others in order for a request to be correctly associated with those destinations. Thus we collected, for each destination, a set of cosine value/routing value pairs over all calls in the training data, where the routing value is 1 if the call should be routed to that destination and 0 otherwise. Then for each destination, we used the least squared error method in fitting a sigmoid function, 1/(1 + e^{-(ax+b)}), to the set of cosine/routing pairs.</Paragraph>
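A minimal sketch of the per-destination fit: a sigmoid 1/(1 + e^-(ax+b)) fitted to (cosine, routing-value) pairs by least squares. The training pairs are invented, and plain gradient descent stands in for whatever fitting procedure the authors actually used.

```python
import math

# (cosine score, routing value) pairs for one hypothetical destination
pairs = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1)]

def sigmoid(x, a, b):
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

# minimize sum (sigmoid(x) - y)^2 over the pairs by gradient descent
a, b, lr = 1.0, 0.0, 0.5
for _ in range(20000):
    ga = gb = 0.0
    for x, y in pairs:
        p = sigmoid(x, a, b)
        g = (p - y) * p * (1 - p)   # gradient of squared error w.r.t. the logit (up to 2x)
        ga += g * x
        gb += g
    a -= lr * ga
    b -= lr * gb

# the fitted curve assigns low confidence to low-cosine calls and
# high confidence to high-cosine calls for this destination
assert 0.5 > sigmoid(0.2, a, b)
assert sigmoid(0.7, a, b) > 0.5
```

Because each destination gets its own (a, b), two destinations can map the same cosine value to different confidences, which is exactly the point of the calibration step.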
      <Paragraph position="14"> We tested the routing performance using cosine vs. confidence values on 307 unseen unambiguous requests. In each case, we selected the destination with the highest cosine/confidence score to be the target destination. Using strict cosine scores, 92.2% of the calls are routed to the correct destination. On the other hand, using sigmoid confidence fitting, 93.5% of the calls are correctly routed. This yields a relative reduction in error rate of 16.7%.</Paragraph>
      <Paragraph position="17"> Decision Making The outcome of the routing module is a set of destinations whose confidence scores are above a pre-determined threshold. These candidate destinations represent those to which the caller's request can reasonably be routed. If there is only one such destination, then the call is routed and the caller notified; if there are two or more possible destinations, the disambiguation module is invoked in an attempt to formulate a query; otherwise, the call is routed to an operator.</Paragraph>
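The three-way decision is simple to state as code. A sketch with illustrative destination names; 0.2 is the threshold the authors report as optimal.

```python
THRESHOLD = 0.2

def decide(confidences):
    """confidences: dict mapping destination name to confidence score.
    Returns (action, payload): route to one destination, disambiguate
    among several candidates, or fall back to a human operator."""
    candidates = [d for d, c in confidences.items() if c >= THRESHOLD]
    if len(candidates) == 1:
        return ("route", candidates[0])
    if len(candidates) >= 2:
        return ("disambiguate", candidates)
    return ("operator", None)

assert decide({"loan services": 0.8, "consumer lending": 0.1}) == ("route", "loan services")
assert decide({"loan services": 0.5, "consumer lending": 0.4})[0] == "disambiguate"
assert decide({"loan services": 0.05})[0] == "operator"
```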
      <Paragraph position="18"> To determine the optimal value for the threshold, we  ran a series of experiments to compute the upperbound and lowerbound of the router's performance varying the threshold from 0 to 0.9 at 0.1 intervals. The lowerbound represents the percentage of calls that are routed correctly, while the upperbound indicates the percentage of calls that have the potential to be routed correctly after disambiguation (see section 5 for details on upperbound and lowerbound measures). The results in Figure 2 show 0.2 to be the threshold that yields optimal performance.</Paragraph>
    </Section>
    <Section position="2" start_page="258" end_page="259" type="sub_section">
      <SectionTitle>
4.2 The Disambiguation Module
</SectionTitle>
      <Paragraph position="0"> The disambiguation module attempts to formulate an appropriate query to solicit further information from the caller in order to determine a unique destination to which the call should be routed. To generate an appropriate query, the caller's request and the candidate destinations must both be taken into account. We have developed a vector-based method for dynamically generating disambiguation queries by first selecting a set of terms and then forming a wh or yes-no question from these selected terms.</Paragraph>
      <Paragraph position="1"> The terms selected by the disambiguation mechanism are those terms related to the original request that can likely be used to disambiguate among the candidate destinations. These terms are chosen by filtering all terms based on the following three criteria: 1. Closeness: We choose terms that are close (by the cosine measure) to the differences between the scaled pseudo-document query vector, D, and vectors representing the candidate destinations in VS.</Paragraph>
      <Paragraph position="2"> The intuition is that adding terms close to the differences will disambiguate the original query.</Paragraph>
      <Paragraph position="3">  2. Relevance: From the close terms, we construct a set of relevant terms, which are terms that further specify a term in the original request. A close term is considered relevant if it can be combined with a term in the request to form a valid n-gram term, and the relevant term will be the resulting n-gram term.</Paragraph>
      <Paragraph position="4"> For instance, if &amp;quot;car, loan&amp;quot; is in the original request, then both &amp;quot;new&amp;quot; and &amp;quot;new, car&amp;quot; would produce the relevant term &amp;quot;new, car, loan&amp;quot;. 3. Disambiguating power: Finally, we restrict attention to relevant terms that can be added to the original request to result in an unambiguous routing using the routing mechanism described in Section 4.1.2. If none of the relevant terms satisfies this criterion, then we include all relevant terms in the set of disambiguating terms. Thus, instead of giving up the disambiguation process when no single term is predicted to resolve the ambiguity, the system may pose a question which further specifies the request and then select a disambiguating term based on this refined (although still ambiguous) request.</Paragraph>
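The three-stage filter can be sketched schematically. This is not the authors' implementation: the vector-space closeness computation is replaced by a fixed list of close terms, n-gram validity by a lookup set, and the routing check by a stub predicate; all data is invented, loosely following the car-loan example.

```python
# valid n-gram terms observed in the (hypothetical) training corpus
VALID_NGRAMS = {("new", "car", "loan"), ("exist", "car", "loan")}

def relevant_terms(request_terms, close_terms):
    # stage 2: a close term is relevant if it extends a request term
    # into a valid n-gram term
    out = []
    for close in close_terms:
        for term in request_terms:
            combined = close + term
            if combined in VALID_NGRAMS:
                out.append(combined)
    return out

def disambiguating(terms, request_terms, routes_unambiguously):
    # stage 3: keep terms whose addition yields an unambiguous routing;
    # if none qualifies, fall back to all relevant terms
    picked = [t for t in terms if routes_unambiguously(request_terms + [t])]
    return picked if picked else terms

request = [("car", "loan")]
close = [("new",), ("exist",), ("owe",)]   # stage 1 output; "owe" forms no valid n-gram
rel = relevant_terms(request, close)
assert ("owe", "car", "loan") not in rel and len(rel) == 2

# stub routing check: suppose only the "exist" reading is unambiguous
result = disambiguating(rel, request, lambda ts: ("exist", "car", "loan") in ts)
assert result == [("exist", "car", "loan")]
```

The fallback in `disambiguating` mirrors the paper's point: rather than giving up when no single term resolves the ambiguity, the system can still ask a question that narrows the request.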
      <Paragraph position="5"> The result of this filtering process is a finite set of terms which are relevant to the original ambiguous query and, when added to it, are likely to resolve the ambiguity. If a significant number of these terms share a head word, such as loan, the system asks the wh-question &amp;quot;for what type of loan?&amp;quot; Otherwise, the term that occurred most frequently in the training data is selected, based on the heuristic that a more common term is more likely to be relevant than an obscure term, and a yes-no question is formed based on this term. A third alternative would be to ask a disjunctive question, but we have not yet explored this possibility. Figure 1 shows that after the system poses its query, it attempts to route the refined request, which is the original request augmented with the caller response to the system's query. In the case of wh-questions, n-gram terms are extracted from the caller's response. In the case of yes-no questions, the system determines whether a yes or no answer is given. 3 In the former case, the disambiguating term used to form the query is considered the caller response, while in the latter case, the response is treated as in responses to wh-questions.</Paragraph>
      <Paragraph position="6"> Note that our disambiguation mechanism, like our basic routing technique, is fully domain-independent. It utilizes a set of n-gram terms, as well as term and document vectors that were obtained by the training of the call router. Thus, porting the call router to a new domain requires no change in the disambiguation module.</Paragraph>
    </Section>
    <Section position="3" start_page="259" end_page="259" type="sub_section">
      <SectionTitle>
4.3 Example
</SectionTitle>
      <Paragraph position="0"> To illustrate our call router, consider the request &amp;quot;loans please.&amp;quot; This request is ambiguous because our call center handles mortgage loans separately from all other types of loans, and for all other loans, existing loans and new loans are again handled by different departments.</Paragraph>
      <Paragraph position="1"> Given this request, the call router first extracts the relevant n-gram terms, which in this case results in the unigram &amp;quot;loan&amp;quot;. It then computes a pseudo-document vector that represents this request, which is compared in turn with the 23 vectors representing all destinations in the call center. The cosine values between the request and each destination are then mapped into confidence values.</Paragraph>
      <Paragraph position="2"> 3 In our current system, a response is considered a yes response only if it explicitly contains the word yes. However, as discussed in (Green and Carberry, 1994; Hockey et al., 1997), responses to yes-no questions may not explicitly contain a yes or no term. We leave incorporating a more sophisticated response understanding model, such as (Green and Carberry, 1994), into our system for future work.</Paragraph>
      <Paragraph position="3"> Using a confidence threshold of 0.2, we have two candidate destinations, Loan Servicing and Consumer Lending; thus the disambiguation module is invoked.</Paragraph>
      <Paragraph position="4"> Our disambiguation module first selects from all n-gram terms those whose term vectors are close to the difference between the request vector and either of the two candidate destination vectors. This results in a list of 60 close terms, the vast majority of which are semantically close to &amp;quot;loan&amp;quot;, such as &amp;quot;auto, loan&amp;quot;, &amp;quot;payoff&amp;quot;, and &amp;quot;owe&amp;quot;. Next, the relevant terms are constructed from the set of close terms. This results in a list of 27 relevant terms, including &amp;quot;auto, loan&amp;quot; and &amp;quot;loan, payoff&amp;quot;, but excluding owe, since neither &amp;quot;loan, owe&amp;quot; nor &amp;quot;owe, loan&amp;quot; constitutes a valid bigram. The third step is to select those relevant terms with disambiguating power, resulting in 18 disambiguating terms. Since 11 of these disambiguating terms share a head noun loan, a wh-question is generated based on this head word, resulting in the query &amp;quot;for what type of loan?&amp;quot; Suppose in response to the system's query, the user answers &amp;quot;car loan&amp;quot;. The router then adds the new bigram &amp;quot;car, loan&amp;quot; to the original request and attempts to route the refined request. This refined request is again ambiguous between Loan Servicing and Consumer Lending since the caller did not specify whether it was an existing or new car loan. Again, the disambiguation module selects the close, relevant, and disambiguating terms, resulting in a unique term &amp;quot;exist, car, loan&amp;quot;. Thus, the system generates the yes-no question &amp;quot;is this about an existing car loan?&amp;quot; If the user responds &amp;quot;yes&amp;quot;, then the trigram term &amp;quot;exist, car, loan&amp;quot; is added to the refined request and the call routed to Loan Servicing; if the user says &amp;quot;no, it's a new car loan&amp;quot;, then &amp;quot;new, car, loan&amp;quot; is extracted from the response and the call routed to Consumer Lending.</Paragraph>
    </Section>
  </Section>
</Paper>