<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1093">
  <Title>Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions</Title>
  <Section position="5" start_page="738" end_page="738" type="metho">
    <SectionTitle>
3 Issues with Call Center Data
</SectionTitle>
    <Paragraph position="0"> We obtained telephonic conversation data collected from the internal IT help desk of a company. The calls correspond to users making specific queries regarding problems with computer software such as Lotus Notes, Net Client, MS Office, MS Windows, etc. Under these broad categories users faced specific problems e.g. in Lotus Notes users had problems with their passwords, mail archiving, replication, installation, etc. It is possible that many of the sub problem categories are similar, e.g. password issues can occur with Lotus Notes, Net Client and MS Windows.</Paragraph>
    <Paragraph position="1"> We obtained automatic transcriptions of the dialogs using an Automatic Speech Recognition (ASR) system. The transcription server, used for transcribing the call center data, is an IBM research prototype. The speech recognition system was trained on 300 hours of data comprising of help desk calls sampled at 6KHz. The transcription output comprises information about the recognized words along with their durations, i.e., beginning and ending times of the words. Further, speaker turns are marked, so the agent and customer portions of speech are demarcated without exactly naming which part is the agent and which the customer. It should be noted that the call center agents and the customers were of different nationalities having varied accents and this further made the job of the speech recognizer hard. The resultant transcriptions have a word error rate of about 40%. This high error rate implies that many wrong deletions of actual words and wrong insertion of dictionary words have taken place. Also often speaker turns are not correctly identified and voice portions of both speakers are assigned to a single speaker. Apart from speech recognition errors there are other issues related to spontaneous speech recognition in the transcriptions. There are no punctuation marks, silence periods are marked but it is not possible to find sentence boundaries based on these. There are repeats, false starts, a lot of pause filling words such as um and uh, etc.</Paragraph>
    <Paragraph position="2"> Portion of a transcribed call is shown in figure 1.</Paragraph>
    <Paragraph position="3"> Generally, at these noise levels such data is hard to interpret by a human. We used over 2000 calls that have been automatically transcribed for our analysis. The average duration of a call is about 9 SPEAKER 1: windows thanks for calling and you can learn yes i don't mind it so then i went to SPEAKER 2: well and ok bring the machine front end loaded with a standard um and that's um it's a desktop machine and i did that everything was working wonderfully um I went ahead connected into my my network um so i i changed my network settings to um to my home network so i i can you know it's showing me for my workroom um and then it is said it had to reboot in order for changes to take effect so i rebooted and now it's asking me for a password which i never i never said  minutes. For 125 of these calls, call topics were manually assigned.</Paragraph>
  </Section>
  <Section position="6" start_page="738" end_page="741" type="metho">
    <SectionTitle>
4 Generation of Domain Model
</SectionTitle>
    <Paragraph position="0"> Fig 2 shows the steps for generating a domain model in the call center scenario. This section explains different modules shown in the figure.</Paragraph>
    <Section position="1" start_page="738" end_page="738" type="sub_section">
      <SectionTitle>
4.1 Description of Model
</SectionTitle>
      <Paragraph position="0"> We propose the Domain Model to be comprised of primarily a topic taxonomy where every node is characterized by topic(s), typical Questions-Answers (Q&amp;As), typical actions and call statistics. Generating such a taxonomy manually from scratch requires significant effort. Further, the changing nature of customer problems requires frequent changes to the taxonomy. In the next subsection, we show that meaningful taxonomies can be built without any manual supervision from a collection of noisy call transcriptions.</Paragraph>
    </Section>
    <Section position="2" start_page="738" end_page="740" type="sub_section">
      <SectionTitle>
4.2 Taxonomy Generation
</SectionTitle>
      <Paragraph position="0"> As mentioned in section 3, automatically transcribed data is noisy and requires a good amount of feature engineering before applying any text analytics technique. Each transcription is passed through a Feature Engineering Component to perform noise removal. We performed a sequence of cleansing operations to remove stopwords such as the, of, seven, dot, january, hello. We also remove pause filling words such as um, uh, huh . The remaining words in every transcription are passed through a stemmer (using Porter's stemming algo- null rithm 1) to extract the root form of every word e.g. call from called. We extract all n-grams which occur more frequently than a threshold and do not contain any stopword. We observed that using all n-grams without thresholding deteriorates the quality of the generated taxonomy. a t &amp; t, lotus notes, and expense reimbursement are some examples of extracted n-grams.</Paragraph>
      <Paragraph position="1"> The Clusterer generates individual levels of the taxonomy by using text clustering. We used CLUTO package 2 for doing text clustering. We experimented with all the available clustering functions in CLUTO but no one clustering algorithm consistently outperformed others. Also, there was not much difference between various algorithms based on the available goodness metrics. Hence, we used the default repeated bisection technique with cosine function as the similarity metric. We ran this algorithm on a collection of 2000 transcriptions multiple times. First we generate 5 clusters from the 2000 transcriptions.</Paragraph>
      <Paragraph position="2"> Next we generate 10 clusters from the same set of transcriptions and so on. At the finest level we split them into 100 clusters. To generate the topic  taxonomy, these sets containing 5 to 100 clusters are passed through the Taxonomy Builder component. This component (1) removes clusters containing less than n documents (2) introduces directed edges from cluster v1 to v2 if v1 and v2 share at least one document between them, and where v2 is one level finer than v1. Now v1 and v2 become nodes in adjacent layers in the taxonomy.</Paragraph>
      <Paragraph position="3"> Here we found the taxonomy to be a tree but in general it can be a DAG. Now onwards, each node in the taxonomy will be referred to as a topic.</Paragraph>
      <Paragraph position="4"> This kind of top-down approach was preferred over a bottom-up approach because it not only gives the linkage between clusters of various granularity but also gives the most descriptive and discriminative set of features associated with each node. CLUTO defines descriptive (and discriminative) features as the set of features which contribute the most to the average similarity (dissimilarity) between documents belonging to the same cluster (different clusters). In general, there is a large overlap between descriptive and discriminative features. These features, topic features, are later used for generating topic specific information. Figure 3 shows a part of the taxonomy obtained from the IT help desk dataset. The labels  ontology along with descriptive features.</Paragraph>
      <Paragraph position="5"> shown in Figure 3 are the most descriptive and discriminative features of a node given the labels of its ancestors.</Paragraph>
    </Section>
    <Section position="3" start_page="740" end_page="741" type="sub_section">
      <SectionTitle>
4.3 Topic Specific Information
</SectionTitle>
      <Paragraph position="0"> The Model Builder component in Figure 2 creates an augmented taxonomy with topic specific information extracted from noisy transcriptions. Topic specific information includes phrases that describe typical actions, typical Q&amp;As and call statistics (for each topic in the taxonomy).</Paragraph>
      <Paragraph position="1"> Typical Actions: Actions correspond to typical issues raised by the customer, problems and strategies for solving them. We observed that action related phrases are mostly found around topic features. Hence, we start by searching and collecting all the phrases containing topic words from the documents belonging to the topic. We define a 10-word window around the topic features and harvest all phrases from the documents. The set of collected phrases are then searched for n-grams with support above a preset threshold. For example, both the 10-grams note in click button to set up for all stops and to action settings and click the button to set up increase the support count of the 5-gram click button to set up.</Paragraph>
      <Paragraph position="2"> The search for the n-grams proceeds based on a threshold on a distance function that counts the insertions necessary to match the two phrases. For example can you is closer to can &lt; ... &gt; you than to can &lt; ... &gt;&lt; ... &gt; you. Longer n-grams are allowed a higher distance threshold than shorter ngrams. After this stage we extracted all the phrases that frequently occur within the cluster.</Paragraph>
      <Paragraph position="3"> In the second step, phrase tiling and ordering, we prune and merge the extracted phrases and order them. Tiling constructs longer n-grams from sequences of overlapping shorter n-grams. We noted that the phrases have more meaning if they are ordered by their appearance. For example, if go to the program menu typically appears before select options from program menu then it is more thank you for calling this is problem with our serial number software Q: may i have your serial number Q: how may i help you today A: i'm having trouble with my at&amp;t network ............</Paragraph>
      <Paragraph position="4"> ............</Paragraph>
      <Paragraph position="5"> click on advance log in properties i want you to right click create a connection across an existing internet connection in d. n. s. use default network ............</Paragraph>
      <Paragraph position="6"> ............</Paragraph>
      <Paragraph position="7"> Q: would you like to have your ticket A: ticket number is two thank you for calling and have a great day thank you for calling bye bye anything else i can help you with have a great day you too  useful to present them in the order of their appearance. We establish this order based on the average turn number where a phrase occurs.</Paragraph>
      <Paragraph position="8"> Typical Questions-Answers: To understand a customer's issue the agent needs to ask the right set of questions. Asking the right questions is the key to effective call handling. We search for all the questions within a topic by defining question templates. The question templates basically look for all phrases beginning with how, what, can I, can you, were there, etc. This set comprised of 127 such templates for questions. All 10-word phrases conforming to the question templates are collected and phrase harvesting, tiling and ordering is done on them as described above. For the answers we search for phrases in the vicinity immediately following the question.</Paragraph>
      <Paragraph position="9"> Figure 4 shows a part of the topic specific information that has been generated for the default properti node in Fig 3. There are 123 documents in this node. We have selected phrases that occur at least 5 times in these 123 documents. We have captured the general opening and closing styles used by the agents in addition to typical actions and Q&amp;As for the topic. In this node the documents pertain to queries on setting up a new A T &amp; T network connection. Most of the topic specific issues that have been captured relate to the agent  leading the customer through the steps for setting up the connection. In the absence of tagged dataset we could not quantify our observation. However, when we compared the automatically generated topic specific information to the extracted information from the hand labeled calls, we noted that almost all the issues have been captured. In fact there are some issues in the automatically generated set that are missing from the hand labeled set. The following observations can be made from the topic specific information that has been generated: * The phrases that have been captured turn out to be quite well formed. Even though the ASR system introduces a lot of noise, the resulting phrases when collected over the clusters are clean.</Paragraph>
      <Paragraph position="10"> * Some phrases appear in multiple forms thank you for calling how can i help you, how may i help you today, thanks for calling can i be of help today. While tiling is able to merge matching phrases, semantically similar phrases are not merged.</Paragraph>
      <Paragraph position="11"> * The list of topic specific phrases, as already noted, matched and at times was more exhaustive than similar hand generated sets.</Paragraph>
      <Paragraph position="12"> Call Statistics: We compute various aggregate statistics for each node in the topic taxonomy as part of the model viz. (1) average call duration(in seconds), (2) average transcription length(number of words) (3) average number of speaker turns and (4) number of calls. We observed that call durations and number of speaker turns varies significantly from one topic to another. Figure 5 shows average call duration and corresponding average transcription lengths for a few interesting topics. It can be seen that in topic cluster-1, which is about expense reimbursement and related stuff, most of the queries can be answered quickly in standard ways. However, some connection related issues (topic cluster-5) require more information from customers and are generally longer in duration. Interestingly, topic cluster-2 and topic cluster-4 have similar average call durations but quite different average transcription lengths. On investigation we found that cluster-4 is primarily about printer related queries where the customer many a times is not ready with details like printer name, ip address of the printer, resulting in long hold time whereas for cluster-2, which is about online courses, users  some topic clusters generally have details like course name, etc. ready with them and are interactive in nature.</Paragraph>
      <Paragraph position="13"> We build a hierarchical index of type {topic-information} based on this automatically generated model for each topic in the topic taxonomy. An entry of this index contains topic specific information viz. (1) typical Q&amp;As, (2) typical actions, and (3) call statistics. As we go down this hierarchical index the information associated with each topic becomes more and more specific. In (Mishne et al., 2005) a manually developed collection of issues and their solutions is indexed so that they can be matched to the call topic. In our work the indexed collection is automatically obtained from the call transcriptions. Also, our index is more useful because of its hierarchical nature where information can be obtained for topics of various granularity unlike (Mishne et al., 2005) where there is no concept of topics at all.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="741" end_page="743" type="metho">
    <SectionTitle>
5 Application of Domain Model
</SectionTitle>
    <Paragraph position="0"> Information retrieval from spoken dialog data is an important requirement for call centers. Call centers constantly endeavor to improve the call handling efficiency and identify key problem areas.</Paragraph>
    <Paragraph position="1"> The described model provides a comprehensive and structured view of the domain that can be used to do both. It encodes three levels of information about the domain: * General: The taxonomy along with the labels gives a general view of the domain. The general information can be used to monitor trends on how the number of calls in different categories change over time e.g. daily, weekly, monthly.</Paragraph>
    <Paragraph position="2">  * Topic level: This includes a listing of the specific issues related to the topic, typical customer questions and problems, usual strategies for solving the problems, average call durations, etc. It can be used to identify primary issues, problems and solutions pertaining to any category.</Paragraph>
    <Paragraph position="3"> * Dialog level: This includes information on how agents typically open and close calls, ask questions and guide customers, average number of speaker turns, etc. The dialog level information can be used to monitor whether agents are using courteous language in their calls, whether they ask pertinent questions, etc.</Paragraph>
    <Paragraph position="4"> The {topic-information} index requires identification of the topic for each call to make use of information available in the model. Below we show examples of the use of the model for topic identification.</Paragraph>
    <Section position="1" start_page="742" end_page="742" type="sub_section">
      <SectionTitle>
5.1 Topic Identification
</SectionTitle>
      <Paragraph position="0"> Many of the customer complaints can be categorized into coarse as well as fine topic categories by listening to only the initial part of the call. Exploiting this observation we do fast topic identification using a simple technique based on distribution of topic specific descriptive and discriminative features (Sec 4.2) within the initial portion of the call. Figure 6 shows variation in prediction accuracy using this technique as a function of the fraction of a call observed for 5, 10 and 25 clusters verified over the 125 hand-labeled transcriptions. It can be seen, at coarse level, nearly 70% prediction accuracy can be achieved by listening to the initial 30% of the call and more than 80% of the calls can be correctly categorized by listening only to the first half of the call. Also calls related to some categories can be quickly detected compared to some other clusters as shown in Figure 7.</Paragraph>
    </Section>
    <Section position="2" start_page="742" end_page="743" type="sub_section">
      <SectionTitle>
5.2 Aiding and Administrative Tool
</SectionTitle>
      <Paragraph position="0"> Using the techniques presented in this paper so far it is possible to put together many applications for a call center. In this section we give some example applications and describe ways in which they can be implemented. Based on the hierarchical model described in Section 4 and topic identification mentioned in the last sub-section we describe  fraction of call observed for 5, 10 and 25 clusters  curacy for 10 clusters (1) a tool capable of aiding agents for efficient handling of calls to improve customer satisfaction as well as to reduce call handling time, (2) an administrative tool for agent appraisal and training. Agent aiding is done based on the automatically generated domain model. The hierarchical nature of the model helps to provide generic to specific information to the agent as the call progresses. During call handling the agent can be provided the automatically generated taxonomy and the agent can get relevant information associated with different nodes by say clicking on the nodes. For example, once the agent identifies a call to be about {lotusnot} in Fig 3 then he can see the generic Lotus Notes related Q&amp;As and actions. By interacting further with the customer the agent identifies it to be of {copi archiv replic} topic and typical Q&amp;As and actions change accordingly. Finally, the agent narrows down to the topic as {servercopi localcopi} and suggest solution for replication problem in Lotus Notes.</Paragraph>
      <Paragraph position="1"> The concept of administrative tool is primarily driven by Dialog and Topic level information. We envision this post-processing tool to be used  for comparing completed individual calls with corresponding topics based on the distribution of Q&amp;As, actions and call statistics. Based on the topic level information we can check whether the agent identified the issues and offered the known solutions on a given topic. We can use the dialog level information to check whether the agent used courteous opening and closing sentences. Calls that deviate from the topic specific distributions, can be identified in this way and agents handling these calls can be offered further training on the subject matter, courtesy, etc. This kind of post-processing tool can also help us to catch abnormally long calls, agents with high average call handle time, etc.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>