<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1016"> <Title>Dialogue Helpsystem based on Flexible Matching of User Query with Natural Language Knowledge Base</Title> <Section position="8" start_page="147" end_page="148" type="evalu"> <SectionTitle> 7 Evaluation </SectionTitle> <Paragraph position="0"> The helpsystem started its service in July 1999 as part of the CIMS web service. All conversation logs between users and the system have been stored as a dialogue database.</Paragraph> <Paragraph position="1"> In the dialogue database, each dialogue between a user and the system is manually segmented into task units. We call such a unit a session. Figure 4 shows the number of sessions and their evaluation for each week from July 5th to January 30th. On average, there are 70 sessions per week; a dialogue between a user and the system consists of 2.1 sessions, which means a user raises 2.1 topics in one dialogue; one session consists of 3.2 turns.</Paragraph> <Paragraph position="2"> The evaluation of sessions is based on the following criteria.</Paragraph> <Paragraph position="3"> Success: The system could return a satisfactory answer.</Paragraph> <Paragraph position="4"> Failure:Input Analyzer: The system could not respond properly because of an input analysis error, mostly a lack of utterance-pattern rules. Utterance-pattern rules are added whenever such a gap is found.</Paragraph> <Paragraph position="5"> Failure:Dialog Manager: The system could not respond properly because of a dialogue manager error. Dialogue manager errors come both from simple bugs in the system and from unanticipated patterns of user response.</Paragraph> <Paragraph position="6"> For example, when the system asks &quot;select from A and B&quot; expecting the answer &quot;A&quot; or &quot;B&quot;, a user might answer &quot;the latter&quot;. 
The system is modified whenever necessary.</Paragraph> <Paragraph position="7"> Failure:Knowledge: The system could not answer the question because of a lack of knowledge. This is the major cause of failure, as shown in Figure 4. Though the knowledge base is being extended step by step, the range of user queries is unlimited, including troubles in using PCs and advanced settings of software/hardware. Failure:Difficult: The current system architecture could not handle the question. For example, a user sometimes asks &quot;what is the difference between A and B&quot;, or when the system asks &quot;select from A and B&quot;, a user answers &quot;I don't know&quot;. In order to handle such utterances, we are planning to improve the system to exploit definitions of &quot;A&quot; and &quot;B&quot;. Out of scope: The question is outside the system domain, such as questions about telephone charges in using PPP or the Y2K problem. Miscellaneous: Inputs such as &quot;hello&quot;, &quot;this is a test&quot; or just a simple typo like &quot;a&quot;. The success ratio, that is, the ratio of Success to Success plus Failure, over the whole period is 37%. The system became stable around October 1999, and the success ratio after that (14 weeks) is 39%. Considering the relatively wide domain the system has to cover, we feel the success ratio is reasonable, and the system is contributing to CIMS to some extent by handling simple FAQs like &quot;how to change my password&quot;.</Paragraph> </Section> </Paper>