<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0506"> <Title>Cooperative Question Answering in Restricted Domains: the WEBCOOP Experiment</Title> <Section position="5" start_page="4" end_page="4" type="evalu"> <SectionTitle> 4 Evaluation of WEBCOOP </SectionTitle> <Paragraph position="0"> It is clear that a TREC-style evaluation is not relevant to our approach. We carry out two forms of evaluation: (1) the evaluation of the portability of the system w.r.t. the forms of knowledge involved and the applicability of the inference schemas, and (2) the evaluation of the linguistic and cognitive adequacy of the responses produced by the system. (Note on indirect responses: for example, the question "is your campsite close to the highway?" can be answered indirectly but cooperatively: "yes, but that highway is quiet at night.")</Paragraph> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.1 Evaluating System Portability </SectionTitle> <Paragraph position="0"> Porting WEBCOOP to other large-public applications is quite challenging, given the complexity of the system.</Paragraph> <Paragraph position="1"> 4.1.1 The lexicon and the Ontology First, we claim that the syntax of questions and the template-based approach used for producing responses are relatively stable. At the language level, the main task is to define an appropriate lexicon in relation to the domain ontology.</Paragraph> <Paragraph position="2"> This task may be somewhat facilitated by the existence of shared resources; however, these are quite rare for French. In general, we observe that some resources are common to all applications (e.g. communication or possession verbs, or prepositions), while others are totally specific, with dedicated senses and usages. Creating an application lexicon is costly, in particular when NL generation is involved. 
To give an idea of the complexity, an application like tourism requires about 150 verbs and about 1800 nouns.</Paragraph> <Paragraph position="3"> Among the verbs, 100 are generic verbs with standard senses. Describing verbs is complex, but their number is quite modest. Most nouns are not predicative; therefore, their lexicon can be partly deduced from the domain ontology.</Paragraph> <Paragraph position="4"> There are many domain ontologies on the web. Although constructed by domain experts, they are not necessarily adequate for providing responses to a large public of non-specialists. The main difficulties are to customize these ontologies and to manage their coherence in order to produce a domain ontology that leads to coherent and adequate responses, as explained in section 3.2.</Paragraph> <Paragraph position="5"> In terms of cooperative functions, our experience is that most applications require the same types of functions, but with various degrees of importance. For example, some applications will be subject to more cases of misunderstanding than others, depending, e.g., on the complexity of their associated knowledge and on the type of services expected by users. Similarly, the inference procedures used in WEBCOOP have been designed with a certain level of genericity. They should be portable provided that the knowledge resources of the new domain can be implemented using the WEBCOOP format, which is quite generic. However, apart from QA annotations, which are a very useful perspective, the adequacy of inferences can only be evaluated a posteriori. 
In a future stage, we plan to use what Barr and Klavans (Barr and Klavans, 2001) call component performance evaluation, which consists of assessing the performance of system components and determining their impact on the overall system performance.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.2 Evaluating Response Intelligibility </SectionTitle> <Paragraph position="0"> Finally, since WEBCOOP produces responses in NL, some of which are template-based (in contrast to TREC, which simply reproduces text extracts), it is important to evaluate the portability of those templates. We propose a method based on experimental psychology that aims at evaluating the cooperative responses generated in the know-how component of WEBCOOP.</Paragraph> <Paragraph position="1"> Our methodology involves the following steps: - Evaluating templates within a single domain (tourism in our case). This goal includes two main parts: 1. intra-templates, which aims at evaluating: - response intelligibility in terms of (1) the adequacy of the response w.r.t. the user intent, and of (2) the justification and explanation mechanisms provided that led to the answer.</Paragraph> <Paragraph position="2"> - the readability of the responses in terms of (3) the linguistic surface generation of both the underspecified terms and the different lexicalization choices made within each template, and in terms of (4) the adequacy of our hyperlink generation heuristics. 2. inter-templates, which aims at evaluating: - the display order relevance. If we go back to example 1 in section 2.3, the responses are displayed following the inverse reading order of the question constraints, i.e. chalet is the last concept to be relaxed in the question. 
This evaluation can also be useful for identifying other kinds of correlation between the display of the answers and the order of the constraints in the question.</Paragraph> <Paragraph position="3"> - the general fluency in terms of syntactic regularities of the responses generated by each template.</Paragraph> <Paragraph position="4"> - the visual aspect of the responses: enumerations vs. paragraphs.</Paragraph> <Paragraph position="5"> - Evaluating the portability of templates to other large-public domains like health and education. We have developed the experimental protocols associated with the relevance of explanation (point 2 cited above) and with the display order relevance. Interpretation of the results is ongoing.</Paragraph> </Section> </Section> </Paper>