<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1419"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. Evaluations of NLG Systems: common corpus and tasks or common dimensions and metrics?</Title> <Section position="7" start_page="128" end_page="128" type="concl"> <SectionTitle> 4 Discussion </SectionTitle> <Paragraph position="0"> In this short position paper, we have argued that we need to enlarge our view of evaluation to encompass both usability evaluation (and include users beyond readers/listeners) and system-oriented evaluations. While we recognise that it is crucial to have ways to compare systems and approaches (the main advantage of having a common corpus and task), we suggest that we should look for ways to enable these comparisons without narrowing our view on evaluation and de-contextualising the systems under consideration. We have presented some possible dimensions on which approaches and systems could be evaluated. While we understand how to perform usability evaluations, we believe that an important question is whether it is possible to agree on dimensions for system-oriented evaluations and on &quot;metrics&quot; for these dimensions, to allow us to evaluate the different applications and approaches, and allow potential users of the technology to choose the appropriate one for their needs. In our own work, we exploit an NLG architecture to develop adaptive hypermedia applications (Paris et al., 2004), and some of our goals (Colineau et al., 2006) are to: * Articulate a comprehensive framework for the evaluation of approaches to building tailored information delivery systems and specific applications built using these approaches.
* Identify how an application or an approach measures along some dimensions (in particular for system-oriented evaluation). We believe these are equally important for the evaluation of NLG systems. (We realise that, for some NLG applications, there might be no authors if all the data exploited by the system comes from underlying existing sources, e.g., weather or stock data or existing textual resources.)</Paragraph> </Section> </Paper>