File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/w06-1419_intro.xml
Size: 1,726 bytes
Last Modified: 2025-10-06 14:04:01
<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1419">
  <Title>Evaluations of NLG Systems: common corpus and tasks or common dimensions and metrics?</Title>
  <Section position="4" start_page="0" end_page="0" type="intro">
    <SectionTitle>1 Introduction</SectionTitle>
    <Paragraph position="0">For this special session, a specific question was asked: what shared task and shared corpus would enable us to perform comparative evaluations of alternative techniques in natural language generation (NLG)? In this position paper, we question the appropriateness of this specific question and suggest that the community might be better served by (1) addressing a different question: what are the dimensions and metrics that would allow us to compare various techniques and systems? and (2) not forgetting, but rather encouraging, usability evaluations of specific applications.</Paragraph>
    <Paragraph position="1">The purpose of defining a shared task and a shared corpus is to compare the performance of various systems. It is thus a system-oriented view of evaluation, as opposed to an end-user oriented (or usability) view of evaluation. It is, however, a potentially narrow view of system-oriented evaluation, as it looks at the performance of an NLG system within a very specific context - thus essentially looking at the performance of a specific application. We argue here that (1) even if we take a system-oriented view of evaluation, the evaluation of NLG systems should not be limited to their performance in a specific context but should take other characteristics of the systems into account, and that (2) end-user evaluations are crucial.</Paragraph>
  </Section>
</Paper>