<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1049"> <Title>Will Pyramids Built of Nuggets Topple Over?</Title> <Section position="9" start_page="389" end_page="389" type="concl"> <SectionTitle> 8 Conclusion </SectionTitle> <Paragraph position="0"> The central role that quantitative evaluation plays in advancing the state of the art in language technologies warrants close examination of evaluation methodologies themselves to ensure that they are measuring &quot;the right thing&quot;. In this work, we have identified a shortcoming in the present nugget-based paradigm for assessing answers to complex questions. The vital/okay distinction was designed to capture the intuition that some nuggets are more important than others, but as we have shown, this comes at a cost in stability and discriminative power of the metric. We proposed a revised model that incorporates judgments from multiple assessors in the form of a &quot;nugget pyramid&quot;, and demonstrated how this addresses many of the previous shortcomings. It is hoped that our work paves the way for more accurate and refined evaluations of question answering systems in the future.</Paragraph> </Section> </Paper>