<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1137">
<Title>A New Probabilistic Model for Title Generation</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle> 2 Evaluation </SectionTitle>
<Paragraph position="0"> In this experiment, we introduce two different evaluations, i.e. an F1 metric for automatic evaluation and human judgments, to assess the quality of machine-generated titles.</Paragraph>
<Paragraph position="1"> The F1 metric is a common evaluation metric that has been widely used in information retrieval and automatic text summarization. Witbrock and Mittal (1999) used the F1 measurement (van Rijsbergen, 1979) as their performance metric. For an automatically generated title T_auto, F1 is measured against the corresponding human-assigned title T_human as F1 = 2 * precision * recall / (precision + recall). Here, precision and recall are measured as the number of identical words shared by titles T_auto and T_human over the number of words in title T_auto and the number of words in title T_human, respectively.</Paragraph>
<Paragraph position="2"> Unfortunately, this metric ignores syntax and human readability. In this paper, we therefore also asked people to judge the quality of machine-generated titles. There are five quality categories, namely 'very good', 'good', 'ok', 'bad', and 'extremely bad'. A simple scoring scheme is used, with score 5 for the category 'very good', score 4 for 'good', score 3 for 'ok', score 2 for 'bad', and score 1 for 'extremely bad'. The average score over the human judgments is used as another evaluation metric.</Paragraph>
</Section>
</Paper>
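To make the word-overlap computation concrete, the following is a minimal Python sketch of the F1 calculation described above. It is illustrative only and not the authors' code: the excerpt does not specify tokenization or whether repeated words are counted per token, so this version lowercases the titles and compares sets of unique whitespace-delimited words.

```python
def title_f1(auto_title: str, human_title: str) -> float:
    """Word-overlap F1 between a generated title (T_auto) and a
    human-assigned title (T_human), as sketched in the section above.

    Assumption (not stated in the paper): titles are lowercased and
    compared as sets of unique whitespace-delimited words.
    """
    auto_words = set(auto_title.lower().split())
    human_words = set(human_title.lower().split())
    if not auto_words or not human_words:
        return 0.0

    shared = auto_words & human_words          # identical words in both titles
    precision = len(shared) / len(auto_words)  # shared words / words in T_auto
    recall = len(shared) / len(human_words)    # shared words / words in T_human
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example titles, for illustration only.
    t_auto = "probabilistic model for title generation"
    t_human = "a new probabilistic model for title generation"
    print(f"F1 = {title_f1(t_auto, t_human):.3f}")
```

Under these assumptions, the example prints an F1 of about 0.83, since the generated title recovers five of the six unique words in the human title while containing no extra words.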