<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1809">
  <Title>A Statistical Approach to the Semantics of Verb-Particles</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Results
</SectionTitle>
    <Paragraph position="0"> The results in Table 4 show that at least one of the four statistical methods offers an improvement in precision over the baseline on all four tasks for the majority-view based data, and on three of the four tasks for the centroid data. There is also an improvement in F-score for TASK 1 on both sets of data. The relative scores for a given task swing between the majority and centroid annotator data. In terms of relative performance, the semantic similarity based approach of Methods 3 and 4 outperforms the distribution based approach of Methods 1 and 2 in F-score on 6 of the 8 sets of results reported.</Paragraph>
    <Paragraph position="1"> To get a reliable sense of how good these scores are, we compare them with the level of agreement across human judges. We calculated pairwise agreement across all participants on the four classification tasks, resulting in the figures given in Table 4. These agreement scores give us an upper bound for classification accuracy on each task, against which we can benchmark the accuracy of the classifiers on that same task. On TASK 1, three of the four classifiers achieved a classification accuracy of .575. On TASK 2, the highest-performing classifier (Method 4) achieved a classification accuracy of .725. On TASK 3, Method 2 achieved the highest classification accuracy at .600, and on TASK 4, Method 4 achieved a classification accuracy of .675. We can see, then, that the best classifiers perform only marginally below the upper bound on at least two of the tasks.</Paragraph>
    <Paragraph position="2"> While these results may appear at first glance to be less than conclusive, we must bear in mind that we are working with limited amounts of data and relatively simplistic models of a cognitively intensive task. We interpret them as very positive indicators of the viability of using empirical methods to analyse VPC semantics.</Paragraph>
  </Section>
</Paper>