<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1015">
  <Title>Handling noisy training and testing data</Title>
  <Section position="6" start_page="8" end_page="9" type="evalu">
    <SectionTitle>
5 Experimental results
</SectionTitle>
    <Paragraph position="0"> As a sort of case study in the meta-algorithms presented in the previous sections, we will look at the problem of function tagging in the treebank.</Paragraph>
    <Paragraph position="1"> Blaheta and Charniak (2000) describe an algorithm for marking sentence constituents with function tags such as SBJ (for sentence subjects) and TMP (for temporal phrases). We trained this algorithm on sections 02{21 of the treebank and ran it on section 24 (the development corpus), then analysed the output.</Paragraph>
    <Paragraph position="2"> First, we printed out every constituent with a function tag error. We then examined the sentence in which each occurred, and determined whether the error was in the algorithm or in the treebank, or elsewhere, as reported in Table 1. Of the errors we examined, less than half were due solely to an algorithmic failure in the function tagger itself. The next largest category was parse error: this function tagging algorithm requires parsed input, and in these cases, that input was incorrect and led the function tagger astray; had the tagger received the treebank parse, it would have given correct output. In just under a fth of the reported \errors&amp;quot;, the algorithm was correct and the treebank was de nitely wrong.</Paragraph>
    <Paragraph position="3"> The remainder of cases we have identi ed either as Type C errors|wherein the tagger agreed with many training examples, but the \correct&amp;quot; tag agreed with many others|or at least \dubious&amp;quot;, in the cases that weren't common enough to be systematic inconsistencies but where the guidelines did not clearly prefer the treebank tag over the tagger output, or vice versa.</Paragraph>
    <Paragraph position="4"> Next, we compiled all the noted treebank errors and their corrections. The most common correction involved simply adding, removing, or changing a function tag to what the algorithm output (with a net e ect of improving our score). However, it should be noted that when classifying reported errors, we examined their contexts, and in so doing discovered other sorts of treebank error. Mistags and misparses did not directly a ect us; some function tag corrections actually decreased our score. All corrections were applied anyway, in the hope of cleaner evaluations for future researchers. In total, we made 235 corrections, including about 130 simple retags.</Paragraph>
    <Paragraph position="5"> Grammatical tags Form/function tags Topicalisation tags  Finally, we re-evaluated the algorithm's output on the corrected development corpus. Table 2 shows the resulting improvements.</Paragraph>
    <Paragraph position="6">  Precision, recall, and F-measure are calculated as in (Blaheta and Charniak, 2000). The false error rate is simply the percent by which the error is reduced; in terms of the performance on the treebank version (t)andthe xed  This is the percentage of the reported errors that are due to treebank error.</Paragraph>
    <Paragraph position="7"> The topicalisation result is nice, but since the TPC tag is fairly rare (121 occurrences in section 24), these numbers may not be robust. It is interesting, though, that the false error rate on the two major tag groups is so similar|roughly 20% in precision and 5% in recall for each, leading to 10% in F-measure. First of all, this parallelism strengthens our assertion that the false error rate, though calculated on a development corpus, can be presumed to apply equally to the test corpus, since it indicates that the human missed tag and mistag rates may be roughly constant. Second, the much higher improvement on precision indicates that the majority of treebank error (at least in the realm of function tagging) is due to human annotators forgetting a tag.</Paragraph>
  </Section>
class="xml-element"></Paper>