<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1095">
  <Title>Towards a More Careful Evaluation of Broad Coverage Parsing Systems</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> During the last few years large treebanks have become available to many researchers, which has resulted in researches applying a range of new techniques for parsing systems. Most of the methods that are being suggested include some kind of Machine Learning, such as history based grammars and decision tree models (Black et al., 1993; Magerman, 1995), training or inducing statistical grammars (Black, Garside and Leech, 1993; Pereira and Schabes, 1992; Schabes et al., 1993), or other techniques (Bod, 1993).</Paragraph>
    <Paragraph position="1"> Consequently, syntactical analysis has become an area with a wide variety of (a) algorithms and methods for learning and parsing, and (b) type of information used for learning and parsing (sometimes referred to as feature set). These methods only could become popular through evaluation methods for parsing systems, such as Bracket Accuracy, Bracket Recall, Sentence Accuracy and Viterbi Score. Some of them were introduced in (Black et al., 1991; Harrison et M., 1991).</Paragraph>
    <Paragraph position="2"> These evaluation metrics have a number of problems, and in this paper we argue that they need to be reconsidered, and give a number of suggestions either to overcome those problems or to gain a better understanding of those problems. Particular problems we look at are arbitrary choices in the treebank, errors in the treebank, types of errors made by parsers, and the statistical significance of differences in test scores by parsers.</Paragraph>
  </Section>
class="xml-element"></Paper>