<?xml version="1.0" standalone="yes"?>
<Paper uid="J03-4003">
  <Title>Head-Driven Statistical Models for Natural Language Parsing</Title>
  <Section position="8" start_page="607" end_page="608" type="evalu">
    <SectionTitle>
6. Results
</SectionTitle>
    <Paragraph position="0"> The parser was trained on sections 2-21 of the Wall Street Journal portion of the Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993) (approximately 40,000 sentences) and tested on section 23 (2,416 sentences). We use the PARSEVAL measures (Black et al. 1991) to compare performance: Labeled precision = number of correct constituents in proposed parse number of constituents in proposed parse Labeled recall = number of correct constituents in proposed parse number of constituents in treebank parse Crossing brackets = number of constituents that violate constituent boundaries with a constituent in the treebank parse For a constituent to be &amp;quot;correct,&amp;quot; it must span the same set of words (ignoring punctuation, i.e., all tokens tagged as commas, colons, or quotation marks) and have the same label  as a constituent in the treebank parse. Table 2 shows the results for models 1, 2 and 3 and a variety of other models in the literature. Two models (Collins 2000; Charniak 2000) outperform models 2 and 3 on section 23 of the treebank. Collins (2000) uses a technique based on boosting algorithms for machine learning that reranks n-best output from model 2 in this article. Charniak (2000) describes a series of enhancements to the earlier model of Charniak (1997).</Paragraph>
    <Paragraph position="1"> The precision and recall of the traces found by Model 3 were 93.8% and 90.1%, respectively (out of 437 cases in section 23 of the treebank), where three criteria must be met for a trace to be &amp;quot;correct&amp;quot;: (1) It must be an argument to the correct headword; (2) It must be in the correct position in relation to that headword (preceding or following); 15 Magerman (1995) collapses ADVP and PRT into the same label; for comparison, we also removed this distinction when calculating scores.</Paragraph>
    <Paragraph position="2">  Results on Section 23 of the WSJ Treebank. LR/LP = labeled recall/precision. CBs is the average number of crossing brackets per sentence. 0 CBs, [?] 2 CBs are the percentage of sentences with 0 or [?] 2 crossing brackets respectively. All the results in this table are for models trained and tested on the same data, using the same evaluation metric. (Note that these results show a slight improvement over those in (Collins 97); the main model changes were the improved treatment of punctuation (section 4.3) together with the addition of the P</Paragraph>
  </Section>
class="xml-element"></Paper>