<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0307">
  <Title>A Statistical Constraint Dependency Grammar (CDG) Parser</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> 3 (Collins, 1999), Roark's (Roark, 2001), Ratnaparkhi's (Ratnaparkhi, 1999), and Xu &amp; Chelba's (Xu et al., 2002) parsers. Hence, we compare our best loosely and tightly integrated SCDG parsers to Charniak's parser. We also compare with Collins' Model 2, since it makes the complement/adjunct distinction, and Model 3, since it handles wh-movement (Collins, 1999). Charniak's parser does not explicitly model these phenomena.</Paragraph>
    <Paragraph position="1"> Among the statistical CFG parsers to be compared, only Collins' Model 3 produces trees with information about wh-movement. Because the CFG-to-CDG transformer uses empty node information to convert CFG parse trees into CDG parses, the accuracy of Charniak's parser and Collins' Model 2 may be slightly reduced on sentences that contain empty nodes.</Paragraph>
    <Paragraph position="2"> Hence, we compare results on two test sets: one that omits all sentences with traces and one that does not. As Table 4 shows, on both test sets our tightly coupled parser matches or exceeds the accuracy of every other parser except Collins' Model 3.</Paragraph>
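The split into two test sets can be illustrated with a short Python sketch. It assumes gold trees are available as Penn Treebank bracketed strings and approximates "sentences with traces" as any tree containing an empty element under the `-NONE-` preterminal; the paper does not spell out its exact criterion, and the helper names `has_empty_node` and `split_test_set` are hypothetical.

```python
import re

# Assumption: a "sentence with traces" is approximated as any gold PTB tree
# containing an empty element, i.e. a leaf under the -NONE- preterminal,
# such as (-NONE- *T*-1), (-NONE- 0) or (-NONE- *).
EMPTY_NODE = re.compile(r"\(-NONE-\s+[^()\s]+\)")


def has_empty_node(tree: str) -> bool:
    """Return True if a bracketed PTB tree contains any -NONE- leaf."""
    return EMPTY_NODE.search(tree) is not None


def split_test_set(trees: list[str]) -> tuple[list[str], list[str]]:
    """Return (full test set, test set with trace-bearing sentences removed)."""
    without_traces = [t for t in trees if not has_empty_node(t)]
    return list(trees), without_traces


if __name__ == "__main__":
    gold = [
        "(S (NP (DT The) (NN cat)) (VP (VBD sat)))",
        "(SBAR (WHNP-1 (WP what)) (S (NP (PRP she)) (VP (VBD saw) (NP (-NONE- *T*-1)))))",
    ]
    full, no_traces = split_test_set(gold)
    print(len(full), len(no_traces))  # prints: 2 1
```

Both sets are scored with the same metric, so any gap between them isolates the effect of empty nodes on the transformed CFG parser output.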
    <Paragraph position="3"> Using our evaluation metrics, Collins' Model 3 achieves better precision and recall than Model 2 and Charniak's parser. Since the CFG-to-CDG transformer uses trace information to generate certain lexical features (Wang, 2003), the output of Model 3 is likely to be mapped to more accurate CDG parses. Although Charniak's maximum-entropy-inspired parser achieves the highest PARSEVAL results, Collins' Model 3 is more accurate under our dependency metric, possibly because it makes the complement/adjunct distinction and models wh-movement. Because the statistical CFG parsers may lose accuracy in the CFG-to-CDG transformation, we also, similarly to Collins' experiment reported in (Hajic et al., 1998), transformed our CDG parses into Penn Treebank style CFG parse trees and scored them with PARSEVAL. On the WSJ PTB test set, Charniak's parser achieved 89.6% labeled recall (LR) and 89.5% labeled precision (LP); Collins' Models 2 and 3 obtained 88.1% LR / 88.3% LP and 88.0% LR / 88.3% LP, respectively; and the tightly coupled CDG parser obtained 85.8% LR and 86.4% LP. This last score is affected by two lossy conversions, one for training and one for testing.</Paragraph>
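As a reminder of what the PARSEVAL figures measure: labeled recall (LR) is the fraction of gold constituents, identified by label and word span, that also appear in the parser output, and labeled precision (LP) is the fraction of output constituents that appear in the gold tree. The sketch below is a simplified, hypothetical scorer (`labeled_spans` and `corpus_parseval` are illustrative names). Unlike the standard evalb tool, it does not delete punctuation or empty elements and does not strip function tags, and it assumes each tree is a single labeled bracketing without an unlabeled outer wrapper.

```python
from collections import Counter


def labeled_spans(tree: str) -> Counter:
    """Count (label, start, end) for every constituent above the POS level.

    Assumes one labeled bracketing such as "(S (NP (DT The) ...))"; strip an
    unlabeled outer "( ... )" wrapper first if the input has one.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans: Counter = Counter()
    stack: list[list] = []  # each entry: [label, start word index, is_preterminal]
    words = 0
    it = iter(tokens)
    for tok in it:
        if tok == "(":
            stack.append([next(it), words, False])  # the label follows "("
        elif tok == ")":
            label, start, is_preterminal = stack.pop()
            if not is_preterminal:  # POS tags are not scored
                spans[(label, start, words)] += 1
        else:
            words += 1
            stack[-1][2] = True  # a bare word means its parent is a POS tag
    return spans


def corpus_parseval(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return corpus-level (labeled recall, labeled precision) in percent."""
    matched = gold_total = test_total = 0
    for gold, test in pairs:
        g, t = labeled_spans(gold), labeled_spans(test)
        matched += sum(min(n, t[span]) for span, n in g.items())
        gold_total += sum(g.values())
        test_total += sum(t.values())
    return 100.0 * matched / gold_total, 100.0 * matched / test_total


if __name__ == "__main__":
    gold = "(S (NP (DT The) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
    test = "(S (NP (DT The) (NN cat)) (VP (VBD sat) (IN on) (NP (DT the) (NN mat))))"
    print(corpus_parseval([(gold, test)]))  # prints: (80.0, 100.0)
```

In the example the test parse omits the PP bracket, so recall drops to 80% while every constituent it does propose is correct. Counts are pooled over the whole corpus before dividing, which is how corpus-level LR and LP are usually reported.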
    <Paragraph position="4"> We conducted a non-parametric Monte Carlo test to determine the significance of the differences between the parsing accuracy results in Table 3 and Table 4. The difference between the tightly and loosely coupled SCDG parsers is statistically significant, as are the differences between the SCDG parser and both Charniak's parser and Collins' Model 2. Although the difference between our parser and Collins' Model 3 is not statistically significant, our parser represents a first attempt at building a high-quality SCDG parser, and there is still room for improvement, e.g., better handling of barriers (including punctuation) and more sophisticated search and pruning strategies.</Paragraph>
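The paper does not detail its Monte Carlo test. One common non-parametric choice for comparing two parsers on the same sentences is approximate randomization: randomly swap the two systems' outputs sentence by sentence, and count how often the shuffled difference is at least as large as the observed one. The sketch below is an illustration of that technique, not necessarily the authors' procedure. It assumes per-sentence counts of correctly analyzed words for each parser, scored against the same gold words, so that the accuracy difference is proportional to the difference in summed counts; the function name and inputs are hypothetical.

```python
import random


def approximate_randomization(a: list[int], b: list[int],
                              trials: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for the difference between parsers A and B.

    a[i], b[i]: words analyzed correctly in sentence i by each parser.
    Both parsers are assumed to be scored against the same gold words,
    so comparing summed counts is equivalent to comparing accuracies.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) - sum(b))
    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0
        for x, y in zip(a, b):
            # Under the null hypothesis the system labels are exchangeable,
            # so each sentence's pair of scores is swapped with probability 0.5.
            if rng.random() >= 0.5:
                x, y = y, x
            diff += x - y
        if abs(diff) >= observed:
            at_least_as_extreme += 1
    # Add-one correction: treats the observed assignment as one of the trials.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    parser_a = [18, 22, 9, 31, 14, 27, 11, 20]
    parser_b = [17, 22, 8, 30, 14, 25, 11, 19]
    print(approximate_randomization(parser_a, parser_b))
```

Swapping whole sentences rather than individual words keeps the within-sentence correlation of parsing errors intact, which is the usual reason this paired design is preferred for parser comparisons.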
    <Paragraph position="5"> This paper has presented a statistical implementation of a CDG parser that is both generative and highly lexicalized. By tightly integrating multiple knowledge sources and modeling distance and synergistic dependencies, we have achieved parsing accuracy comparable to state-of-the-art statistical parsers trained on the Wall Street Journal Penn Treebank corpus. However, more work is needed to build a parser model that can cope with the disfluencies of spontaneous speech. We also intend to investigate a hybrid parser that combines the generality of a CFG with the specificity of a CDG.</Paragraph>
  </Section>
</Paper>