<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1100">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 795-802, Vancouver, October 2005. ©2005 Association for Computational Linguistics. Morphology and Reranking for the Statistical Parsing of Spanish</Title>
  <Section position="3" start_page="795" end_page="795" type="relat">
    <SectionTitle>
2 Related Work
</SectionTitle>
    <Paragraph position="0"> Statistical parsing of English has surpassed 90% accuracy in the precision and recall of labeled constituents (e.g., Collins, 1999; Charniak and Johnson, 2005). A recent proliferation of treebanks in various languages has fueled research on parsing other languages. For instance, work has been done on Chinese using the Penn Chinese Treebank (Levy and Manning, 2003; Chiang and Bikel, 2002), on Czech using the Prague Dependency Treebank (Collins et al., 1999), on French using the French Treebank (Arun and Keller, 2005), on German using the Negra Treebank (Dubey, 2005; Dubey and Keller, 2003), and on Spanish using the UAM Spanish Treebank (Moreno et al., 2000). The best reported F1 constituency scores from this work for each language are 79.9% (Chinese; Chiang and Bikel, 2002), 81.0% (French; Arun and Keller, 2005), 76.2% (German; Dubey, 2005), and 73.8% (Spanish; Moreno et al., 2000). Collins et al. (1999) describe an approach that gives 80% accuracy in recovering unlabeled dependencies in Czech.[1] The project that is arguably most akin to the work presented in this paper is that on Spanish parsing (Moreno et al., 2000). However, a direct comparison of scores is complicated by the fact that we have used a different corpus as well as larger training and test sets (2,800- vs. 1,500-sentence training sets, and [...]cated: in addition to differences in corpus annotation schemes and sizes, there may be significant differences in linguistic characteristics. [...] created our models. For brevity, we only list attributes with at least two values. See Civit (2000) for a comprehensive list of the morphological attributes included in the Spanish treebank.</Paragraph>
  </Section>
</Paper>