<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1042"> <Title>Deep dependencies from context-free statistical parsers: correcting the surface dependency approximation</Title> <Section position="4" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Datasets </SectionTitle> <Paragraph position="0"> The datasets used for this study consist of the Wall Street Journal section of the Penn Treebank of English (WSJ) and the context-free version of the NEGRA (version 2) corpus of German (Skut et al., 1997b). Full-size experiments on WSJ described in Section 4 used the standard sections 2-21 for training, section 24 for development, and trees whose yield is under 100 words from section 23 for testing. Experiments described in Section 4.3 used the same development and test sets but used files 200-959 of WSJ as a smaller training set; for NEGRA we followed Dubey and Keller (2003) in using the first 18,602 sentences for training, the last 1,000 for development, and the 1,000 immediately preceding those for testing. Consistent with prior work and with common practice in statistical parsing, we stripped categories of all functional tags prior to training and testing (though in several cases this seems to have been a limiting move; see Section 5).</Paragraph> <Paragraph position="1"> Nonlocal dependency annotation in Penn Treebanks can be divided into three major types: unindexed empty elements, dislocations, and control.</Paragraph> <Paragraph position="2"> The first type consists primarily of null complementizers, as exemplified in Figure 1 by the null relative pronoun 0 (cf. aspects that it sees); such elements do not participate in (though they may mediate) nonlocal dependency. The second type consists of a dislocated element coindexed with an origin site of semantic interpretation, as in the association in Figure 1 of WHNP-1 with the direct object position of sees (a relativization), and the association of S-</Paragraph> </Section> </Paper>