<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1614"> <Title>Is it Really that Difficult to Parse German?</Title> <Section position="5" start_page="112" end_page="114" type="metho"> <SectionTitle> 3 The Negra and the TüBa-D/Z Treebanks </SectionTitle> <Paragraph position="0"> Both treebanks use German newspapers as their data source: the Frankfurter Rundschau newspaper for Negra and the 'die tageszeitung' (taz) newspaper for TüBa-D/Z. Negra comprises 20 000 sentences, TüBa-D/Z 15 000 sentences. There is evidence that the complexity of sentences in the two treebanks is comparable: both average sentence length and the number of clause nodes per sentence are similar. A Negra sentence is on average 17.2 words long, a TüBa-D/Z sentence 17.5 words. Negra has an average of 1.4 clause nodes per sentence, TüBa-D/Z 1.5.</Paragraph> <Paragraph position="1"> Both treebanks use an annotation framework that is based on phrase structure grammar and enhanced by a level of predicate-argument structure. Annotation for both was performed semi-automatically. Despite all these similarities, the treebank annotations differ in four important respects: 1) Negra does not allow unary branching whereas TüBa-D/Z does; 2) in Negra, phrases receive a flat annotation whereas TüBa-D/Z annotates phrase-internal structure; 3) Negra uses crossing branches to represent long-distance relationships whereas TüBa-D/Z uses a pure tree structure combined with functional labels to encode this information; 4) Negra encodes grammatical functions in a combination of structural and functional labeling whereas TüBa-D/Z uses a combination of topological fields and functional labels, which results in a flatter structure at the clausal level. The two treebanks also use different notions of grammatical functions: TüBa-D/Z defines 36 grammatical functions covering head and non-head information, as well as subcategorization for complements and modifiers. 
Negra utilizes 48 grammatical functions. Apart from commonly accepted grammatical functions, such as SB (subject) or OA (accusative object), Negra grammatical functions comprise a more extended notion, e.g. RE (repeated element) or RC (relative clause).</Paragraph> <Paragraph position="2"> (6) Diese Metapher kann die Freizeitmalerin durchaus auch auf ihr Leben anwenden.</Paragraph> <Paragraph position="3"> this metaphor can the amateur painter by all means also to her life apply.</Paragraph> <Paragraph position="4"> 'The amateur painter can by all means apply this metaphor also to her life.' Figure 1 shows a typical tree from the Negra treebank for sentence (6). The syntactic categories are shown in circular nodes, the grammatical functions as edge labels in square boxes. A major phrasal category that serves to structure the sentence as a whole is the verb phrase (VP). It contains non-finite verbs (here: anwenden) together with their complements (here: the accusative object Diese Metapher) and adjuncts (here: the adverb durchaus and the PP modifier auch auf ihr Leben). The subject NP (here: die Freizeitmalerin) stands outside the VP and, depending on its linear position, leads to crossing branches with the VP. This happens in all cases where the subject follows the finite verb, as in Figure 1. Notice also that the PP is completely flat and does not contain an internal NP.</Paragraph> <Paragraph position="5"> Crossing branches in the Negra treebank also arise from discontinuous constituents of the kind illustrated in section 2.3. Extraposed relative clauses, as in (4), are analyzed in such a way that the relative clause constituent is a sister of its head noun in the Negra tree and crosses the branch that dominates the intervening non-finite verb gelesen.</Paragraph> <Paragraph position="6"> The crossing branches in the Negra treebank cannot be processed by most probabilistic parsing models since such parsers presuppose a strictly context-free tree structure. Therefore the Negra trees must be transformed into proper trees prior to training such parsers. 
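As an illustration (our own sketch, not part of the paper or its tools), a constituent introduces a crossing branch exactly when the terminal positions it dominates do not form a contiguous interval. A minimal check with a hypothetical Node class:

```python
# Sketch with a hypothetical Node class (not the treebank software used in
# the paper): a node crosses iff its yield is a discontinuous span.

class Node:
    def __init__(self, label, children=(), index=None):
        self.label = label              # syntactic category or POS tag
        self.children = list(children)  # daughter nodes
        self.index = index              # terminal position (leaves only)

    def terminals(self):
        """Set of terminal positions dominated by this node."""
        if self.index is not None:
            return {self.index}
        return set().union(*(c.terminals() for c in self.children))

def crossing_nodes(root):
    """Return all nodes whose yield is discontinuous (crossing branches)."""
    found = []
    def visit(n):
        ts = n.terminals()
        if max(ts) - min(ts) + 1 != len(ts):
            found.append(n)
        for c in n.children:
            visit(c)
    visit(root)
    return found

# Schematic version of Figure 1: the VP dominates the fronted object
# (positions 0-1) but not the finite verb (2) or the subject (3-4).
oa = Node("NP", [Node("ART", index=0), Node("NN", index=1)])
pp = Node("PP", [Node("ADV", index=6), Node("APPR", index=7),
                 Node("PPOSAT", index=8), Node("NN", index=9)])
vp = Node("VP", [oa, Node("ADV", index=5), pp, Node("VVINF", index=10)])
subj = Node("NP", [Node("ART", index=3), Node("NN", index=4)])
s = Node("S", [vp, Node("VAFIN", index=2), subj])
print([n.label for n in crossing_nodes(s)])  # prints ['VP']
```

Only the VP is flagged: its yield skips the positions of the finite verb and the subject, exactly the configuration that has to be resolved before context-free training.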
The standard approach for this transformation is to re-attach crossing non-head constituents as sisters of the lowest mother node that dominates all constituents in question in the original Negra tree.</Paragraph> <Paragraph position="7"> Figure 2 shows the result of this transformation for the tree in Figure 1. Here, the fronted accusative object Diese Metapher is re-attached at the clause level. Crossing branches arise not only with respect to the subject at the sentence level but also in cases of extraposition and fronting of partial constituents. As a result, approximately 30% of all Negra trees contain at least one crossing branch.</Paragraph> <Paragraph position="8"> Thus, tree transformations have a major impact on the type of constituent structures that are used for training probabilistic parsing models. Previous work, such as Dubey (2005), Dubey and Keller (2003), and Schiehlen (2004), uses the version of Negra in which the standard approach to resolving crossing branches has been applied.</Paragraph> <Paragraph position="9"> The TüBa-D/Z annotation introduces topological structures (here: VF, MF, and VC) into the tree. Notice also that, compared to the Negra annotation, TüBa-D/Z introduces more internal structure into NPs and PPs.</Paragraph> <Paragraph position="10"> (8) 'For this claim, Beckmeyer has not provided evidence yet.' In TüBa-D/Z, long-distance relationships are represented by a pure tree structure and specific functional labels. Figure 4 shows the TüBa-D/Z annotation for sentence (8). In this sentence, the prepositional phrase Für diese Behauptung is fronted. 
Its functional label (OA-MOD) provides the information that it modifies the accusative object (OA) keinen Nachweis.</Paragraph> </Section> <Section position="6" start_page="114" end_page="114" type="metho"> <SectionTitle> 4 Experimental Setup </SectionTitle> <Paragraph position="0"> The main goals behind our experiments were twofold: (1) to re-investigate the claim that lexicalization is detrimental for treebank parsing of German, and (2) to compare the parsing results for the two German treebanks.</Paragraph> <Paragraph position="1"> To investigate the first issue, the Stanford Parser (Klein and Manning, 2003b), a state-of-the-art probabilistic parser, was trained with both lexicalized and unlexicalized versions of the two treebanks (Experiment I). For lexicalized parsing, the Stanford Parser provides a factored probabilistic model that combines a PCFG model with a dependency model.</Paragraph> <Paragraph position="2"> For the comparison between the two treebanks, two types of experiments were performed: a purely constituent-based comparison using both the Stanford parser and the pure PCFG parser LoPar (Schmid, 2000) (Experiment II), and an in-depth evaluation of the three major grammatical functions subject, accusative object, and dative object, using the Stanford parser (Experiment III). All three experiments use gold POS tags extracted from the treebanks as parser input. All parsing results shown below are averaged over a ten-fold cross-validation of the test data. Experiments I and II used versions of the treebanks that excluded grammatical functions and thus contained only constituent labels. For Experiment III, all syntactic labels were extended by their grammatical function (e.g. NX-ON for a subject NP in TüBa-D/Z or NP-SB for a Negra subject). Experiments I and II included all sentences of a maximal length of 40 words. 
Due to memory limitations (7 GB), Experiment III had to be restricted to sentences of a maximal length of 35 words.</Paragraph> </Section> <Section position="7" start_page="114" end_page="115" type="metho"> <SectionTitle> 5 Experiment I: Lexicalization </SectionTitle> <Paragraph position="0"> Experiment I investigates the effect of lexicalization on parser performance for the Stanford Parser.</Paragraph> <Paragraph position="1"> The results, summarized in Table 1, show that lexicalization improves parser performance for both the Negra and the TüBa-D/Z treebank compared to the unlexicalized counterpart models: for labeled bracketing, an F-score improvement from 86.48 to 88.88 for TüBa-D/Z and from 66.92 to 67.13 for Negra. This directly contradicts the findings reported by Dubey and Keller (2003) that lexicalization has a negative effect on probabilistic parsing models for German. We therefore conclude that these previous claims, while valid for particular configurations of parsers and parameters, should not be generalized to claims about probabilistic parsing of German in general.</Paragraph> <Paragraph position="2"> Experiment I also shows considerable differences in the overall scores between the two treebanks, with the F-scores for TüBa-D/Z parsing approximating scores reported for English, but with Negra scores lagging behind by an average margin of approx. 20 points. Of course, it is important to note that such direct comparisons with English are hardly possible due to different annotation schemes, different underlying text corpora, etc. Nevertheless, the striking difference in parser performance between the two German treebanks warrants further attention. 
Experiments II and III will investigate this matter in more depth.</Paragraph> </Section> <Section position="8" start_page="115" end_page="115" type="metho"> <SectionTitle> 6 Experiment II: Different Parsers </SectionTitle> <Paragraph position="0"> The purpose of Experiment II is to rule out the possibility that the differences in parser performance for the two German treebanks produced by Experiment I are merely due to the particular parser used - in this case the hybrid PCFG and dependency model of the Stanford parser. After all, Experiment I also contradicted the received wisdom from previously reported results concerning the utility of lexicalization.</Paragraph> <Paragraph position="1"> In order to obtain a broader experimental base, unlexicalized models of the Stanford parser and the pure PCFG parser LoPar were trained on both treebanks. In addition, we experimented with two different parameter settings of the Stanford parser, one with and one without markovization. The experiment with markovization used parent information (v=1) and a second-order Markov model for horizontal markovization (h=2). The results, summarized in Table 2, show that all unlexicalized experiments exhibit roughly the same 20-point difference in F-score that was obtained for the lexicalized models in Experiment I. We can therefore conclude that the difference in parsing performance is robust across two parsers and across different parameter settings such as lexicalization and markovization.</Paragraph> <Paragraph position="2"> Experiment II also confirms the finding of Klein and Manning (2003a) and of Schiehlen (2004) that horizontal and vertical markovization have a positive effect on parser performance. 
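The two markovization dimensions can be illustrated with a small sketch (hypothetical code and symbol format, not the Stanford parser's implementation): vertical markovization (v=1) annotates a category with its parent, and horizontal markovization (h=2) binarizes a rule while remembering at most the last two daughters already generated:

```python
def markovize(lhs, rhs, parent=None, h=2):
    """Binarize one rule with parent annotation (v=1) and horizontal order h.

    Returns a list of (lhs, rhs) pairs; intermediate '@' symbols record at
    most the last h daughters already generated (hypothetical symbol format).
    """
    top = f"{lhs}^{parent}" if parent else lhs
    if len(rhs) <= 2:               # already (at most) binary: only annotate
        return [(top, list(rhs))]
    rules, current = [], top
    for i in range(len(rhs) - 2):
        # history = up to the last h sisters generated so far (incl. rhs[i])
        history = "-".join(rhs[max(0, i - h + 1): i + 1])
        nxt = f"@{lhs}|{history}"
        rules.append((current, [rhs[i], nxt]))
        current = nxt
    rules.append((current, rhs[-2:]))
    return rules

print(markovize("VP", ["ADV", "NP", "PP", "VVINF"], parent="S"))
```

For the four-daughter VP rule this yields VP^S -> ADV @VP|ADV, @VP|ADV -> NP @VP|ADV-NP, and @VP|ADV-NP -> PP VVINF: the parent annotation refines the category, while the bounded history smooths over the many long flat rules that a treebank like Negra otherwise induces.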
Notice also that markovization of unlexicalized grammars yields almost the same improvement as lexicalization does in Experiment I.</Paragraph> </Section> <Section position="9" start_page="115" end_page="117" type="metho"> <SectionTitle> 7 Experiment III: Grammatical </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="115" end_page="117" type="sub_section"> <SectionTitle> Functions </SectionTitle> <Paragraph position="0"> In Experiments I and II, only constituent structure was evaluated, which is highly annotation-dependent. It could simply be the case that the TüBa-D/Z annotation scheme contains many local structures that can be easily parsed by a PCFG model or the hybrid Stanford model. Moreover, such easy-to-parse structures may not be of great importance when it comes to determining the correct macrostructure of a sentence.</Paragraph> <Paragraph position="1"> To empirically verify such a conjecture, a separate evaluation of parser performance for different constituent types would be necessary. However, even such an evaluation would only be meaningful if the annotation schemes agreed on the defining characteristics of such constituent types. Unfortunately, this is not the case for the two treebanks under consideration. Even for arguably theory-neutral constituents such as NPs, the two treebanks differ considerably.</Paragraph> <Paragraph position="2"> In the Negra annotation scheme, single-word NPs project directly from the POS level to the clausal level, while in TüBa-D/Z, they first project to an NP by a unary rule. 
An extreme case of this Negra annotation is shown in Figure 5 for sentence (9).</Paragraph> <Paragraph position="3"> Here, all the phrases are one-word phrases and are thus projected directly to the clause level.</Paragraph> <Paragraph position="4"> There is an even more important motivation for not focusing on the standard constituent-based PARSEVAL measures - at least when parsing German. As discussed earlier in section 2.2, obtaining the correct constituent structure for a German sentence will often not be sufficient for determining its intended meaning. Due to the free order of phrases, a given NP in any one position may in principle fulfill different grammatical functions in the sentence as a whole. Therefore grammatical functions need to be explicitly marked in the treebank and correctly assigned during parsing. Since both treebanks encode grammatical functions, this information is available for parsing and can ultimately lead to a more meaningful comparison of the two treebanks.</Paragraph> <Paragraph position="5"> The purpose of Experiment III is to investigate parser performance on the treebanks when grammatical functions are included in the trees. For these experiments, the unlexicalized, markovized PCFG version of the Stanford parser was used, with markovization parameters v=1 and h=2, as in Experiment II. The results of this experiment are shown in Table 3. The comparison of the experiments with (line 2) and without grammatical functions (line 1) confirms the findings of Dubey and Keller (2003) that the task of assigning correct grammatical functions is harder than mere constituent-based parsing. When evaluating on all grammatical functions, the results for Negra decrease from 69.95 to 51.41, and for TüBa-D/Z from 89.18 to 75.33. 
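The labeled-bracketing scores used throughout follow the standard PARSEVAL definitions; a minimal sketch (generic code, not the evaluation software actually used in these experiments) that works equally for function-extended labels such as NP-SB or NX-ON:

```python
from collections import Counter

def parseval(gold, parsed):
    """Labeled precision, recall, and F-score over (label, start, end) spans.

    With function-extended labels (e.g. NP-SB), a bracket only counts as
    correct if the grammatical function matches as well.
    """
    g, p = Counter(gold), Counter(parsed)
    match = sum((g & p).values())        # multiset intersection of brackets
    prec = match / sum(p.values())
    rec = match / sum(g.values())
    f = 2 * prec * rec / (prec + rec) if match else 0.0
    return prec, rec, f

# Toy example: the parser finds the right spans but mislabels the
# grammatical function of the subject NP.
gold   = [("NP-SB", 0, 2), ("VP", 2, 5), ("S", 0, 5)]
parsed = [("NP-OA", 0, 2), ("VP", 2, 5), ("S", 0, 5)]
prec, rec, f = parseval(gold, parsed)
```

In the toy example, precision and recall are both 2/3: the subject bracket has the correct span but the wrong function, which illustrates why scores drop once labels are extended with grammatical functions.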
Notice, however, that the relative differences between Negra and TüBa-D/Z observed in Experiments I and II remain more or less constant in this experiment as well.</Paragraph> <Paragraph position="6"> In order to get a clearer picture of the quality of the parser output for each treebank, it is important to consider individual grammatical functions. As discussed in section 3, the overall inventory of grammatical functions differs between the two treebanks. We therefore separately evaluated those grammatical functions that are crucial for determining function-argument structure and that are at the same time the most comparable across the two treebanks. These are the functions of subject (encoded as SB in Negra and as ON in TüBa-D/Z), accusative object (OA), and dative object (DA in Negra and OD in TüBa-D/Z). Once again, the results are consistently better for TüBa-D/Z (cf. lines 3-5 in Table 3), with subjects yielding the highest results (71.08 vs. 55.12 F-score) and dative objects the lowest (14.07 vs. 5.00).</Paragraph> <Paragraph position="7"> The latter results must be attributed to data sparseness: dative objects occur only approx. 1 000 times in each treebank while subjects occur more than 15 000 times.</Paragraph> </Section> </Section> <Section position="10" start_page="117" end_page="117" type="metho"> <SectionTitle> 8 Discussion </SectionTitle> <Paragraph position="0"> The experiments presented in sections 5-7 show that there is a difference in results of approx. 20 points between Negra and TüBa-D/Z. This difference is consistent throughout, i.e. with different parsers, under lexicalization and markovization. These results lead to the conjecture that the reasons for these differences must be sought in the differences between the annotation schemes of the two treebanks.</Paragraph> <Paragraph position="1"> In section 3, we showed that one of the major differences in annotation is the treatment of discontinuous constituents. 
In Negra, such constituents are annotated via crossing branches, which have to be resolved before parsing. In such cases, constituents are extracted from their mother constituents and re-attached higher in the tree.</Paragraph> <Paragraph position="2"> In the case of the discontinuous VP in Figure 1, this leads to a VP rule with the following daughters: head (HD) and modifier (MO), while the accusative object is directly attached at the sentence level as a sister of the VP. This conversion leads to inconsistencies in the training data since the annotation scheme requires that object NPs be daughters of the VP rather than of S. The inconsistencies introduced by tree conversion are considerable since they affect approx. 30% of all Negra trees (cf. section 3). One possible explanation for the better performance of TüBa-D/Z might be that it has more information about the correct attachment site of extraposed constituents, which is completely lacking in the context-free version of Negra. For this reason, Kübler (2005) and Maier (2006) tested a version of Negra which contained information about the original attachment site of these discontinuous constituents. In this version of Negra, the grammatical function OA in Figure 2 would be changed to OA↑VP to show that the constituent was originally attached to the VP. Experiments with this version showed a decrease in F-score from 52.30 to 49.75. Consequently, adding this information in a way similar to the encoding of discontinuous constituents in TüBa-D/Z harms performance.</Paragraph> <Paragraph position="3"> By contrast, TüBa-D/Z uses topological fields as the primary structuring principle, which leads to a purely context-free annotation of discontinuous structures. 
There is evidence that the use of topological fields is advantageous for other parsing approaches as well (Frank et al., 2003; Kübler, 2005; Maier, 2006).</Paragraph> <Paragraph position="4"> Another difference between the annotation schemes concerns the treatment of phrases. Negra phrases are flat, and unary projections are not annotated.</Paragraph> <Paragraph position="5"> TüBa-D/Z always projects to the phrasal category and annotates more phrase-internal structure. The deeper structures in TüBa-D/Z lead to fewer rules for phrasal categories, which allows the parser a more consistent treatment of such phrases. For example, the direct attachment of one-word subjects at the clausal level in Negra leads to a high number of different S rules with different POS tags for the subject phrase. Empirical evidence that flat phrase structures and the omission of unary nodes decrease parsing accuracy is presented by Kübler (2005) and Maier (2006).</Paragraph> <Paragraph position="6"> We want to emphasize that our experiments concentrate on the original context-free annotations of the treebanks. We did not investigate the influence of treebank refinement in this study.</Paragraph> <Paragraph position="7"> However, we would like to note that by a combination of suffix analysis and smoothing, Dubey (2005) was able to obtain an F-score of 85.2 for Negra. For other work in the area of treebank refinement using the German treebanks, see Kübler (2005), Maier (2006), and Ule (2003).</Paragraph> </Section> </Paper>