<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2029">
  <Title>The Benefit of Stochastic PP Attachment to a Rule-Based Parser</Title>
  <Section position="5" start_page="223" end_page="225" type="metho">
    <SectionTitle>
3 Methods
</SectionTitle>
    <Paragraph position="0"> Statistical PP attachment is based on the observation that the identities of content words can be used to predict which prepositional phrases modify which words with better-than-chance accuracy. This is apparently because, as heads of their respective phrases, these words are representative enough to serve as a crude approximation of the semantic structure that could be derived from the phrases. Consider the following example (the last sentence in our test set): Die Firmen müssen noch die Bedenken der EU-Kommission gegen die Fusion ausräumen. (The companies have yet to address the Commission's concerns about the merger.) In this sentence, the preferred analysis pairs the preposition 'gegen' (against, about, versus) with the noun 'Bedenken' (concerns), since the proposition is clearly that the concerns pertain to the merger. A syntax tree of this interpretation is shown in Figure 1. Note that there are at least three different syntactically plausible attachment sites for the preposition. In fact, there are even more, since a parser can make no initial assumptions about the global structure of the syntax tree that it will construct; for instance, the possibility that 'gegen' attaches to the noun 'Firmen' (companies) cannot be ruled out when parsing begins.</Paragraph>
    <Section position="1" start_page="223" end_page="223" type="sub_section">
      <SectionTitle>
3.1 WCDG
</SectionTitle>
      <Paragraph position="0"> For the following experiments, we used the dependency parser of German described in (Foth et al., 2005). This system is especially suited to our goals for several reasons. First, the parser achieves the highest published dependency-based accuracy on unrestricted written German input, but still has a comparatively high error rate for prepositions. In particular, it mis-attaches the preposition 'gegen' in the example sentence. Second, although rule-based in nature, it uses numerical penalties to arbitrate between different disambiguation rules. It is therefore easy to add another rule of varying strength, which depends on the output of an external statistical predictor, to guide the parser when it has no other means of making an attachment decision. Finally, the parser and grammar are freely available for use and modification (http://nats-www.informatik.uni-hamburg.de/download).</Paragraph>
    </Section>
    <Section position="2" start_page="223" end_page="224" type="sub_section">
      <SectionTitle>
Weighted Constraint Dependency Grammar
</SectionTitle>
      <Paragraph position="0"> WCDG (Schröder, 2002) models syntactic structure as labelled dependency trees, as shown in the example. A grammar in this formalism is written as a set of constraints that license well-formed partial syntax structures. For instance, general projectivity rules ensure that the dependency tree corresponds to a properly nested syntax structure without crossing brackets¹. Other constraints require an auxiliary verb to be modified by a full verb, or prescribe morphosyntactic agreement between a determiner and its regent (the word modified by the determiner). Although the Constraint Satisfaction Problem that this formalism defines is, in theory, intractable, it can nevertheless be solved approximately with heuristic search methods that achieve competitive parsing accuracy.</Paragraph>
      <Paragraph position="1"> To allow the resolution of true ambiguity (the existence of several structures, none of which is strictly ungrammatical), weighted constraints can be written that the solution should satisfy if possible. The goal is then to build the structure that violates as few constraints as possible, preferentially violating weak rather than strong constraints. This allows preferences to be expressed rather than only hard rules. For instance, agreement constraints could be declared as violable, since typing errors, reformulations, etc. can and do lead to mis-inflected phrases. In this way, robustness against many types of error can be achieved while still preferring the correct variant. For more about the WCDG parser, see (Schröder, 2002; Foth and Menzel, 2006).</Paragraph>
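The preference mechanism can be made concrete with a small Python sketch. It assumes, as in the WCDG literature, that a structure is scored by multiplying the penalties of all constraint instances it violates and that the parser seeks the highest-scoring structure; the constraint names and penalty values below are hypothetical and are not taken from the actual grammar.

```python
from math import prod

# Hypothetical penalties: 0.0 marks a hard constraint, values near 1.0 a weak preference.
PENALTIES = {
    "agreement": 0.1,         # violable, but strongly dispreferred
    "projectivity": 0.0,      # hard: crossing dependencies are never allowed
    "adjunct_distance": 0.9,  # weak preference for short adjunct dependencies
}


def score(violations: list[str]) -> float:
    """Score a structure as the product of the penalties of its violated constraints."""
    return prod(PENALTIES[name] for name in violations)


def best(candidates: dict[str, list[str]]) -> str:
    """Return the name of the candidate structure with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name]))


candidates = {
    "misinflected_but_projective": ["agreement"],
    "crossing_brackets": ["projectivity"],
    "long_adjunct": ["adjunct_distance", "adjunct_distance"],
}
print({name: score(v) for name, v in candidates.items()})
print(best(candidates))  # long_adjunct: 0.9 * 0.9 = 0.81 beats 0.1 and 0.0
```

A violated hard constraint zeroes the score, so the crossing structure can never win, while two weak violations still beat a single strong one.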
      <Paragraph position="2"> The grammar of German available for this parser relies heavily on weighted constraints, both to cope with many kinds of imperfect input and to resolve true ambiguities. For the example sentence, it retrieves all desired dependencies except one: it constructs the implausible dependency 'ausräumen'+'gegen' (address against). Let us briefly review the relevant constraints that cause this error:
* General structural, valence and agreement constraints determine the macro structure of the sentence in the desired way. For instance, the finite and the full verb must combine to form an auxiliary phrase, because this is the only way of accounting for all words while satisfying valence and category constraints. For the same reasons, both determiners must be paired with their respective nouns. Also, the prepositional phrase itself is correctly predicted.</Paragraph>
      <Paragraph position="3"> * General category constraints ensure that the preposition can attach to nouns and verbs, but not, say, to a determiner or to punctuation.</Paragraph>
      <Paragraph position="4"> * A weak constraint on adjuncts says that adjuncts are usually close to their regent. The penalty of this constraint varies according to the length of the dependency that it is applied to, so that shorter dependencies are generally preferred.</Paragraph>
      <Paragraph position="5"> * A slightly stronger constraint prefers attachment of the preposition to the verb, since overall verb attachment is more common than noun attachment in German. Therefore, the verb attachment leads to the globally best solution for this sentence.</Paragraph>
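The interplay of the last two constraints can be replayed on the example sentence. The Python sketch below scores each candidate regent of 'gegen' using only a distance penalty and a noun-attachment penalty. The word positions come from the example sentence, but both penalty values are hypothetical and merely respect the relative strengths described above, with the verb preference slightly stronger than the distance preference.

```python
# Word positions in: Die(1) Firmen(2) müssen(3) noch(4) die(5) Bedenken(6) der(7)
# EU-Kommission(8) gegen(9) die(10) Fusion(11) ausräumen(12).
PREP_POSITION = 9  # 'gegen'
CANDIDATES = {  # candidate regent -> (position, category)
    "Firmen": (2, "noun"),
    "Bedenken": (6, "noun"),
    "EU-Kommission": (8, "noun"),
    "ausräumen": (12, "verb"),
}

DISTANCE_PENALTY_PER_WORD = 0.99  # hypothetical: weak preference for short adjuncts
NOUN_ATTACHMENT_PENALTY = 0.95    # hypothetical: slightly stronger verb preference


def attachment_score(position: int, category: str) -> float:
    """Product of the (hypothetical) distance and verb-preference penalties."""
    distance = abs(position - PREP_POSITION)
    score = DISTANCE_PENALTY_PER_WORD ** (distance - 1)  # adjacent words are not penalized
    if category == "noun":
        score *= NOUN_ATTACHMENT_PENALTY
    return score


scores = {word: attachment_score(*info) for word, info in CANDIDATES.items()}
winner = max(scores, key=scores.get)
print(winner)  # ausräumen: the verb wins although 'EU-Kommission' is closer
```

With these values, the verb outscores even 'EU-Kommission', which is adjacent to the preposition, so the unlexicalized preferences alone reproduce the mis-attachment discussed above.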
      <Paragraph position="6"> There are no lexicalized rules that capture the particular plausibility of the phrase 'Bedenken gegen' (concerns about). A constraint that describes this individual word pair would be trivial to write, but it is not feasible to model the general phenomenon in this way; thousands of constraints would be needed just to reflect the more important collocations in a language, and the exact set of collocating words is impossible to predict accurately. Data-driven information would be much more suitable for curing this lexical blind spot.</Paragraph>
    </Section>
    <Section position="3" start_page="224" end_page="225" type="sub_section">
      <SectionTitle>
3.2 The Collocation Measure
</SectionTitle>
      <Paragraph position="0"> The usual way to retrieve the lexical preference of a word such as 'Bedenken' for 'gegen' is to obtain a large corpus and assume that it is representative of the entire language; in particular, that collocations in this corpus are representative of collocations that will be encountered in future input. The assumption is of course not entirely true, but it can nevertheless be preferable to rely on such uncertain knowledge rather than remain undecided, on the reasonable assumption that it will lead to more correct than wrong decisions. Note that the same reasoning applies to many of the violable constraints in a WCDG: although they do not hold on all possible structures, they hold more often than they fail, and therefore can be useful for analysing unknown input.</Paragraph>
      <Paragraph position="1"> Different measures have been used to gauge the strength of a lexical preference, but in general the efficacy of the statistical approach depends more on the suitability of the training corpus than on details of the collocation measure. Since our focus is not on finding the best extraction method, but on judging the benefit of statistical components to parsing, we employ a collocation measure related to the idea of mutual information: a collocation between a word w and a preposition p is judged more likely the more often it appears, and the less often its component words appear. By normalizing against the total number t of utterances we derive a measure of Lexical Attraction for each possible collocation:</Paragraph>
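Read together with the worked example that follows, the description suggests that Lexical Attraction is the ratio between the observed relative frequency of a collocation and the relative frequency expected if w and p occurred independently. Writing f(w, p) for the frequency of the collocation and f(w), f(p) for the frequencies of its component words, a sketch of the measure under that reading is:

```latex
\mathrm{LA}(w, p) = \frac{f(w, p)/t}{\bigl(f(w)/t\bigr)\,\bigl(f(p)/t\bigr)} = \frac{f(w, p)\, t}{f(w)\, f(p)}
```

Under this reading, LA = 1 corresponds to exactly chance-level co-occurrence.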
      <Paragraph position="3"> For instance, if we assume that the word 'Bedenken' occurs in one out of 2,000 sentences of German and the word 'gegen' occurs in one sentence out of 31 (these figures were taken from the unsupervised experiment described later), then pure chance would make the two words co-occur in one sentence out of 62,000. If the LA score is higher than 1, i.e. we observe more co-occurrences in a large corpus than chance would predict, we can assume that the two events are not statistically independent; in other words, there is a positive correlation between the two words. Conversely, we would expect a much lower score for the implausible collocation 'Bedenken'+'für', indicating a dispreference for this attachment.</Paragraph>
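The arithmetic of this example can be checked with a short Python sketch. It assumes that LA is the ratio of the observed co-occurrence rate to the rate expected under independence, which is how the example reads. The function name and the hypothetical corpus of 62,000 sentences are illustrative only.

```python
from fractions import Fraction


def lexical_attraction(f_wp: int, f_w: int, f_p: int, t: int) -> float:
    """Observed co-occurrence rate of w and p divided by the rate expected by chance.

    f_wp: utterances containing the collocation w+p
    f_w, f_p: utterances containing w and p, respectively
    t: total number of utterances
    """
    return (f_wp * t) / (f_w * f_p)


# Chance co-occurrence for the figures quoted in the text:
# 'Bedenken' in 1 of 2,000 sentences, 'gegen' in 1 of 31.
chance = Fraction(1, 2000) * Fraction(1, 31)
print(chance)  # 1/62000

# Hypothetical corpus of 62,000 sentences, so chance predicts exactly one co-occurrence;
# 'Bedenken' then occurs in 31 sentences and 'gegen' in 2,000.
print(lexical_attraction(f_wp=5, f_w=31, f_p=2000, t=62_000))  # 5.0
```

Observing the pair in five sentences instead of the single one expected by chance gives an LA of 5, i.e. a clear positive correlation.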
    </Section>
  </Section>
  <Section position="6" start_page="225" end_page="227" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="225" end_page="226" type="sub_section">
      <SectionTitle>
4.1 Sources
</SectionTitle>
      <Paragraph position="0"> To obtain the counts on which to base our estimates of attraction, we first turned to the dependency treebank that accompanies the WCDG parsing suite.</Paragraph>
      <Paragraph position="1"> This corpus contains some 59,000 sentences (1,000,000 words) with complete syntactic annotations; 61% of them are drawn from online technical newscasts, 33% from literature and 6% from law texts. We used the entire corpus except for the test set as a source for counting PP attachments directly. All verbs, nouns and prepositions were first reduced to their base forms in order to reduce the parameter space. Compound nouns were reduced to their base nouns, so that 'EU-Kommission' is treated the same as 'Kommission', on the assumption that a compound exerts similar attractions as its base noun. In contrast, German verbs with prefixes usually differ markedly in their preferences from the base verb. Since forms of verbs such as 'ausräumen' (address) can be split into two parts ('NP räumte NP aus'), such separated verbs were reassembled before stemming.</Paragraph>
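The two normalization steps can be sketched in Python as follows. The lemma table and the function names are hypothetical stand-ins for a real morphological analyser, and the sketch only handles hyphenated compounds such as 'EU-Kommission'; closed compounds would additionally require a decompounding step, which the text does not describe.

```python
from typing import Optional

# Hypothetical lemma table standing in for a real morphological analyser.
LEMMAS = {
    "ausräumte": "ausräumen",
    "räumte": "räumen",
    "Kommissionen": "Kommission",
}


def normalize_noun(token: str) -> str:
    """Reduce a hyphenated compound to its base (last) noun, then look up its lemma."""
    base = token.rsplit("-", 1)[-1]
    return LEMMAS.get(base, base)


def normalize_verb(finite_form: str, particle: Optional[str] = None) -> str:
    """Reattach a separated particle to the finite verb, then look up its lemma."""
    if particle is not None:
        finite_form = particle + finite_form  # 'aus' + 'räumte' gives 'ausräumte'
    return LEMMAS.get(finite_form, finite_form)


print(normalize_noun("EU-Kommission"))  # Kommission
print(normalize_verb("räumte", "aus"))  # ausräumen
```

Reassembling before lemmatization keeps 'ausräumen' distinct from the base verb 'räumen', whose preferences differ markedly.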
      <Paragraph position="2"> Although the information retrieved from complete syntax trees is valuable, it is clearly insufficient for estimating many valid collocations. In particular, even for a comparatively strong collocation such as 'Bedenken'+'gegen' we can expect only very few instances. (There are, in fact, 4 such instances, well above chance level but still a very small number.) Therefore we used the archived text from 18 volumes of the newspaper tageszeitung as a second source. This corpus contains about 295,000,000 words and should allow us to detect many more collocations. In fact, we do find 2,338 instances of 'Bedenken'+'gegen' in the same sentence.</Paragraph>
      <Paragraph position="3"> Of course, since we have no syntactic annotations for this corpus (and it would be infeasible to create them even by fully automatic parsing), not all of these instances necessarily indicate a syntactic dependency. (Ratnaparkhi, 1998) solved this problem by considering only prepositions in syntactically unambiguous configurations. Unfortunately, his patterns cannot be applied directly to German sentences because of their freer word order. As an approximation, it would be possible to count only pairs of adjacent content words and prepositions.</Paragraph>
      <Paragraph position="4"> However, this would introduce systematic biases into the counts, because nouns do in fact very often occur immediately adjacent to prepositions that modify them, but many verbs do not. For instance, the phrase 'jmd. anklagen wegen etw.' (to sue s.o. for s.th.) gives rise to a strong collocation between the verb 'anklagen' and the preposition 'wegen'; however, in the predominant sentence types of German, the two words are virtually never adjacent, because either the kernel noun of the PP or the direct object must intervene. Therefore, we relax the adjacency condition for verb attachment and also count prepositions that occur within a fixed distance of their suspected regent.</Paragraph>
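A minimal Python sketch of this counting scheme follows. It assumes the sentences have already been lemmatized and tagged, counts a noun only when it immediately precedes the preposition, and counts a verb anywhere within a window around it. The window size of four words is a hypothetical choice, since the text only speaks of a fixed distance. Pairs are counted at most once per sentence so that they remain comparable to per-utterance word frequencies.

```python
from collections import Counter

VERB_WINDOW = 4  # hypothetical: the text only says "a fixed distance"


def count_pairs(sentences: list[list[tuple[str, str]]]) -> Counter:
    """Count (regent, preposition) pairs, at most once per sentence.

    Each sentence is a list of (lemma, tag) pairs with the tags 'N' (noun),
    'V' (verb), 'P' (preposition) or anything else.
    """
    pairs: Counter = Counter()
    for sentence in sentences:
        seen = set()
        for i, (prep, tag) in enumerate(sentence):
            if tag != "P":
                continue
            # Nouns: only the word immediately before the preposition counts.
            if i > 0 and sentence[i - 1][1] == "N":
                seen.add((sentence[i - 1][0], prep))
            # Verbs: any verb within VERB_WINDOW words on either side counts.
            lo, hi = max(0, i - VERB_WINDOW), min(len(sentence), i + VERB_WINDOW + 1)
            for lemma, other_tag in sentence[lo:hi]:
                if other_tag == "V":
                    seen.add((lemma, prep))
        pairs.update(seen)
    return pairs


# 'Man klagte ihn wegen Betrugs an', lemmatized, with the separated verb reassembled:
s1 = [("man", "PRON"), ("anklagen", "V"), ("er", "PRON"), ("wegen", "P"), ("Betrug", "N")]
s2 = [("Bedenken", "N"), ("gegen", "P"), ("Fusion", "N")]
counts = count_pairs([s1, s2])
print(counts[("anklagen", "wegen")], counts[("Bedenken", "gegen")])  # 1 1
```

The verb 'anklagen' is counted even though the direct object separates it from 'wegen', which strict adjacency would have missed.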
      <Paragraph position="5"> Table 1 shows the detailed values when judging the example sentence according to the unparsed corpus. The strong collocation that we would expect for 'Bedenken'+'gegen' is indeed observed, with a value of 4.96. However, the verb attachment also has a score above 1, indicating that 'gegen'+'ausräumen' (to address about) are also positively correlated. This is almost certainly a misleading figure, since those two words do not form a plausible verb phrase; it is much more probable that the very strong, in fact idiomatic, correlation 'Bedenken ausräumen' (to address concerns) causes many co-occurrences of all three words. Therefore our figures falsely suggest that 'gegen' would often attach to 'ausräumen', when it is in fact the direct object of that verb that it is attracted to.</Paragraph>
      <Paragraph position="7"> (Volk, 2002) already suggested that this counting method introduced a general bias toward verb attachment, and when comparing the results for very frequent words (for which more reliable evidence is available from the treebank) we find that verb attachments are in fact systematically overestimated. We therefore adopted his approach and artificially inflated all noun+preposition counts by a constant factor i. To estimate an appropriate value for this factor, we extracted 178 instances of the standard verb+noun+preposition configuration from our corpus, of which 80 were verb attachments (V) and 98 were noun attachments (N).</Paragraph>
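Under the reading of LA as observed over expected co-occurrence, LA grows linearly with the pair count, so inflating every noun+preposition count by i simply multiplies the noun's attraction by i. A Python sketch of the resulting binary decision for a verb+noun+preposition triple is shown below; the function names, the tie-breaking rule and the counts in the usage example are hypothetical.

```python
def lexical_attraction(f_wp: float, f_w: int, f_p: int, t: int) -> float:
    """Observed co-occurrence rate of w and p over the rate expected by chance."""
    return (f_wp * t) / (f_w * f_p)


def attach(verb: tuple[int, int], noun: tuple[int, int], f_p: int, t: int, i: float = 8.0) -> str:
    """Return 'V' or 'N' for a verb+noun+preposition triple.

    verb and noun are (pair count with the preposition, word count) tuples.
    The noun+preposition pair count is inflated by the factor i to offset
    the bias toward verb attachment introduced by the counting method.
    """
    la_v = lexical_attraction(verb[0], verb[1], f_p, t)
    la_n = lexical_attraction(noun[0] * i, noun[1], f_p, t)
    return "N" if la_n > la_v else "V"  # ties go to the verb (an arbitrary choice)


# Hypothetical counts in a corpus of 1,000 sentences; the preposition occurs in 100.
print(attach(verb=(6, 20), noun=(2, 50), f_p=100, t=1000, i=1))  # V: LA 3.0 vs 0.4
print(attach(verb=(6, 20), noun=(2, 50), f_p=100, t=1000, i=8))  # N: LA 3.0 vs 3.2
```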
      <Paragraph position="8"> Table 2 shows the performance of the predictor for this binary decision task. Taken as it is, it retrieves most verb attachments, but fewer than half of the noun attachments, while higher values of i can improve the recall both for noun attachments and overall. The performance achieved falls somewhat short of the highest figures reported previously for PP attachment in German (Volk, 2002); this is at least partly due to our simple model, which ignores the kernel noun of the PP. However, it could well be good enough to be integrated into a full parser and provide a benefit to it. Also, the syntactic configuration in this standard benchmark is not the predominant one in complete German sentences; in fact, fewer than 10% of all prepositions occur in this context. The best performance on the triple task is therefore not guaranteed to be the best choice for full parsing. In our experiments, we used a value of i = 8, which seems to suit our grammar best.</Paragraph>
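The effect of i on per-class recall can be illustrated with a toy evaluation in Python. Because a larger i can only turn verb predictions into noun predictions, noun recall can only rise and verb recall can only stay the same or fall. The six instances and their attraction values are invented for illustration and do not reproduce Table 2; only the decision rule (inflate the noun's attraction by i and pick the larger value) follows the text.

```python
# Toy instances: (LA of verb+prep, LA of noun+prep before inflation, gold label).
# The numbers are invented for illustration only.
DATA = [
    (3.0, 0.4, "N"), (2.0, 1.5, "N"), (1.0, 0.9, "N"),
    (4.0, 0.2, "V"), (5.0, 1.0, "V"), (2.0, 0.1, "V"),
]


def predict(la_v: float, la_n: float, i: float) -> str:
    """Attach to the noun if its inflated attraction exceeds the verb's."""
    return "N" if i * la_n > la_v else "V"


def evaluate(i: float) -> dict[str, float]:
    """Recall for each class and overall accuracy at inflation factor i."""
    hits = {"N": 0, "V": 0}
    totals = {"N": 0, "V": 0}
    for la_v, la_n, gold in DATA:
        totals[gold] += 1
        hits[gold] += predict(la_v, la_n, i) == gold
    return {
        "recall_N": hits["N"] / totals["N"],
        "recall_V": hits["V"] / totals["V"],
        "accuracy": sum(hits.values()) / len(DATA),
    }


for i in (1, 2, 4, 8, 16):
    print(i, evaluate(i))
```

On this toy data, i = 1 finds no noun attachments at all, while i = 8 finds all of them at the cost of one verb attachment.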
    </Section>
    <Section position="2" start_page="226" end_page="227" type="sub_section">
      <SectionTitle>
4.2 Integration Method
</SectionTitle>
      <Paragraph position="0"> To add our simple collocation model to the parser, it is sufficient to write a single variable-strength constraint that judges each PP dependency by the strength of the lexical attraction between the regent and the dependent. The only question is how to map our lexical attraction values to penalties for this constraint. Their predicted relative order of plausibility should of course be reflected, so that dependencies with a high lexical attraction are preferred over those with lower lexical attraction. At the same time, the information should not be given too much weight compared to the existing grammar rules, since it is heuristic in nature and should certainly not override important principles such as valence or agreement. The penalties of WCDG constraints range from 0.0 (a hard constraint) to 1.0 (a constraint with this penalty has no effect whatsoever and is only useful for debugging).</Paragraph>
      <Paragraph position="1"> We chose an inverse mapping based on the logarithm of lexical attraction (cf. Figure 2):</Paragraph>
      <Paragraph position="3"> where u is a normalization constant that scales the highest occurring value of LA to 1. For instance, this mapping will interpret a strong lexical attraction of 5 as the penalty 0.989 (almost perfect) and a lexical attraction of only 0.5 as the penalty 0.95 (somewhat dispreferred). The overall range of PP attachment penalties is limited to the interval [0.8, 1.0], which ensures that the judgement of the statistical module will usually come into play only when no other evidence is available; preliminary experiments showed that a stronger integration of the component yields no additional advantage. In any case, the exact figure depends closely on the valuation of the existing constraints of the grammar and is of little importance as such.</Paragraph>
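The following Python sketch uses a stand-in for the mapping. It is linear in log10(LA), chosen only so that it passes through the two example values quoted above (LA 5 gives 0.989, LA 0.5 gives 0.95), and clamped to the interval [0.8, 1.0]. It omits the normalization constant u and is not the formula used in the paper. It merely illustrates how a logarithmic mapping preserves the plausibility order while keeping the statistical evidence weak.

```python
import math

# Hypothetical log-linear stand-in fitted to the two example values in the text;
# it omits the normalization constant u and is not the paper's formula.
SLOPE = 0.039           # penalty change per tenfold change in LA
ANCHOR_LA = 5.0
ANCHOR_PENALTY = 0.989
FLOOR, CEILING = 0.8, 1.0


def pp_penalty(la: float) -> float:
    """Map lexical attraction to a WCDG-style penalty in [FLOOR, CEILING]."""
    if not la > 0:
        return FLOOR  # never-seen pairs get the strongest allowed dispreference
    raw = ANCHOR_PENALTY + SLOPE * math.log10(la / ANCHOR_LA)
    return min(CEILING, max(FLOOR, raw))


for la in (0.0, 0.01, 0.5, 1.0, 5.0, 50.0):
    print(la, round(pp_penalty(la), 3))
```

Because the floor is 0.8, even the least attractive pair costs far less than a violation of a strong grammatical constraint, so valence and agreement keep precedence.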
      <Paragraph position="4">  Besides adding the new constraint 'PP attachment' to the grammar, we also disabled several of the existing constraints that apply to prepositions, since we assume that our lexicalized model is superior to the unlexicalized assumptions that the grammar writers had made so far. For instance, the constraint mentioned in Section 3 that globally prefers verb attachment to noun attachment is essentially a crude approximation of lexical attraction, whose task is now taken over entirely by the statistical predictor. We also assume that lexical preference exerts a stronger influence on attachment than mere linear distance; therefore we changed the distance constraint so that it exempts prepositions from the normal distance penalties imposed on adjuncts.</Paragraph>
    </Section>
    <Section position="3" start_page="227" end_page="227" type="sub_section">
      <SectionTitle>
4.3 Corpus
</SectionTitle>
      <Paragraph position="0"> For our parsing experiments, we used the first 1,000 sentences of technical newscasts from the dependency treebank mentioned above. This test set has an average sentence length of 17.7 words, and from previous experiments we estimate that it is comparable in difficulty to the NEGRA corpus to within 1% of accuracy. Although online articles and newspaper copy follow some different conventions, we assume the two text types are similar enough that collocations extracted from one can be used to predict attachments in the other.</Paragraph>
      <Paragraph position="1"> For parsing we used the heuristic transformation-based search described in (Foth et al., 2000). Table 3 illustrates the structural accuracy² of the unmodified system for various subordination types. For instance, of the 1,892 dependency edges with the label 'PP' in the gold standard, 1,285 are attached correctly by the parser, while 607 receive an incorrect regent. We see that PP attachment decisions are particularly prone to errors, both in absolute and in relative terms.</Paragraph>
      <Paragraph position="2">² Note that the WCDG parser always succeeds in assigning exactly one regent to each word, so that there is no difference between precision and recall. We refer to structural accuracy as the ratio of words which have been attached correctly to all words.</Paragraph>
    </Section>
    <Section position="4" start_page="227" end_page="227" type="sub_section">
      <SectionTitle>
4.4 Results
</SectionTitle>
      <Paragraph position="0"> We trained the PP attachment predictor both with the counts acquired from the dependency treebank (supervised) and those from the newspaper corpus (unsupervised). We also tested a mode of operation that uses the more reliable data from the treebank, but backs off to unsupervised counts if the hypothetical regent was seen fewer than 1,000 times in training.</Paragraph>
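The back-off mode can be sketched in Python as follows. The threshold of 1,000 comes from the text. The CountSource container, the function names and the decision to return 0.0 for unseen words are hypothetical, and LA is computed as observed over expected co-occurrence.

```python
from collections import Counter
from dataclasses import dataclass

BACKOFF_THRESHOLD = 1_000  # from the text: back off below 1,000 treebank occurrences


@dataclass
class CountSource:
    """Hypothetical container for the counts extracted from one corpus."""
    pairs: Counter  # (regent, preposition) -> number of utterances
    words: Counter  # word -> number of utterances
    t: int          # total number of utterances


def lexical_attraction(src: CountSource, regent: str, prep: str) -> float:
    """Observed co-occurrence rate over the rate expected by chance."""
    f_w, f_p = src.words[regent], src.words[prep]
    if f_w == 0 or f_p == 0:
        return 0.0  # unseen word: no evidence of attraction (a hypothetical choice)
    return src.pairs[(regent, prep)] * src.t / (f_w * f_p)


def backoff_attraction(treebank: CountSource, newspaper: CountSource, regent: str, prep: str) -> float:
    """Use treebank counts if the regent is frequent enough there, else newspaper counts."""
    source = treebank if treebank.words[regent] >= BACKOFF_THRESHOLD else newspaper
    return lexical_attraction(source, regent, prep)
```

The reliable supervised counts are thus used wherever they are plentiful, and the larger but noisier unsupervised counts fill the gaps.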
      <Paragraph position="1"> Table 4 shows the results when parsing with the augmented grammar. Both the overall structural accuracy and the accuracy of PP edges are given; note that these figures result from the general subordination task, therefore they correspond to Table 3 and not to Table 2. As expected, lexicalized preference information for prepositions yields a large benefit to full parsing: the attachment error rate is decreased by 34% for prepositions, and by 14% overall. In this experiment, where much more unsupervised training data was available, supervised and unsupervised training achieved almost the same level of performance (although many individual sentences were parsed differently).</Paragraph>
      <Paragraph position="2"> A particular concern with corpus-based decision methods is their applicability beyond the training corpus. In our case, the majority of the material for supervised training was taken from the same newscast collection as the test set. However, comparable results are also achieved when applying the parser to the standard test set from the NEGRA corpus of German, as used by (Schiehlen, 2004; Foth et al., 2005): adding the PP predictor trained on our dependency treebank raises the overall attachment accuracy from 89.3% to 90.6%.</Paragraph>
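The error-rate reductions reported in this section are relative: they give the share of baseline errors that the new component removes. A short Python sketch of that arithmetic (function name ours), applied to the NEGRA accuracies quoted here, shows that the gain from 89.3% to 90.6% corresponds to roughly 12% fewer attachment errors.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the baseline attachment errors removed by the new component."""
    err_before, err_after = 1 - acc_before, 1 - acc_after
    return (err_before - err_after) / err_before


# NEGRA test set, overall attachment accuracy from the text: 89.3% before, 90.6% after.
print(f"{relative_error_reduction(0.893, 0.906):.1%}")  # 12.1%
```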
      <Paragraph position="3"> This successful reuse indicates that lexical preference between prepositions and content words is largely independent of text type.</Paragraph>
    </Section>
  </Section>
</Paper>