<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1075">
  <Title>A Novel Disambiguation Method For Unification-Based Grammars Using Probabilistic Context-Free Approximations</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Context-Free Approximation
</SectionTitle>
    <Paragraph position="0"> In this section, we briefly review a simple and intuitive approximation method for turning unification-based grammars, such as HPSG (Pollard and Sag,  illustrates that (i) each UBG reading of the sentence is associated with a non-empty set of syntax trees according to the CFG approximation, and (ii) that the sentence may have CFG trees, which can not be replayed by the UBG, since the CFG overgenerates (or at best is a correct approximation of the UBG).</Paragraph>
    <Paragraph position="1"> 1994) or PATR-II (Shieber, 1985) into context-free grammars (CFG). The method was introduced by Kiefer and Krieger (2000).</Paragraph>
    <Paragraph position="2"> The approximation method can be seen as the construction of the least fixpoint of a certain monotonic function and shares similarities with the instantiation of rules in a bottom-up passive chart parser or with partial evaluation in logic programming. The basic idea of the approach is as follows.</Paragraph>
    <Paragraph position="3"> In a first step, one generalizes the set of all lexicon entries. The resulting structures form equivalence classes, since they abstract from word-specific information, such as FORM or STEM. The abstraction is specified by means of a restrictor (Shieber, 1985), the so-called lexicon restrictor. After that, the grammar rules are instantiated by unification, using the abstracted lexicon entries and resulting in derivation trees of depth 1. The rule restrictor is applied to each resulting feature structure (FS), removing all information contained only in the daughters of a rule. Additionally, the restriction gets rid of information that will either lead to infinite growth of the FSs or that does not constrain the search space. The restricted FSs (together with older ones) then serve as the basis for the next instantiation step. Again, this gives FSs encoding a derivation, and again the rule restrictor is applied. This process is iterated until a fixpoint is reached, meaning that further iteration steps will not add (or remove) new (or old) FSs to the set of computed FSs.</Paragraph>
    <Paragraph position="4"> Given the FSs from the fixpoint, it is then easy to generate context-free productions, using the complete FSs as symbols of the CFG; see Kiefer and Krieger (2002). We note here that adding (and perhaps removing) FSs during the iteration can be achieved in different ways: either by employing feature structure equivalence a18 (structural equivalence) or by using FS subsumption a19 . It is clear that the resulting CFGs will behave differently (see figure 4). An in-depth description of the method, containing lots of details, plus a mathematical underpinning is presented in (Kiefer and Krieger, 2000) and (Kiefer and Krieger, 2002). The application of the method to a mid-size UBG of English, and largesize HPSGs of English and Japanese is described in (Kiefer and Krieger, 2002) and (Kiefer et al., 2000).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="89" type="metho">
    <SectionTitle>
3 A Novel Disambiguation for UBGs
</SectionTitle>
    <Paragraph position="0"> (Kiefer and Krieger, 2000) suggest that, given a UBG, the approximated CFG can be used as a cheap filter during a two-stage parsing approach. The idea is to let the CFG explore the search space, whereas the UBG deterministically replays the derivations, proposed by the CFG. To be able to carry out the replay, during the creation of the CF grammar, each CF production is correlated with the UBG rules it was produced from.</Paragraph>
    <Paragraph position="1"> The above mentioned two-stage parsing approach not only speeds up parsing (see figure 4), but can also be a starting point for an efficient stochastic parsing model, even though the UBG might encode an infinite number of categories. Given a training corpus, the idea is to move from the approximated CFG to a PCFG which predicts probabilities for the CFG trees. Clearly, the probabilities can be used for disambiguation, and more important, for ranking of CFG trees. The idea is, that the ranked parsing trees can be replayed one after another by the UBG (processing the most probable CFG trees first), establishing an order of best UBG parsing trees. Since the approximation always yields a CFG that is a superset of the UBG, it might be possible that derivation trees proposed by the PCFG can not be replayed by the UBG. Nevertheless, this behavior does not alter the ranking of reconstructed UBG parsing trees.</Paragraph>
    <Paragraph position="2">  grammar. Note that the vertical dots at the top indicate an incomplete FS derivation tree. Furthermore, the FSs at the tree nodes are massively simplified.</Paragraph>
    <Paragraph position="3"> of a sentence, analyzed by a UBG and its CFG approximation. Using this figure, it should be clear that a ranking of CFG trees induces a ranking of UBG readings, even if not all CFG trees have an associated UBG reading. We exemplify our idea in section 4, where we disambiguate a sentence with a PP-attachment ambiguity.</Paragraph>
    <Paragraph position="4"> As a nice side effect, our proposed stochastic parsing model should usually not explore the full search space, nor should it statically estimate the parsing results afterwards, assuming we are interested in the most probable parse (or say, the two most probable results)--the disambiguation of UBG results is simply established by the dynamic ordering of most probable CFG trees during the first parsing stage.</Paragraph>
    <Paragraph position="5"> measure</Paragraph>
  </Section>
  <Section position="5" start_page="89" end_page="89" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> Approximation. (Dowding et al., 2001) compared (Moore, 1999)'s approach to grammar approximation to (Kiefer and Krieger, 2000). As a basis for the comparison, they chose an English grammar written in the Gemini/CLE formalism. The motivation for this enterprise comes from the use of the resulting CFG as a context-free language model for the Nuance speech recognizer. John Dowding kindly provided the Gemini grammar and a corpus of 500 sentences, allowing us to measure the quality of our approximation method for a realistic mid-size grammar, both under a18 and a19 (see section 2).1 The Gemini grammar consisted of 57 unification rules and a small lexicon of 216 entries which expanded into 425 full forms. Since the grammar allows for atomic disjunctions (and makes heavy use of them), we ended in overall 1,886 type definitions in our system. Given the 500 sentences, the Gemini grammar licensed 720 readings. We only deleted the ARGS feature (the daughters) during the iteration and found that the original UBG encodes a context-free language, due to the fact that the iteration terminates under a18 . This means that we have even obtained a correct approximation of the Gemini grammar. Table 4 presents the relevant numbers, both under a18 and a19 , and shows that the ambiguity rate for a19 goes up only mildly.</Paragraph>
    <Paragraph position="1"> We note, however, that these numbers differ from those presented in (Dowding et al., 2001). We could not find out why their implementation produces worse results than ours. They suggested that the use of a19 is the reason for the bad behaviour of the resulting grammar, but, as our figures show, this is not  rived under a5 and a6 . The fixpoint for a5 (a6 ) was reached after 9 (8) iteration steps and took 5 minutes (34 seconds) to be computed, incl. post-processing time to compute the CF productions. The run time speed-up for two-stage parsing is given in the last row. The measurements were conducted on a 833 MHz Linux workstation.</Paragraph>
    <Paragraph position="2"> true, at least not for this grammar. Of course, using a19 instead of a18 can lead to substantially less restrictive grammars, but when dealing with complex grammars, there is--at the moment--no alternative to using a19 due to massive space and time requirements of the approximation process.</Paragraph>
    <Paragraph position="3"> Figure 2 displays one of the two readings for the sentence measure temperature at all three decks, analyzed by the Gemini grammar. The sentence is one of the 500 sentences provided by John Dowding.</Paragraph>
    <Paragraph position="4"> The vertical dots simply indicate that some less relevant nodes of the FS derivation tree have been omitted. The figure shows the reading, where the PP at all three decks is attached to the NP temperature.</Paragraph>
    <Paragraph position="5"> Due to space constraints, we do not show the second reading, where the PP is attached to the VP measure temperature.</Paragraph>
    <Paragraph position="6"> Figure 3 shows the two syntax trees for the sentence, analyzed by the context-free approximation of the Gemini grammar, obtained by using a19 . It  thesis are probabilities for grammar rules, gathered after two training iterations with the inside-outside algorithm. is worth noting that both readings of the CFG approximation differ in PP attachment, in the same manner as the readings analyzed by the UBG itself. In the figure, all non-terminals are simply displayed as numbers, but each number represents a fairly complex feature structure, which is, in general, slightly less informative than an associated tree node of a possible FS derivation tree of the given Gemini grammar for two reasons. Firstly, the use of the a19 operation as a test generalizes information during the approximation process. In a more complex UBG grammar, the restrictors would have deleted even more information. Secondly, the flow of information in a local tree from the mother to the daughter node will not be reflected because the approximation process works strictly bottom up from the lexicon entries.</Paragraph>
    <Paragraph position="7"> Training of the CFG approximation. A sample of sentences serves as input to the inside-outside algorithm, the standard algorithm for unsupervised training of PCFGs (Lari and Young, 1990). The given corpus of 500 sentences was divided into a training corpus (90%, i.e., 450 sentences) and a testing corpus (10%, i.e., 50 sentences). This standard procedure enables us (i) to apply the inside-outside algorithm to the training corpus, and (ii) to evaluate the resulting probabilistic context-free grammars on the testing corpus. We linguistically evaluated the maximum-probability parses of all sentences in the testing corpus (see section 5). For unsupervised training and parsing, we used the implementation of Schmid (1999).</Paragraph>
    <Paragraph position="8"> Figure 5 shows a fragment of the probabilistic context-free approximation. The probabilities of the grammar rules are extracted after several training iterations with the inside-outside algorithm using the training corpus of 450 sentences.</Paragraph>
    <Paragraph position="9"> Disambiguation using maximum-probability parses. In contrast to most approaches to stochastic modeling of UBGs, PCFGs can be very easily used to assign probabilities to the readings of a given sentence: the probability of a syntax tree (the reading) is the product of the probabilities of all context-free rules occurring in the tree.</Paragraph>
    <Paragraph position="10"> For example, the two readings of the sentence measure temperature at all three decks, as displayed in figure 3, have the following probabilities: a1a3a2a4a1a6a5a8a7a10a9a12a11a3a13a15a14a17a16 (first reading on the left-hand side) and a9a18a2a20a19a22a21a23a7a3a9a12a11a24a13a15a14a17a25 (second reading on the right-hand side). The maximum-probability parse is therefore the syntax-tree on the left-hand side of figure 3, which is the reading, where the PP at all three decks is attached to the NP temperature.</Paragraph>
    <Paragraph position="11"> A closer look on the PCFG fragment shows that the main contribution to this result comes from the two rules 929 a0 1028 951 (0.938) and 183 a0 960 951 (0.042). Here, the probabilities encode the statistical finding that PP-to-NP attachments can be expected more frequently than PP-to-VP attachments, if the context-free approximation of the Gemini grammar is used to analyze the given corpus of 500 sentences.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML