<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1009">
  <Title>An Improved Parser for Data-Oriented Lexical-Functional Analysis</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
[Figure 3: fragment with c-structure VP dominating "eats" and f-structure PRED 'eat(SUBJ)', TENSE PRES, SUBJ [NUM SG]]
</SectionTitle>
    <Paragraph position="0"> Figure 3. As with Tree-DOP, the Frontier operation then selects a set of frontier nodes and deletes all subtrees they dominate. Like Root, it also removes the φ links of the deleted nodes and erases any semantic form that corresponds to any of those nodes. For instance, if the NP in figure 1 is selected as a frontier node, Frontier erases the predicate &quot;Kim&quot; from the fragment. Finally, Bod &amp; Kaplan present a third decomposition operation, Discard, which constructs generalizations of the fragments supplied by Root and Frontier. Discard deletes combinations of attribute-value pairs, subject to the following condition: Discard does not delete pairs whose values φ-correspond to remaining c-structure nodes. According to Bod &amp; Kaplan (1998), Discard-generated fragments are needed to parse sentences that are &quot;ungrammatical with respect to the corpus&quot;, thus increasing the robustness of the model.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 The composition operation
</SectionTitle>
      <Paragraph position="0"> In LFG-DOP the operation for combining fragments is carried out in two steps. First the c-structures are combined by leftmost substitution subject to the category-matching condition, as in Tree-DOP. This is followed by the recursive unification of the f-structures corresponding to the matching nodes. A derivation for an LFG-DOP representation R is a sequence of fragments the first of which is labeled with S and for which the iterative application of the composition operation produces R. For an illustration of the composition operation, see Bod &amp; Kaplan (1998).</Paragraph>
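      <Paragraph> As an informal illustration of the second step, the following sketch unifies two f-structures represented as nested Python dicts. This is a toy model, not Bod and Kaplan's implementation: the attribute names and values are invented, and only the Uniqueness condition (clashing atomic values) is checked.

```python
def unify(f1, f2):
    """Return the unification of two f-structures (nested dicts),
    or None if an attribute receives two distinct atomic values
    (a Uniqueness violation)."""
    result = dict(f1)
    for attr, val in f2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            sub = unify(result[attr], val)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != val:
            return None
    return result

# Toy f-structures loosely modeled on the "eats" fragment of figure 3.
verb = {"PRED": "eat(SUBJ)", "TENSE": "PRES", "SUBJ": {"NUM": "SG"}}
subj = {"SUBJ": {"NUM": "SG", "PRED": "Kim"}}
print(unify(verb, subj)["SUBJ"]["PRED"])      # Kim
print(unify({"NUM": "SG"}, {"NUM": "PL"}))    # None
```

Substitution failures at the c-structure level are not modeled here; the sketch only shows how recursive unification succeeds or fails on the f-structure side.</Paragraph>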
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Probability models
</SectionTitle>
      <Paragraph position="0"> As in Tree-DOP, an LFG-DOP representation R can typically be derived in many different ways. If each derivation D has a probability P(D), then the probability of deriving R is the sum of the individual derivation probabilities: (1) P(R) = ΣD derives R P(D)</Paragraph>
      <Paragraph position="2"> An LFG-DOP derivation is produced by a stochastic process which starts by randomly choosing a fragment whose c-structure is labeled with the initial category. At each subsequent step, a next fragment is chosen at random from among the fragments that can be composed with the current subanalysis. The chosen fragment is composed with the current subanalysis to produce a new one; the process stops when an analysis results with no non-terminal leaves. We will call the set of composable fragments at a certain step in the stochastic process the competition set at that step. Let CP(f | CS) denote the probability of choosing a fragment f from a competition set CS containing f; then the probability of a derivation D = &lt;f1, f2 ... fk&gt; is (2) P(&lt;f1, f2 ... fk&gt;) = Πi CP(fi | CSi) where the competition probability CP(f | CS) is expressed in terms of fragment probabilities P(f): (3) CP(f | CS) = P(f) / Σf' ∈ CS P(f')</Paragraph>
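      <Paragraph> Equations (2) and (3) can be paraphrased in a few lines of Python; the fragment names and probabilities below are invented purely for illustration.

```python
def competition_prob(frag, competition_set, p):
    """Equation (3): CP(f | CS) is P(f) normalized over all f' in CS."""
    return p[frag] / sum(p[f2] for f2 in competition_set)

def derivation_prob(steps, p):
    """Equation (2): the product of CP(fi | CSi) over all derivation steps.
    `steps` is a list of (chosen fragment, competition set) pairs."""
    prob = 1.0
    for frag, cs in steps:
        prob *= competition_prob(frag, cs, p)
    return prob

p = {"fA": 0.2, "fB": 0.1, "fC": 0.3}                 # toy fragment probabilities
steps = [("fA", ["fA", "fB"]), ("fC", ["fB", "fC"])]  # toy two-step derivation
print(round(derivation_prob(steps, p), 6))            # 0.5
```

Note that only the fragments in the current competition set matter at each step, so the factors are conditional probabilities that need not sum to the raw fragment probabilities.</Paragraph>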
      <Paragraph position="4"> Bod &amp; Kaplan give three definitions of increasing complexity for the competition set: the first definition groups all fragments that only satisfy the Category-matching condition of the composition operation; the second definition groups all fragments which satisfy both Category-matching and Uniqueness; and the third definition groups all fragments which satisfy Category-matching, Uniqueness and Coherence. Bod &amp; Kaplan point out that the Completeness condition cannot be enforced at each step of the stochastic derivation process, and is a property of the final representation which can only be enforced by sampling valid representations from the output of the stochastic process. In this paper, we will only deal with the third definition of competition set, as it selects at each derivation step only those fragments that may finally result in a valid LFG representation, thus reducing the off-line validity checking to the Completeness condition.</Paragraph>
      <Paragraph position="5"> Note that the computation of the competition probability in the above formulas still requires a definition for the fragment probability P(f). Bod and Kaplan define the probability of a fragment simply as its relative frequency in the bag of all fragments generated from the corpus, just as in most Tree-DOP models. We will refer to this fragment estimator as &amp;quot;simple relative frequency&amp;quot; or &amp;quot;simple RF&amp;quot;.</Paragraph>
      <Paragraph position="6"> We will also use an alternative definition of fragment probability which is a refinement of simple RF. This alternative fragment probability definition distinguishes between fragments supplied by Root/Frontier and fragments supplied by Discard. We will treat the first type of fragments as seen events, and the second type of fragments as previously unseen events. We thus create two separate bags corresponding to two separate distributions: a bag with fragments generated by Root and Frontier, and a bag with fragments generated by Discard. We assign probability mass to the fragments of each bag by means of discounting: the relative frequencies of seen events are discounted and the gained probability mass is reserved for the bag of unseen events (cf. Ney et al. 1997). We accomplish this by a very simple estimator: the Turing-Good estimator (Good 1953), which computes the probability mass of unseen events as n1/N where n1 is the number of singleton events and N is the total number of seen events. This probability mass is assigned to the bag of Discard-generated fragments. The remaining mass (1 − n1/N) is assigned to the bag of Root/Frontier-generated fragments. The probability of each fragment is then computed as its relative frequency in its bag multiplied by the probability mass assigned to this bag. Let |f| denote the frequency of a fragment f; then its probability is given by: (4) P(f | f is generated by Root/Frontier) = (1 − n1/N) |f| / Σf': f' is generated by Root/Frontier |f'| (5) P(f | f is generated by Discard) = (n1/N) |f| / Σf': f' is generated by Discard |f'| We will refer to this fragment probability estimator as &quot;discounted relative frequency&quot; or &quot;discounted RF&quot;.</Paragraph>
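      <Paragraph> The discounted RF estimator can be sketched as follows; the fragment names and counts are toy values, not drawn from the actual corpora.

```python
def discounted_rf(rf_counts, discard_counts):
    """Turing-Good split of probability mass (equations 4 and 5):
    n1/N goes to the bag of Discard-generated fragments, 1 - n1/N to
    the bag of Root/Frontier-generated fragments; within each bag,
    mass is distributed by relative frequency."""
    N = sum(rf_counts.values())                        # total seen events
    n1 = sum(1 for c in rf_counts.values() if c == 1)  # singleton events
    unseen_mass = n1 / N
    d_total = sum(discard_counts.values())
    p = {}
    for f, c in rf_counts.items():
        p[f] = (1.0 - unseen_mass) * c / N             # equation (4)
    for f, c in discard_counts.items():
        p[f] = unseen_mass * c / d_total               # equation (5)
    return p

p = discounted_rf({"f1": 3, "f2": 1}, {"g1": 2, "g2": 2})
print(p["f1"], p["g1"])  # 0.5625 0.125
```

Because each bag is normalized separately before being scaled by its reserved mass, the probabilities of all fragments still sum to one.</Paragraph>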
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Parsing with LFG-DOP
</SectionTitle>
    <Paragraph position="0"> In his PhD thesis, Cormons (1999) presents a parsing algorithm for LFG-DOP which is based on the Tree-DOP parsing technique described in Bod (1998). Cormons first converts LFG representations into more compact indexed trees: each node in the c-structure is assigned an index which refers to the φ-corresponding f-structure unit. For example, the representation in figure 1 is indexed as</Paragraph>
    <Paragraph position="2"> The indexed trees are then fragmented by applying the Tree-DOP decomposition operations described in section 2. Next, the LFG-DOP decomposition operations Root, Frontier and Discard are applied to the f-structure units that correspond to the indices in the c-structure subtrees. Having obtained the set of LFG-DOP fragments in this way, each test sentence is parsed by a bottom-up chart parser, initially using only the indexed subtrees.</Paragraph>
    <Paragraph position="3"> Thus only the Category-matching condition is enforced during the chart-parsing process. The Uniqueness and Coherence conditions of the corresponding f-structure units are enforced during the disambiguation or chart decoding process. Disambiguation is accomplished by computing a large number of random derivations from the chart and by selecting the analysis which results most often from these derivations. This technique is known as &quot;Monte Carlo disambiguation&quot; and has been extensively described in the literature (e.g. Bod 1993, 1998; Chappelier &amp; Rajman 2000; Goodman 1998; Hoogweg 2000). Sampling a random derivation from the chart consists of choosing at random one of the fragments from the set of composable fragments at every labeled chart-entry (where the random choices at each chart-entry are based on the probabilities of the fragments). The derivations are sampled in a top-down, leftmost order so as to maintain the LFG-DOP derivation order. Thus the competition sets of composable fragments are computed on the fly during the Monte Carlo sampling process by grouping the f-structure units that unify and that are coherent with the subderivation built so far. As mentioned in section 3, the Completeness condition can only be checked after the derivation process. Incomplete derivations are simply removed from the sampling distribution.</Paragraph>
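      <Paragraph> The core of the sampling loop can be sketched as below. This is a drastic simplification of the actual chart decoder: the competition sets are given in advance rather than computed on the fly, validity checking is omitted, and the mapping from derivations to analyses is a stand-in function; all names and numbers are invented.

```python
import random
from collections import Counter

def sample_derivation(competition_sets, rng):
    """Draw one fragment per step, weighted by fragment probability.
    `competition_sets` is a list of [(fragment, probability), ...]."""
    chosen = []
    for cs in competition_sets:
        frags = [f for f, _ in cs]
        weights = [w for _, w in cs]
        chosen.append(rng.choices(frags, weights=weights, k=1)[0])
    return tuple(chosen)

def monte_carlo(competition_sets, analysis_of, n_samples=1000, seed=0):
    """Return the analysis generated most often by the sampled derivations."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_samples):
        tally[analysis_of(sample_derivation(competition_sets, rng))] += 1
    return tally.most_common(1)[0][0]

competition_sets = [[("a", 0.9), ("b", 0.1)], [("c", 0.5), ("d", 0.5)]]
analysis_of = lambda d: d[0]   # toy: the first fragment fixes the analysis
print(monte_carlo(competition_sets, analysis_of))  # a
```

The estimate converges because distinct derivations producing the same analysis pool their sampling frequency, mirroring the sum in equation (1).</Paragraph>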
    <Paragraph position="4"> After sampling a sufficiently large number of random derivations that satisfy the LFG validity requirements, the most probable analysis is estimated as the analysis which results most often from the sampled derivations. As a stop condition on the number of sampled derivations, we compute the probability of error, i.e. the probability that the analysis most frequently generated by the sampled derivations is not the most probable analysis; sampling stops once this probability drops below 0.05 (see Bod 1998). In order to rule out the possibility that the sampling process never stops, we use a maximum sample size of 10,000 derivations.</Paragraph>
    <Paragraph position="5"> While the Monte Carlo disambiguation technique converges provably to the most probable analysis, it is quite inefficient. It is possible to use an alternative, heuristic search based on Viterbi n best (we will not go into the PCFG-reduction technique presented in Goodman (1998), since that heuristic only works for Tree-DOP and is beneficial only if all subtrees are taken into account and if the so-called &quot;labeled recall parse&quot; is computed). A Viterbi n best search for LFG-DOP estimates the most probable analysis by computing the n most probable derivations, and by then summing up the probabilities of the valid derivations that produce the same analysis. The algorithm for computing the n most probable derivations follows straightforwardly from the algorithm which computes the most probable derivation by means of Viterbi optimization (see e.g. Sima'an 1999).</Paragraph>
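      <Paragraph> The final summing step of the Viterbi n best search can be sketched as follows; the (analysis label, derivation probability) pairs below are fictitious.

```python
from collections import defaultdict

def best_analysis(nbest):
    """Sum the probabilities of the valid derivations per analysis and
    return the analysis with the largest total.
    `nbest` is a list of (analysis, derivation probability) pairs for
    the n most probable valid derivations."""
    totals = defaultdict(float)
    for analysis, prob in nbest:
        totals[analysis] += prob
    return max(totals, key=totals.get)

nbest = [("T1", 0.30), ("T2", 0.25), ("T1", 0.20), ("T3", 0.15)]
print(best_analysis(nbest))  # T1, since 0.30 + 0.20 beats every other total
```

The example shows why summing matters: the single most probable derivation alone ("T1", 0.30) happens to agree here, but an analysis spread over many lower-ranked derivations can overtake one concentrated in a single derivation.</Paragraph>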
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"> We derived some experimental properties of LFG-DOP by studying its behavior on the two LFG-annotated corpora that are currently available: the Verbmobil corpus and the Homecentre corpus. Both corpora were annotated at Xerox PARC. They contain packed LFG representations (Maxwell &amp; Kaplan 1991) of the grammatical parses of each sentence, together with an indication of which of these parses is the correct one. For our experiments we used only the correct parses of each sentence, resulting in 540 Verbmobil parses and 980 Homecentre parses.</Paragraph>
    <Paragraph position="1"> Each corpus was divided into a 90% training set and a 10% test set. This division was random except for one constraint: all the words in the test set actually occurred in the training set. The sentences from the test set were parsed and disambiguated by means of the fragments from the training set. Due to memory limitations, we restricted the maximum depth of the indexed subtrees to 4. Because of the small size of the corpora we averaged our results over 10 different training/test set splits. Besides an exact match accuracy metric, we also used a more fine-grained score based on the well-known PARSEVAL metrics that evaluate phrase-structure trees (Black et al. 1991). The PARSEVAL metrics compare a proposed parse P with the corresponding correct treebank parse T as follows: Precision = (# correct constituents in P) / (# constituents in P); Recall = (# correct constituents in P) / (# constituents in T).</Paragraph>
    <Paragraph position="3"> A constituent in P is correct if there exists a constituent in T with the same label that spans the same words and that φ-corresponds to the same f-structure unit (see Bod 2000c for some illustrations of these metrics for LFG-DOP).</Paragraph>
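    <Paragraph> For concreteness, the two scores can be computed over sets of constituents as below. In this toy version a constituent is a (label, start, end) span and the φ-correspondence check is omitted; both trees are invented.

```python
def parseval(proposed, gold):
    """Precision and recall over constituents, PARSEVAL-style."""
    correct = len(set(proposed).intersection(gold))
    return correct / len(proposed), correct / len(gold)

gold = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5)}       # correct treebank parse T
proposed = {("S", 0, 5), ("NP", 0, 2), ("VP", 1, 5)}   # proposed parse P
precision, recall = parseval(proposed, gold)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

Here the mislabeled NP span counts against both scores, while the matching S and VP constituents count as correct.</Paragraph>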
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Comparing the two fragment estimators
</SectionTitle>
      <Paragraph position="0"> We were first interested in comparing the performance of the simple RF estimator against the discounted RF estimator. Furthermore, we wanted to study the contribution of generalized fragments to the parse accuracy. We therefore created for each training set two sets of fragments: one which contains all fragments (up to depth 4) and one which excludes the generalized fragments as generated by Discard. The exclusion of these Discard-generated fragments means that all probability mass goes to the fragments generated by Root and Frontier, in which case the two estimators are equivalent. The following two tables present the results of our experiments, where +Discard refers to the full set of fragments and −Discard refers to the fragment set without Discard-generated fragments.</Paragraph>
      <Paragraph position="1">  The tables show that the simple RF estimator scores extremely badly if all fragments are used: the exact match is only 1.1% on the Verbmobil corpus and 2.7% on the Homecentre corpus, whereas the discounted RF estimator scores 35.9% and 38.4%, respectively, on these corpora.</Paragraph>
      <Paragraph position="2"> The more fine-grained precision and recall scores obtained with the simple RF estimator are also quite low: e.g. 13.8% and 11.5% on the Verbmobil corpus, whereas the discounted RF estimator obtains 77.5% and 76.4%. Interestingly, the accuracy of the simple RF estimator is much higher if Discard-generated fragments are excluded. This suggests that treating generalized fragments probabilistically in the same way as ungeneralized fragments is harmful.</Paragraph>
      <Paragraph position="3"> The tables also show that the inclusion of Discard-generated fragments leads to only a slight accuracy increase under the discounted RF estimator. Unfortunately, according to paired t-tests, only the differences in precision scores on the Homecentre corpus were statistically significant.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Comparing different fragment sizes
</SectionTitle>
      <Paragraph position="0"> We were also interested in the impact of fragment size on the parse accuracy. We therefore performed a series of experiments where the fragment set is restricted to fragments of a certain maximum depth (where the depth of a fragment is defined as the longest path from root to leaf of its c-structure unit). We used the same training/test set splits as in the previous experiments and used both ungeneralized and generalized fragments together with the discounted RF estimator.</Paragraph>
      <Paragraph position="1">  Tables 3 and 4 show that there is a consistent increase in parse accuracy for all metrics if larger fragments are included, but that the increase itself decreases. This phenomenon is also known as the DOP hypothesis (Bod 1998), and has been confirmed for Tree-DOP on the ATIS, OVIS and Wall Street Journal treebanks (see Bod 1993, 1998, 1999, 2000a; Sima'an 1999; Bonnema et al. 1997; Hoogweg 2000). The current result thus extends the validity of the DOP hypothesis to LFG annotations. We do not yet know whether the accuracy continues to increase if even larger fragments are included (for Tree-DOP it has been shown that the accuracy decreases after a certain depth, probably due to overfitting -- cf. Bonnema et al. 1997; Bod 2000a).</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.3 Comparing LFG-DOP to Tree-DOP
</SectionTitle>
      <Paragraph position="0"> In the following experiment, we are interested in the impact of functional structures on predicting the correct tree structures. We therefore removed all f-structure units from the fragments, thus yielding a Tree-DOP model, and compared the results against the full LFG-DOP model (using the discounted RF estimator and all fragments up to depth 4). We evaluated the parse accuracy on the tree structures only, using exact match together with the standard PARSEVAL measures. We used the same training/test set splits as in the previous experiments.</Paragraph>
      <Paragraph position="1">  The results indicate that LFG-DOP's functional structures help to improve the parse accuracy of tree structures. In other words, LFG-DOP outperforms Tree-DOP if evaluated on tree structures only. According to paired t-tests all differences in accuracy were statistically significant. This result is promising since Tree-DOP has been shown to obtain state-of-the-art performance on the Wall Street Journal corpus (see Bod 2000a).</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.4 Comparing Viterbi n best to Monte Carlo
</SectionTitle>
      <Paragraph position="0"> Finally, we were interested in comparing an alternative, more efficient search method for estimating the most probable analysis. In the following set of experiments we use a Viterbi n best search heuristic (as explained in section 4), and let n range from 1 to 10,000 derivations. We also compute the results obtained by Monte Carlo for the same number of derivations. We used the same training/test set splits as in the previous experiments and used both ungeneralized and generalized fragments up to depth 4 together with the discounted RF estimator.</Paragraph>
      <Paragraph position="1"> [Table: accuracy by number of derivations, Viterbi n best vs. Monte Carlo] The tables show that Viterbi n best already reaches its maximum accuracy at 100 derivations (at least on the Verbmobil corpus), while Monte Carlo needs a much larger number of derivations to obtain these results. On the Homecentre corpus, Monte Carlo slightly outperforms Viterbi n best at 10,000 derivations, but these differences are not statistically significant. Also remarkable are the relatively high results obtained with Viterbi n best if only one derivation is used. This score corresponds to the analysis generated by the most probable (valid) derivation. Thus Viterbi n best is a promising alternative to Monte Carlo, resulting in a speed-up of about two orders of magnitude.</Paragraph>
    </Section>
  </Section>
</Paper>