<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1025">
<Title>Probabilistic Prediction and Picky Chart Parsing*</Title>
<Section position="3" start_page="128" end_page="128" type="metho">
<SectionTitle> 2. Probabilistic Models </SectionTitle>
<Paragraph position="0"> The probabilistic models used in the implementation of Picky are independent of the algorithm. To facilitate the comparison between the performance of Picky and its predecessor, Pearl, the probabilistic model implemented for Picky is similar to Pearl's scoring model, the context-free grammar with context-sensitive probability (CFG with CSP) model. This probabilistic model estimates the probability of each parse T given the words in the sentence S, P(T|S), by assuming that each non-terminal and its immediate children are dependent on the non-terminal's siblings and parent and on the part-of-speech trigram centered at the beginning of that rule:</Paragraph>
<Paragraph position="1">

P(T|S) ≈ ∏ P(A → α | C → β A γ, a0 a1 a2), taken over the rules A → α in T,

</Paragraph>
<Paragraph position="2"> where C is the non-terminal node which immediately dominates A, a1 is the part-of-speech associated with the leftmost word of constituent A, and a0 and a2 are the parts-of-speech of the words to the left and to the right of a1, respectively. See Magerman and Marcus (1991) [10] for a more detailed description of the CFG with CSP model.</Paragraph>
</Section>
<Section position="4" start_page="128" end_page="131" type="metho">
<SectionTitle> 3. The Parsing Algorithm </SectionTitle>
<Paragraph position="0"> A probabilistic language model, such as the aforementioned CFG with CSP model, provides a metric for evaluating the likelihood of a parse tree. However, while it may suggest a method for evaluating partial parse trees, a language model alone does not dictate the search strategy for determining the most likely analysis of an input.</Paragraph>
<Paragraph position="1"> Since exhaustive search of the space of parse trees produced by a natural language grammar is generally not feasible, a parsing model can best take advantage of a probabilistic language model by incorporating it into a parser which probabilistically models the parsing process. Picky attempts to model the chart parsing process for context-free grammars using probabilistic prediction.</Paragraph>
<Paragraph position="2"> Picky parses sentences in three phases: the covered left-corner phase (I), the covered bidirectional phase (II), and the tree completion phase (III). Each phase uses a different method for proposing edges to be introduced to the parse chart. The first phase, covered left-corner, uses probabilistic prediction based on the left-corner word of the leftmost daughter of a constituent to propose edges. The covered bidirectional phase also uses probabilistic prediction, but it allows prediction to occur from the left-corner word of any daughter of a constituent, and parses that constituent outward (bidirectionally) from that daughter. These phases are referred to as "covered" because, during these phases, the parsing mechanism proposes only edges that have non-zero probability according to the prediction model, i.e., that have been covered by the training process.</Paragraph>
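For concreteness, and taken together with the tree completion phase described in the next paragraph, the overall control flow amounts to trying successively more permissive phases until a spanning parse appears. The following Python sketch is purely illustrative; Chart, the phase functions, and every other name here are assumed, not the paper's implementation.

def picky_parse(sentence, grammar, model):
    """Illustrative control flow for Picky's three phases (assumed API)."""
    chart = Chart(sentence)
    phases = (
        covered_left_corner,    # I: predict from the leftmost daughter's left-corner word
        covered_bidirectional,  # II: predict from any daughter's left-corner word
        tree_completion,        # III: exhaustive, best-first search (described below)
    )
    for phase in phases:
        phase(chart, grammar, model)
        if chart.has_spanning_S():      # an S covering the whole input?
            return chart.best_parse()
    return chart.most_probable_S()      # partial-parse post-processing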
<Paragraph position="3"> The final phase, tree completion, is essentially an exhaustive search of all interpretations of the input according to the grammar. However, the search proceeds in best-first order, according to the measures provided by the language model. This phase is used only when the probabilistic prediction model fails to propose the edges necessary to complete a parse of the sentence.</Paragraph>
<Paragraph position="4"> The following sections will present and motivate the prediction techniques used by the algorithm, and will then describe how they are implemented in each phase.</Paragraph>
<Section position="1" start_page="128" end_page="129" type="sub_section">
<SectionTitle> 3.1. Probabilistic Prediction </SectionTitle>
<Paragraph position="0"> Probabilistic prediction is a general method for using probabilistic information extracted from a parsed corpus to estimate the likelihood that predicting an edge at a certain point in the chart will lead to a correct analysis of the sentence. The Picky algorithm is not dependent on the specific probabilistic prediction model used. The model used in the implementation, which is similar to the probabilistic language model, will be described.3</Paragraph>
<Paragraph position="1"> Footnote 3: The prediction model used in the implementation need not be the same as the language model used to evaluate complete analyses. However, it is helpful if this is the case, so that the probability estimates of incomplete edges will be consistent with the probability estimates of completed constituents.</Paragraph>
<Paragraph position="2"> Picky estimates the probability that an edge proposed at a point in the chart will lead to a correct parse to be:</Paragraph>
<Paragraph position="3">

P(A → B β | a0 a1 a2)

</Paragraph>
<Paragraph position="4"> where a1 is the part-of-speech of the left-corner word of B, a0 is the part-of-speech of the word to the left of a1, and a2 is the part-of-speech of the word to the right of a1.</Paragraph>
<Paragraph position="5"> To illustrate how this model is used, consider the sentence:

The cow raced past the barn. (3)

The word "cow" in the word sequence "the cow raced" predicts NP → det n, but not NP → det n PP, since PP is unlikely to generate a verb, based on training material.4 Assuming the prediction model is well trained, it will propose the interpretation of "raced" as the beginning of a participial phrase modifying "the cow," as in:

The cow raced past the barn mooed. (4)

However, the interpretation of "raced" as a past participle will receive a low probability estimate relative to the verb interpretation, since the prediction model only considers local context.</Paragraph>
<Paragraph position="6"> Footnote 4: Throughout this discussion, we will describe the prediction process using words as the predictors of edges. In the implementation, due to sparse data concerns, only parts-of-speech are used to predict edges. Given more robust estimation techniques, a probabilistic prediction model conditioned on word sequences is likely to perform as well or better.</Paragraph>
<Paragraph position="7"> The process of probabilistic prediction is analogous to that of a human parser recognizing predictive lexical items or sequences in a sentence and using these hints to restrict the search for the correct analysis of the sentence. For instance, a sentence beginning with a wh-word and auxiliary inversion is very likely to be a question, and trying to interpret it as an assertion is wasteful. If a verb is generally ditransitive, one should look for two objects to that verb instead of one or none. Using probabilistic prediction, sentences whose interpretations are highly predictable based on the trained parsing model can be analyzed with little wasted effort, sometimes generating no more than ten spurious constituents for sentences which contain between 30 and 40 constituents! Also, in some of these cases every predicted rule results in a completed constituent, indicating that the model made no incorrect predictions and was led astray only by genuine ambiguities in parts of the sentence.</Paragraph>
</Section>
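To make the prediction step concrete, here is a minimal sketch. It assumes the trained model is stored as a table of relative frequencies for P(A → B β | a0 a1 a2); the table, the grammar interface, and all names below are illustrative assumptions, not the paper's implementation.

from collections import defaultdict

# Assumed trained table mapping (rule, pos_trigram) -> relative
# frequency estimated from a parsed corpus; absent keys act as zero.
prediction_table = defaultdict(float)

def covered_predictions(grammar, B, a0, a1, a2):
    """Propose rules A -> B beta whose trained probability given the
    trigram a0 a1 a2 is non-zero; a1 is the POS of B's left-corner word."""
    proposals = []
    for rule in grammar.rules_with_leftmost_daughter(B):   # assumed API
        p = prediction_table[(rule, (a0, a1, a2))]
        if p > 0.0:                       # "covered" by the training process
            proposals.append((p, rule))
    proposals.sort(key=lambda pr: pr[0], reverse=True)     # best-first order
    return proposals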
<Section position="2" start_page="129" end_page="129" type="sub_section">
<SectionTitle> 3.2. Exhaustive Prediction </SectionTitle>
<Paragraph position="0"> When probabilistic prediction fails to generate the edges necessary to complete a parse of the sentence, exhaustive prediction uses the edges which have been generated in earlier phases to predict new edges which might combine with them to produce a complete parse. Exhaustive prediction is a combination of two existing types of prediction, "over-the-top" prediction [9] and top-down filtering.</Paragraph>
<Paragraph position="1"> Over-the-top prediction is applied to complete edges. A completed edge A → α will predict all edges of the form

B → β A γ.5

</Paragraph>
<Paragraph position="2"> Top-down filtering is used to predict edges in order to complete incomplete edges. An edge of the form A → α B0 B1 B2 β, where a B1 has been recognized, will predict edges of the form B0 → γ before B1 and edges of the form B2 → δ after B1.</Paragraph>
<Paragraph position="3"> Footnote 5: In the implementation of Picky, over-the-top prediction for A → α will only predict edges of the form B → A γ. This limitation on over-the-top prediction is due to the expensive bookkeeping involved in bidirectional parsing. See the section on bidirectional parsing for more details.</Paragraph>
</Section>
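The two kinds of exhaustive prediction can be sketched as follows. This is an illustrative rendering under an assumed grammar and chart interface (every name below is hypothetical); the general B → β A γ form is shown, though footnote 5 notes the implementation restricts over-the-top prediction to B → A γ.

def over_the_top_predict(grammar, chart, edge):
    """A completed edge A -> alpha predicts edges B -> beta A gamma,
    i.e., any rule whose right-hand side mentions the completed category A."""
    for rule in grammar.rules_containing(edge.lhs):        # assumed API
        chart.propose(rule, anchor=edge)   # parsed outward from the A daughter

def top_down_filter(grammar, chart, edge):
    """An incomplete edge A -> alpha B0 B1 B2 beta with B1 recognized
    predicts B0 -> gamma before B1 and B2 -> delta after B1."""
    for category, side in edge.unrecognized_daughters():   # assumed API
        for rule in grammar.rules_expanding(category):
            if side == "left":
                chart.propose(rule, end=edge.recognized_start)
            else:
                chart.propose(rule, start=edge.recognized_end)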
<Section position="3" start_page="129" end_page="131" type="sub_section">
<SectionTitle> 3.3. Bidirectional Parsing </SectionTitle>
<Paragraph position="0"> The only difference between phases I and II is that phase II allows bidirectional parsing. Bidirectional parsing is a technique for initiating the parsing of a constituent from any point in that constituent. Chart parsing algorithms generally process constituents from left to right. For instance, given a grammar rule

A → B1 B2 ... Bn, (5)

a parser generally would attempt to recognize a B1, then search for a B2 following it, and so on. Bidirectional parsing recognizes an A by looking for any Bi. Once a Bi has been parsed, a bidirectional parser looks for a Bi-1 to the left of the Bi, a Bi+1 to the right, and so on.</Paragraph>
<Paragraph position="1"> Bidirectional parsing is generally an inefficient technique, since it allows duplicate edges to be introduced into the chart. As an example, consider a context-free rule NP → DET N, and assume that there is a determiner followed by a noun in the sentence being parsed. Using bidirectional parsing, this NP rule can be predicted both by the determiner and by the noun. The edge predicted by the determiner will look to the right for a noun, find one, and introduce a new edge consisting of a completed NP. The edge predicted by the noun will look to the left for a determiner, find one, and also introduce a new edge consisting of a completed NP. Both of these NPs represent identical parse trees, and are thus redundant. If the algorithm permits both edges to be inserted into the chart, then an edge XP → α NP β will be advanced by both NPs, creating two copies of every XP edge. These duplicate XP edges can themselves be used in other rules, and so on.</Paragraph>
<Paragraph position="2"> To avoid this propagation of redundant edges, the parser must ensure that no duplicate edges are introduced into the chart. Picky does this simply by verifying, every time an edge is added, that the edge is not already in the chart. Although eliminating redundant edges prevents excessive inefficiency, bidirectional parsing may still perform more work than traditional left-to-right parsing. In the previous example, three edges are introduced into the chart to parse the NP → DET N edge. A left-to-right parser would only introduce two edges, one when the determiner is recognized, and another when the noun is recognized.</Paragraph>
<Paragraph position="3"> The benefit of bidirectional parsing can be seen when probabilistic prediction is introduced into the parser. Frequently, the syntactic structure of a constituent is not determined by its left-corner word. For instance, in the sequence V NP PP, the prepositional phrase PP can modify either the noun phrase NP or the entire verb phrase V NP. These two interpretations require different VP rules to be predicted, but the decision about which rule to use depends on more than just the verb. The correct rule may best be predicted by knowing the preposition used in the PP. Using probabilistic prediction, the decision is made by pursuing the rule which has the highest probability according to the prediction model. This rule is then parsed bidirectionally. If this rule is in fact the correct rule to analyze the constituent, then no other predictions will be made for that constituent, and there will be no more edges produced than in left-to-right parsing. Thus, the only case where bidirectional parsing is less efficient than left-to-right parsing is when the prediction model fails to capture the elements of context of the sentence which determine its correct interpretation.</Paragraph>

3.4. The Three Phases of Picky

<Paragraph position="4"> Covered Left-Corner. The first phase uses probabilistic prediction based on the part-of-speech sequences from the input sentence to predict all grammar rules which have a non-zero probability of being dominated by that trigram (based on the training corpus), i.e.

P(A → B β | a0 a1 a2) > 0,

where a1 is the part-of-speech of the left-corner word of B. In this phase, the only exception to the probabilistic prediction is that any rule which can immediately dominate the preterminal category of any word in the sentence is also predicted, regardless of its probability. This type of prediction is referred to as exhaustive prediction. All of the predicted rules are processed using a standard best-first agenda processing algorithm, where the highest scoring edge in the chart is advanced.</Paragraph>
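The best-first agenda processing just mentioned can be sketched with a priority queue. This is a minimal illustration under an assumed chart interface; all names are hypothetical.

import heapq
from itertools import count

_tie = count()  # tie-breaker so the heap never compares edge objects

def best_first_process(chart, agenda):
    """Advance the highest-scoring edge until the agenda empties.
    Scores are negated because heapq is a min-heap."""
    while agenda:
        neg_score, _, edge = heapq.heappop(agenda)
        if chart.contains(edge):          # duplicate-edge check (Section 3.3)
            continue
        chart.insert(edge)
        for new_edge in chart.advance(edge):   # combine edge with its neighbors
            heapq.heappush(agenda, (-new_edge.score, next(_tie), new_edge))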
<Paragraph position="5"> Covered Bidirectional. If an S spanning the entire word string is not recognized by the end of the first phase, the covered bidirectional phase continues the parsing process. Using the chart generated by the first phase, rules are predicted not only by the trigram centered at the left-corner word of the rule, but by the trigram centered at the left-corner word of any of the children of that rule, i.e.

P(A → α B β | b0 b1 b2) > 0,

where b1 is the part-of-speech associated with the left-most word of constituent B. This phase introduces incomplete theories into the chart which need to be expanded to the left and to the right, as described in the bidirectional parsing section above.</Paragraph>
<Paragraph position="6"> Tree Completion. If the bidirectional processing fails to produce a successful parse, then it is assumed that there is some part of the input sentence which is not covered well by the training material. In the final phase, exhaustive prediction is performed on all complete theories which were introduced in the previous phases but which are not predicted by the trigrams beneath them (i.e., P(rule | trigram) = 0). In this phase, edges are predicted only by their left-corner word. As mentioned previously, bidirectional parsing can be inefficient when the prediction model is inaccurate. Since all edges to which the prediction model assigns non-zero probability have already been predicted, the model can no longer provide any information for future predictions. Thus, bidirectional parsing in this phase is very likely to be inefficient. Edges already in the chart will be parsed bidirectionally, since they were predicted by the model, but all new edges will be predicted by the left-corner word only.</Paragraph>
<Paragraph position="7"> Since it is already known that the prediction model will assign a zero probability to these rules, these predictions are instead scored based on the number of words spanned by the subtree which predicted them. Thus, this phase favors longer theories, introducing rules which can advance them. Each new theory proposed by the parsing process triggers exhaustive prediction in turn, using the length-based scoring model.</Paragraph>
<Paragraph position="8"> The final phase is used only when a sentence is so far outside the scope of the training material that none of the previous phases is able to process it. This phase of the algorithm exhibits the worst-case exponential behavior that is found in chart parsers which do not use node packing. Since the probabilistic model is no longer useful in this phase, the parser is forced to propose an enormous number of theories. The expectation (or hope) is that one of the theories which spans most of the sentence will be completed by this final process. Depending on the size of the grammar used, it may be infeasible to allow the parser to exhaust all possible predictions before deciding an input is ungrammatical. The question of when the parser should give up is an empirical issue which will not be explored here.</Paragraph>
<Paragraph position="9"> Post-processing: Partial Parsing. Once the final phase has exhausted all predictions made by the grammar, or, more likely, once the probability of all edges in the chart falls below a certain threshold, Picky determines the sentence to be ungrammatical. However, since the chart produced by Picky contains all recognized constituents, sorted by probability, the chart can be used to extract partial parses. As implemented, Picky prints out the most probable completed S constituent.</Paragraph>
</Section>
</Section>
</Paper>