<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1014">
  <Title>Fast LR Parsing Using Rich (Tree Adjoining) Grammars</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Conflict Resolution
</SectionTitle>
    <Paragraph position="0"> In this section we focus on how to resolve conflicts in the generated parsing table to obtain a single &quot;best&quot; parse for each input sentence.</Paragraph>
    <Paragraph position="1"> [Table with columns: stack, input, sel. action (parsing states were omitted in &quot;stack&quot; for clarity).]</Paragraph>
    <Paragraph position="3"> Footnote 3: Notice that the notion of top-down/bottom-up parsing cannot be defined on the derived tree for TAGs, unless one wants to break the actions into independent subatomic units, e.g., single-level context-free-like expansions.</Paragraph>
    <Paragraph position="4"> At each step of the parsing process the driver must choose among a certain number of available actions. At the end, the sequence of actions taken uniquely defines one derivation tree, which represents the chosen syntactic analysis.</Paragraph>
    <Paragraph position="5"> In our approach, the parser proceeds greedily, trying to compute a single successful sequence of actions. Whenever it fails, a backtracking strategy redirects the parser to an alternative path, up to a limited number of attempts given as a parameter to the parser. Choices are made locally: we have not tried to maximize any global measure, such as the probability of the sentence or the probability of a parse given a string.</Paragraph>
    <Paragraph position="6"> An instantaneous description (or &quot;configuration&quot;) can be characterized by two components: (1) the current content of the stack, which includes the current (automaton) state; (2) the suffix of the input not yet shifted into the stack.</Paragraph>
    <Paragraph position="7"> The basic parsing approach has two main components: (1) a strategy for ranking the actions available at any given instantaneous description; (2) a greedy strategy for computing the sequence of parsing actions. At each instantaneous description: • Choose the highest-ranked action that has not been tried yet and execute it.</Paragraph>
    <Paragraph position="8"> • If there is no action available, then backtrack: move back to one of the instantaneous descriptions previously entered and choose an alternative action.</Paragraph>
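    <Paragraph> The greedy strategy above can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser's actual driver: the names `greedy_parse`, `ranked_actions`, `step`, `is_accepting` and the toy table are hypothetical, and the sketch backtracks to the most recent configuration that still has untried actions, which is only one possible reading of &quot;one of the instantaneous descriptions previously entered&quot;.</Paragraph>
    <Paragraph><![CDATA[
```python
from typing import Callable, Hashable, List, Optional

Config = Hashable  # hypothetical stand-in for an instantaneous description
Action = Hashable


def greedy_parse(
    start: Config,
    ranked_actions: Callable[[Config], List[Action]],
    step: Callable[[Config, Action], Config],
    is_accepting: Callable[[Config], bool],
    max_attempts: int = 10,
) -> Optional[List[Action]]:
    """Return one successful action sequence, or None if the budget runs out."""
    # Each entry: (configuration, untried actions best-first, action that led here).
    path = [(start, list(ranked_actions(start)), None)]
    crashes = 0
    while path:
        config, untried, _ = path[-1]
        if is_accepting(config):
            return [taken for _, _, taken in path[1:]]
        if untried:
            # Choose the highest-ranked action not yet tried and execute it.
            action = untried.pop(0)
            nxt = step(config, action)
            path.append((nxt, list(ranked_actions(nxt)), action))
            continue
        # No action available: this attempt has crashed.
        crashes += 1
        if crashes >= max_attempts:
            return None
        # Assumption: go back to the nearest configuration with alternatives left.
        while path and not path[-1][1]:
            path.pop()
    return None


# Toy automaton (made up for illustration): actions are listed best-first.
TABLE = {
    "s0": [("shift", "s1"), ("reduce", "s2")],
    "s1": [],  # dead end
    "s2": [("shift", "acc")],
    "acc": [],
}

result = greedy_parse(
    "s0",
    ranked_actions=lambda c: [a for a, _ in TABLE[c]],
    step=lambda c, a: dict(TABLE[c])[a],
    is_accepting=lambda c: c == "acc",
)
print(result)  # ['reduce', 'shift']
```
]]></Paragraph>
    <Paragraph> In the toy run, the top-ranked shift leads to a dead end, so the sketch returns to the start configuration and takes the next-ranked action. With `max_attempts=1` it gives up after the first crash.</Paragraph>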
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Strategies for ranking conflicting actions
</SectionTitle>
      <Paragraph position="0"> Let $q_0, q_1, \ldots, q_{k-2}, q_{k-1}, q_k$ be the sequence of states in the positions of the stack (see footnote 4); $q_k$ is the current state. Let $la$ be the lookahead symbol, the leftmost symbol in the not yet shifted suffix. Let $A$ be the (finite) set of possible actions given by the grammar.</Paragraph>
      <Paragraph position="1"> We use two basic ranking functions: $rank_1$ and $rank_2$.</Paragraph>
      <Paragraph position="3"> $rank_1$ is based on the current automaton state (the state at the top of the stack, $q_k$) and the lookahead symbol, $la$. It is a poor statistic, but it does not suffer from sparse-data problems in our training corpus, and hence is used for smoothing. For any instantaneous description as described above, we define $rank_1(a)$, for any action $a \in A$, as a probability estimate for $a$ given the current state $q_k$ and lookahead symbol $la$:</Paragraph>
      <Paragraph position="5"> [The defining equation is missing from this extraction; only the end of its explanation survives:] ... occurs in an instantaneous description when parsing the annotated corpus.</Paragraph>
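      <Paragraph> Since the defining equation did not survive in this copy, the following is only a sketch of one plausible reading: a relative-frequency estimate of an action given the current state and lookahead symbol, computed from the instantaneous descriptions seen while parsing an annotated corpus. The class name `Rank1`, the event format and the sample data are assumptions made for illustration, and the source does not confirm that the estimate has exactly this form.</Paragraph>
      <Paragraph><![CDATA[
```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

# Hypothetical event format: one (state, lookahead, action) triple per
# instantaneous description observed while parsing the annotated corpus.
Event = Tuple[Hashable, Hashable, Hashable]


class Rank1:
    """Relative-frequency estimate of an action given (state, lookahead)."""

    def __init__(self, events: Iterable[Event]) -> None:
        self.joint = Counter()    # counts of (state, lookahead, action)
        self.context = Counter()  # counts of (state, lookahead)
        for state, lookahead, action in events:
            self.joint[(state, lookahead, action)] += 1
            self.context[(state, lookahead)] += 1

    def __call__(self, action: Hashable, state: Hashable, lookahead: Hashable) -> float:
        seen = self.context[(state, lookahead)]
        if seen == 0:
            return 0.0  # context never observed in training
        return self.joint[(state, lookahead, action)] / seen


# Made-up training events, for illustration only.
events = [
    (7, "NN", "shift"),
    (7, "NN", "shift"),
    (7, "NN", "reduce"),
    (3, "VB", "shift"),
]
rank1 = Rank1(events)
print(rank1("shift", 7, "NN"))
```
]]></Paragraph>
      <Paragraph> Because the context is only a (state, lookahead) pair, most contexts are observed many times, which is in line with the remark that this statistic does not suffer from sparse data.</Paragraph>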
      <Paragraph position="6"> It can be observed that individual actions tend to depend on different additional states in the stack. For a shift action there is no reason to assume that the previous state is particularly relevant, or that state, say, $q_{k-2}$, is not. But for a reduce or bpack action we should suspect that the state from where the goto action is taken is highly relevant. So, for instance, the action reduce $t$, where the number of non-empty leaves of $t$ is $l$, would have a strong dependency on the state $q_{k-l}$. An approximation of its rank would be: $rank(a) = \hat{p}(a \mid q_{k-l}, q_k, la)$. This observation is certainly not new. A similar ranking function is in fact used by Briscoe and Carroll. However, an inconsistency immediately arises: we cannot compare probabilities of events drawn from distinct sample spaces. For instance, if we have two competing actions, $a_1$, a reduce $t$, and $a_2$, a shift, and we affirm that $a_1$ depends on $q_{k-l}$, then it cannot be true that the shift does not depend on $q_{k-l}$. In fact, it has to be the case that it depends on $q_{k-l}$ as much as $a_1$ does. One could suggest calculating the probabilities for all actions conditional on the same set of states, $\{q_{k-l}, q_k\}$. But, in general, we have many more than two actions to decide among, and they are likely to stress their dependencies on different past states. This is not going to work: the number of dependencies, and hence the number of parameters, will grow too large.</Paragraph>
      <Paragraph position="7"> Footnote 4: Recall that each position in the stack contains a pair where the second element is an automaton state and the first element is either a symbol or an embedded stack.</Paragraph>
      <Paragraph position="8"> A striking solution arises from a notable fact from LR parsing theory for CFGs: if state $q_k$ contains an action reduce $p$, where $p$ is a production with $l$ symbols on its right side, then the pair $(q_{k-l}, q_k)$, from the instantaneous description, uniquely identifies the entire sequence $q_{k-l}, q_{k-l+1}, \ldots, q_{k-1}, q_k$. Although this property does not hold for the parser generation algorithm we are using, it is still a good approximation to the true statistical dependencies (see footnote 5). We can use this &quot;approximately correct&quot; property to our benefit: &quot;if state $q_k$ contains an action reduce or bpack for a number of leaves $l$, then the dependency on the sequence $q_{k-l}, q_{k-l+1}, \ldots, q_{k-1}, q_k$ can be approximated by a dependency on the pair $(q_{k-l}, q_k)$&quot;. So a natural candidate for the second state to be considered is a state below the top of the stack whose index is computed from $l$ and $k$ [the index expression and the definition that followed &quot;where&quot; are garbled or missing in the source].</Paragraph>
      <Paragraph position="10"> Footnote 5: [beginning missing in the source] ... is defined in (Prolo, 2000) as a function of two states (instead of just a simple transition in the automaton). A detailed argument is beyond the scope of this paper and can be found in (Prolo, 2002), available upon request from the author. That the statement is a good approximation to the true statistical dependencies follows from: (1) adjuncts (which can cause distinct states to intervene between the considered pair) are generally regarded as not restricting the syntactic possibilities of the clause they adjoin to; and (2) in practice, the intermediate states at positions that could be distinct for theoretically different sequences most often have exactly the same characteristics, i.e., they are likely to &quot;accidentally&quot; collapse to the same state.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 The backtracking strategy
</SectionTitle>
      <Paragraph position="0"> We have a (quite narrow) notion of confidence for parsing paths: as long as our sequence of decisions allows the parser to proceed, we trust the sequence, and if it has taken us to an acceptance state, we believe we have the correct parse. A crash is the other value of this binary confidence measure: it marks an untrustworthy parsing sequence. In that case, we know we have made a bad decision somewhere along the path and that we have to start again from that point by following another alternative. This is a backtracking strategy, although not in the common sense of a depth-first walk (i.e., exploring all the possibilities left before undoing some earlier action).</Paragraph>
      <Paragraph position="1"> We want to explore strategies of intelligently guessing the restart point.</Paragraph>
      <Paragraph position="2"> We use a simple strategy of returning to the decision point that left the highest amount of probability mass unexplored. In order to implement it, we maintain a tree with the entire history of the parsing attempts. There will be one path from the root corresponding to the current parsing attempt, the leaf being the current instantaneous description. All other leaves correspond to instantaneous descriptions that have been abandoned (crashing points). If the current leaf crashes, all nodes in the tree (except for the leaves) compete to be the restart point. Keeping all nodes in the tree alive is a direct consequence of the fact that we do not intend to do exhaustive backtracking.</Paragraph>
      <Paragraph position="3"> We trade space (a tree instead of just a sequence) for time: presumably, by doing smart backtracking we can find a successful path by trying only a fraction of the possible ones. Moreover, we want to find the best (or approximately best) successful path, and a crashing point is a good point to re-evaluate the process. Limits may be added through parameters, so that the parser gives up after a certain number of attempts or a certain amount of time.</Paragraph>
      <Paragraph position="4"> In addition to the instantaneous description, each node contains a record of the alternatives previously tried (the edges to its child nodes in the tree) with their corresponding probabilities, plus a ranked list of the alternatives not yet tried. In particular we maintain the probability mass left unexplored in a node: the sum of the probabilities of the actions not yet tried. Notice that alternatives already tried are indirectly kept alive through their corresponding child nodes.</Paragraph>
      <Paragraph position="5"> Let $U(n)$ be the set of actions not yet tried at node $n$. The probability mass left is $pl(n) = \sum_{a \in U(n)} rank(a, n)$.</Paragraph>
      <Paragraph position="7"> [Beginning missing in the source; from the strategy described above:] after a crash, the parser returns to the node $n$ for which $pl(n)$ is highest and chooses $a \in U(n)$ for which $rank(a, n)$ is maximum (efficiently maintained using a priority queue).</Paragraph>
      <Paragraph position="8"> Then we update $pl(n) = pl(n) - rank(a, n)$ and start another branch in the tree by executing $a$.</Paragraph>
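      <Paragraph> The bookkeeping described in this subsection can be sketched as below. It is an illustrative sketch under stated assumptions rather than the implementation used in the experiments: `Node`, `make_node`, `take_best`, `parse` and the toy tables are hypothetical names and data, action ranks are assumed to be given as a dictionary per configuration, and the restart node is found by a simple linear scan over the history tree.</Paragraph>
      <Paragraph><![CDATA[
```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional, Tuple


@dataclass
class Node:
    config: Hashable                   # instantaneous description
    untried: List[Tuple[float, str]]   # heap of (-rank, action)
    mass_left: float                   # sum of ranks of untried actions
    parent: Optional["Node"] = None
    via: Optional[str] = None          # action that led from parent to here
    children: Dict[str, "Node"] = field(default_factory=dict)


def make_node(config, ranked: Dict[str, float], parent=None, via=None) -> Node:
    heap = [(-p, a) for a, p in ranked.items()]
    heapq.heapify(heap)
    return Node(config, heap, sum(ranked.values()), parent, via)


def take_best(node: Node) -> str:
    """Pop the highest-ranked untried action and update the mass left."""
    neg_rank, action = heapq.heappop(node.untried)
    node.mass_left += neg_rank  # i.e. mass_left = mass_left - rank(action)
    return action


def parse(start, ranks: Callable, step: Callable, is_accepting: Callable,
          max_attempts: int = 10) -> Optional[List[str]]:
    current = make_node(start, ranks(start))
    nodes = [current]  # the whole history tree, kept alive
    attempts = 1
    while True:
        if is_accepting(current.config):
            actions = []
            while current.parent is not None:
                actions.append(current.via)
                current = current.parent
            return actions[::-1]
        if not current.untried:  # crash: pick a restart point
            attempts += 1
            candidates = [n for n in nodes if n.untried]
            if attempts > max_attempts or not candidates:
                return None
            current = max(candidates, key=lambda n: n.mass_left)
        action = take_best(current)
        nxt = step(current.config, action)
        child = make_node(nxt, ranks(nxt), current, action)
        current.children[action] = child
        nodes.append(child)
        current = child


# Toy data (made up): ranks per configuration and a transition table.
RANKS = {
    "s0": {"shift": 0.6, "reduce": 0.4},
    "s1": {"shift": 0.9, "reduce": 0.1},
    "s2": {},
    "s3": {"shift": 1.0},
    "s4": {},
    "acc": {},
}
STEP = {("s0", "shift"): "s1", ("s1", "shift"): "s2", ("s1", "reduce"): "s4",
        ("s0", "reduce"): "s3", ("s3", "shift"): "acc"}

best = parse("s0", RANKS.get, lambda c, a: STEP[(c, a)], lambda c: c == "acc")
print(best)  # ['reduce', 'shift']
```
]]></Paragraph>
      <Paragraph> In the toy run the first attempt crashes at `s2`. A depth-first walk would retry at `s1`, where only 0.1 of the mass is left, whereas this sketch restarts at `s0`, where 0.4 is left, and reaches the accepting configuration from there.</Paragraph>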
    </Section>
  </Section>
</Paper>