<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1034"> <Title>Efficient Transformation-Based Parsing</Title>
<Section position="4" start_page="256" end_page="257" type="metho"> <SectionTitle> 3 Rule representation </SectionTitle>
<Paragraph position="0"> We develop here a representation of rule sequences that makes use of DTA and that is the basis of the main result of this paper. Our technique improves the preprocessing phase of the bottom-up tree pattern matching algorithm presented in (Hoffmann and O'Donnell, 1982), as discussed in the final section.</Paragraph>
<Paragraph position="1"> Let $G = (\Sigma, R)$ be a TTS, $R = (r_1, r_2, \ldots, r_\pi)$. In what follows we construct a DTA that &quot;detects&quot; each subtree of an input tree that is equivalent to some tree in $\mathrm{lhs}(R)$. We need to introduce some additional notation. Let $N$ be the set of all nodes from the trees in $\mathrm{lhs}(R)$. Call $N_r$ the set of all root nodes (in $N$), $N_m$ the set of all leftmost nodes, $N_l$ the set of all leaf nodes, and $N_a$ the set of all nodes labeled by $a \in \Sigma$.</Paragraph>
<Paragraph position="2"> For each $q \in 2^N$, let $\mathrm{right}(q) = \{n \mid n \in N,\ n' \in q,\ n \text{ has immediate left sibling } n'\}$ and let $\mathrm{up}(q) = \{n \mid n \in N,\ n' \in q,\ n \text{ has rightmost child } n'\}$.</Paragraph>
<Paragraph position="3"> Also, let $q_0$ be a fresh symbol.</Paragraph>
<Paragraph position="4"> Definition 3 $G$ is associated with a DTA $A_G = (2^N \cup \{q_0\}, \Sigma, \delta_G, q_0, F)$, where $F = \{q \mid q \in 2^N,\ (q \cap N_r) \neq \emptyset\}$ and $\delta_G$ is specified as follows:
(i) $\delta_G(q_0, q_0, a) = N_a \cap N_m \cap N_l$;
(ii) $\delta_G(q_0, q', a) = N_a \cap N_m \cap (N_l \cup \mathrm{up}(q'))$, for $q' \neq q_0$;
(iii) $\delta_G(q, q_0, a) = N_a \cap N_l \cap (N_r \cup \mathrm{right}(q))$, for $q \neq q_0$;
(iv) $\delta_G(q, q', a) = N_a \cap \mathrm{up}(q') \cap (N_r \cup \mathrm{right}(q))$, for $q \neq q_0 \neq q'$.</Paragraph>
<Paragraph position="5"> Observe that each state of $A_G$ simultaneously carries over the recognition of several suffixes of trees in $\mathrm{lhs}(R)$. 
These processes are started whenever $A_G$ reads a leftmost node $n$ with the same label as a leftmost leaf node in some tree in $\mathrm{lhs}(R)$ (items (i) and (ii) in Definition 3). Note also that we do not require any matching of the left siblings when we match the root of a tree in $\mathrm{lhs}(R)$ (items (iii) and (iv)).</Paragraph>
<Paragraph position="6"> Example 4 Let $G = (\Sigma, R)$, where $\Sigma = \{A, B, C, D\}$ and $R = (r_1, r_2, r_3)$. Rules $r_i$ are depicted in Figure 2. We write $n_{ij}$ to denote the $j$-th node in a post-order enumeration of the nodes of $\mathrm{lhs}(r_i)$, $1 \le i \le 3$ and $1 \le j \le 5$. (Therefore $n_{35}$ denotes the root node of $\mathrm{lhs}(r_3)$ and $n_{22}$ denotes the first child of the second child of the root node of $\mathrm{lhs}(r_2)$.) If we consider only the useful states, that is, those states that can be reached on an actual input, the DTA $A_G = (Q, \Sigma, \delta, q_0, F)$ is specified as follows: $Q = \{q_i \mid 0 \le i \le 11\}$, where $q_1 = \{n_{11}, n_{12}, n_{22}, n_{32}\}$,</Paragraph>
<Paragraph position="8"> transition function $\delta$, restricted to the useful states, is specified in Figure 3. Note that among the $2^{15} + 1$ possible states, only 12 are useful. $\Box$</Paragraph>
<Paragraph position="10"> For $(q, q', a) \in Q^2 \times \Sigma$ not indicated above, $\delta(q, q', a) = q_{11}$. Although the number of states of $A_G$ is exponential in $|N|$, in practical cases most of these states are never reached by the automaton on an actual input, and can therefore be ignored. This happens whenever there are few pairs of suffix trees of trees in $\mathrm{lhs}(R)$ that share a common prefix tree but no tree in the pair matches the other at the root node.</Paragraph>
<Paragraph position="11"> This is discussed at length in (Hoffmann and O'Donnell, 1982), where an upper bound on the number of useful states is provided.</Paragraph>
<Paragraph position="12"> The following lemma provides a characterization of $A_G$ that will be used later.</Paragraph>
<Paragraph position="13"> Lemma 1 Let $n$ be a node of $T \in \Sigma^T$ and let $n'$ be the root node of $\mathrm{lhs}(r)$, $r \in R$. 
Tree $\mathrm{lhs}(r)$ matches $T$ at $n$ if and only if $n' \in \hat{\delta}_G(T, n)$.</Paragraph>
<Paragraph position="14"> Proof (outline). The statement can be shown by proving the following claim. Let $m$ be a node in $T$ and $m'$ be a node in $\mathrm{lhs}(r)$. Call $m_1, \ldots, m_k = m$, $k \ge 1$, the ordered sequence of the left siblings of $m$, with $m$ included, and call $m'_1, \ldots, m'_{k'} = m'$, $k' \ge 1$, the ordered sequence of the left siblings of $m'$, with $m'$ included. If $m' \notin N_r$, then the two following conditions are equivalent:</Paragraph>
<Paragraph position="16"> The claim can be shown by induction on the position of $m'$ in a post-order enumeration of the nodes of $\mathrm{lhs}(r)$. The lemma then follows from the specification of set $F$ and the treatment of set $N_r$ in items (iii) and (iv) in Definition 3. $\Box$ We also need a function $\mathrm{next}$ mapping $F \times \{1..(\pi + 1)\}$ into $\{1..\pi\} \cup \{\bot\}$, specified as ($\min \emptyset = \bot$):</Paragraph>
<Paragraph position="18"> $\mathrm{next}(q, i) = \min \{ j \mid i \le j \le \pi,\ \mathrm{lhs}(r_j) \text{ has root node in } q \}$. (5) Assume that $q \in F$ is reached by $A_G$ upon reading a node $n$ (in some tree). In the next section $\mathrm{next}(q, i)$ is used to select the index of the rule that should be applied next at node $n$, after the first $i - 1$ rules of $R$ have been considered.</Paragraph> </Section>
<Section position="5" start_page="257" end_page="260" type="metho"> <SectionTitle> 4 The algorithm </SectionTitle>
<Paragraph position="0"> We present a translation algorithm for TTS that can immediately be converted into a transformation-based parsing algorithm. We use all definitions introduced in the previous sections. 
To simplify the presentation, we first assume that the order in which we apply several instances of the same rule to a given tree does not affect the outcome.</Paragraph>
<Paragraph position="1"> Later we will deal with the general case.</Paragraph>
<Section position="1" start_page="257" end_page="259" type="sub_section"> <SectionTitle> 4.1 Order-free case </SectionTitle>
<Paragraph position="0"> We start with an important property that is used by the algorithm below and that can be easily shown (see also (Hoffmann and O'Donnell, 1982)). Let $G = (\Sigma, R)$ be a TTS and let $h_R$ be the maximum height of a tree in $\mathrm{lhs}(R)$. Given trees $T$ and $S$, $S$ a subtree of $T$, we write $\mathrm{local}(T, S)$ to denote the set of all nodes of $S$ and the first $h_R$ proper ancestors of the root of $S$ in $T$ (when these nodes are defined).</Paragraph>
<Paragraph position="1"> Lemma 2 Assume that $\mathrm{lhs}(r)$, $r \in R$, matches a tree $T$ at some node $n$. Let $T \stackrel{r,n}{\Longrightarrow} T'$ and let $S$ be the copy of $\mathrm{rhs}(r)$ used in the rewriting. For every node $n'$ not included in $\mathrm{local}(T', S)$, we have $\hat{\delta}_G(T, n') = \hat{\delta}_G(T', n')$. $\Box$ We precede the specification of the method with an informal presentation. The following three data structures are used. An associative list state associates each node $n$ of the rewritten input tree with the state reached by $A_G$ upon reading $n$. If $n$ is no longer a node of the rewritten input tree, state associates $n$ with the empty set. A set $\mathrm{rule}(i)$ is associated with each rule $r_i$, containing some of the nodes of the rewritten input tree at which $\mathrm{lhs}(r_i)$ matches. A heap data structure $H$ is also used to order the indices of the non-empty sets $\mathrm{rule}(i)$ according to the priority of the associated rules in the rule sequence. All the above data structures are updated by a procedure called update.</Paragraph>
<Paragraph position="2"> To compute the translation $M(G)$ we first visit the input tree with $A_G$ and initialize our data structures in the following way. 
For each node $n$, state is assigned a state of $A_G$ as specified above. If rule $r_i$ must be applied first at $n$, $n$ is added to $\mathrm{rule}(i)$ and $H$ is updated. We then enter a main loop and retrieve elements from the heap. When $i$ is retrieved, rule $r_i$ is considered for application at each node $n$ in $\mathrm{rule}(i)$. It is important to observe that, since some rewriting of the input tree might have occurred between the time $n$ was inserted in $\mathrm{rule}(i)$ and the time $i$ is retrieved from $H$, it could be that the current rule $r_i$ can no longer be applied at $n$.</Paragraph>
<Paragraph position="3"> Information in state is used to detect these cases.</Paragraph>
<Paragraph position="4"> Crucial to the efficiency of our algorithm, each time a rule is applied only a small portion of the current tree needs to be reread by $A_G$ in order to update our data structures, as specified by Lemma 2 above.</Paragraph>
<Paragraph position="5"> Finally, the main loop is exited when the heap is empty.</Paragraph>
<Paragraph position="6"> Algorithm 1 Let $G = (\Sigma, R)$ be a TTS, $R = (r_1, r_2, \ldots, r_\pi)$, and let $T \in \Sigma^T$ be an input tree.</Paragraph>
<Paragraph position="7"> Let $A_G = (2^N \cup \{q_0\}, \Sigma, \delta_G, q_0, F)$ be the DTA associated with $G$ and $\hat{\delta}_G$ the reached state function.</Paragraph>
<Paragraph position="8"> Let also $i$ be an integer valued variable, state be an associative array, $\mathrm{rule}(i)$ be an initially empty set, for $1 \le i \le \pi$, and let $H$ be a heap data structure.</Paragraph>
<Paragraph position="9"> ($n \rightarrow \mathrm{rule}(i)$ adds $n$ to $\mathrm{rule}(i)$; $i \rightarrow H$ inserts $i$ in $H$; $i \leftarrow H$ assigns to $i$ the least element in $H$, if $H$ is not empty.) The algorithm is specified in Figure 4. $\Box$ Example 4 (continued) We describe a run of Algorithm 1 working with the sample TTS $G = (\Sigma, R)$ previously specified (see Figure 2).</Paragraph>
<Paragraph position="10"> proc update(oldset, newset, $j$) for each node $n \in$ oldset</Paragraph>
<Paragraph position="12"> update($\emptyset$, nodes of $C$, $i$) while $H$ not empty do $i \leftarrow H$ for each node $n \in \mathrm{rule}(i)$ s.t. 
the root of $\mathrm{lhs}(r_i)$ [...] Let $C_i \in \Sigma^T$, $1 \le i \le 3$, be as depicted in Figure 5. We write $m_{ij}$ to denote the $j$-th node in a post-order enumeration of the nodes of $C_i$, $1 \le i \le 3$ and $1 \le j \le 7$. Assume that $C_1$ is the input tree. After the first call to procedure update, we have</Paragraph>
<Paragraph position="14"> $\{n_{25}\}$; no other final state is associated with a node of $C_1$. We also have that $\mathrm{rule}(1) = \{m_{16}\}$, $\mathrm{rule}(2) = \{m_{17}\}$, $\mathrm{rule}(3) = \emptyset$ and $H$ contains indices 1 and 2.</Paragraph>
<Paragraph position="15"> Index 1 is then retrieved from $H$ and the only node in $\mathrm{rule}(1)$, i.e., $m_{16}$, is considered. Since the root of $\mathrm{lhs}(r_1)$, i.e., node $n_{15}$, belongs to $q_8$, $m_{16}$ passes the test in the head of the for-statement in the main program. Then $r_1$ is applied to $C_1$, yielding $C_2$. Observe that $m_{11} = m_{21}$ and $m_{17} = m_{27}$; all the remaining nodes of $C_2$ are fresh nodes.</Paragraph>
<Paragraph position="16"> The next call to update, associated with the application of $r_1$, updates the associative list state in such a way that $\mathrm{state}(m_{27}) = q_9 = \{n_{35}\}$, and no other final state is associated with a node of $C_2$. Also, we now have $\mathrm{rule}(1) = \{m_{16}\}$, $\mathrm{rule}(2) = \{m_{27}\}$ (recall that $m_{17} = m_{27}$), $\mathrm{rule}(3) = \{m_{27}\}$, and $H$ contains indices 2 and 3.</Paragraph>
<Paragraph position="17"> Index 2 is next retrieved from $H$ and node $m_{27}$ is considered. However, at this point the root of $\mathrm{lhs}(r_2)$, i.e., node $n_{25}$, no longer belongs to $\mathrm{state}(m_{27})$, indicating that $r_2$ is no longer applicable to that node. The body of the for-statement in the [...] Finally, index 3 is retrieved from $H$ and node $m_{27}$ is again considered, this time for the application of rule $r_3$. Since the root of $\mathrm{lhs}(r_3)$, i.e., node $n_{35}$, belongs to $\mathrm{state}(m_{27})$, $r_3$ is applied to $C_2$ at node $m_{27}$, yielding $C_3$. Data structures are again updated by a call to procedure update with the third parameter equal to 4. Then state $q_8$ is associated with node $m_{37}$, the root node of $C_3$. Despite the fact that $q_8 \in F$, we now have $\mathrm{next}(q_8, 4) = \bot$. 
Therefore rule $r_1$ is not considered for application to $C_3$. Since $H$ is now empty, the computation terminates returning $C_3$. $\Box$ The results in Lemma 1 and Lemma 2 can be used to show that, in the main program, a node $n$ passes the test in the head of the for-statement if and only if $\mathrm{lhs}(r_i)$ matches $C$ at $n$. The correctness of Algorithm 1 then follows from the definition of the heap data structure.</Paragraph>
<Paragraph position="18"> We now turn to computational complexity issues.</Paragraph>
<Paragraph position="19"> Let $\rho = \max_{1 \le i \le \pi} |r_i|$. For $T \in \Sigma^T$, let also $t(T)$ be the total number of rules that are successfully applied on a run of Algorithm 1 on input $T$, counting repetitions.</Paragraph>
<Paragraph position="20"> Theorem 1 The running time of Algorithm 1 on input tree $T$ is $O(|T| + \rho\, t(T) \log(t(T)))$.</Paragraph>
<Paragraph position="21"> Proof. We can implement our data structures in such a way that each of the primitive access operations that are executed by the algorithm takes a constant amount of time.</Paragraph>
<Paragraph position="22"> Consider each instance of the membership of a node $n$ in a set $\mathrm{rule}(i)$ and represent it as a pair $(n, i)$. We call active each pair $(n, i)$ such that $\mathrm{lhs}(r_i)$ matches $C$ at $n$ at the time $i$ is retrieved from $H$. As already mentioned, these pairs pass the test in the head of the for-loop in the main program. The number of active pairs is therefore $t(T)$. All remaining pairs are called dead. Note that an active pair $(n, i)$ can turn at most $|\mathrm{lhs}(r_i)| + h_R$ active pairs into dead ones, through a call to the procedure update. Hence the total number of dead pairs must be $O(\rho\, t(T))$.</Paragraph>
<Paragraph position="23"> We conclude that the total number of pairs instantiated by the algorithm is $O(\rho\, t(T))$.</Paragraph>
<Paragraph position="24"> It is easy to see that the total number of pairs instantiated by the algorithm is also a bound on the number of indices inserted in or retrieved from the heap. 
Then the time spent by the algorithm with the heap is $O(\rho\, t(T) \log(t(T)))$ (see for instance (Cormen, Leiserson, and Rivest, 1990)). The first call to the procedure update in the main program takes time proportional to $|T|$. All remaining operations of the algorithm will now be charged to some active pair.</Paragraph>
<Paragraph position="25"> For each active pair, the body of the for-loop in the main program and the body of the update procedure are executed, taking an amount of time $O(\rho)$. For each dead pair, only the test in the head of the for-loop is executed, taking a constant amount of time. This time is charged to the active pair that turned the pair under consideration into a dead one. In this way each active pair is charged an extra amount of time $O(\rho)$.</Paragraph>
<Paragraph position="26"> Every operation executed by the algorithm has been considered in the above analysis. We can then conclude that the running time of Algorithm 1 is $O(|T| + \rho\, t(T) \log(t(T)))$. $\Box$ Let us compare the above result with the time performance of the standard algorithm for transformation-based parsing. The standard algorithm checks each rule in $R$ for application to an initial parse tree $T$, trying to match the left-hand side of the current rule at each node of $T$. Using the notation of Theorem 1, the running time is then $O(\pi \rho |T|)$. In practical applications, $t(T)$ and $|T|$ are very close (of the order of the length of the input string). Therefore we have achieved a time improvement by a factor of $\pi / \log(t(T))$. We emphasize that $\pi$ might be as large as several hundred if the learned transformations are lexicalized. 
Therefore we have improved the asymptotic time complexity of transformation-based parsing by a factor of between two and three orders of magnitude.</Paragraph> </Section>
<Section position="2" start_page="259" end_page="260" type="sub_section"> <SectionTitle> 4.2 Order-dependent parsing </SectionTitle>
<Paragraph position="0"> We consider here the general case for the TTS translation problem, in which the order of application of several instances of rule $r$ to a tree can affect the final result of the rewriting. In this case rule $r$ is called critical. According to the definition of translation induced by a TTS, a critical rule should always be applied in post-order w.r.t. the nodes of the tree to be rewritten. The solution we propose here for critical rules is based on a preprocessing of the rule sequence of the system.</Paragraph>
<Paragraph position="1"> We informally describe the technique presented below. Assume that a critical rule $r$ is to be applied at several matching nodes of a tree $C$. We partition the matching nodes into two sets. The first set contains all the nodes $n$ at which the matching of $\mathrm{lhs}(r)$ overlaps with a second matching at a node $n'$ dominated by $n$. All the remaining matching nodes are inserted in the second set. Then rule $r$ is applied to the nodes of the second set. After that, the nodes in the first set are in turn partitioned according to the above criterion, and the process is iterated until all the matching nodes have been considered for application of $r$. This is more precisely stated in what follows.</Paragraph>
<Paragraph position="2"> [Figure 6: trees $Q$ and $Q_p$; the periodic node $p$ of $Q$ is indicated by underlining its label.]</Paragraph>
<Paragraph position="3"> We start with some additional notation. Let $r = (Q \rightarrow Q')$ be a tree-rewriting rule. Also, let $p$ be a node of $Q$ and let $S$ be the suffix of $Q$ at $p$. We say that $p$ is periodic if (i) $p$ is not the root of $Q$; and (ii) $S$ matches $Q$ at the root node. 
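The periodicity test can be illustrated with a small Python sketch. This is a hedged illustration, not the authors' implementation: the `Node` class and the helpers `matches` and `periodic_nodes` are hypothetical names, and the matching relation is an assumption read off Definition 3 (labels must agree, a pattern leaf matches any node carrying the same label, and a pattern node with children requires the same number of children, matched pairwise).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: a label plus an ordered list of children.
    label: str
    children: list = field(default_factory=list)


def matches(pattern, tree):
    # Assumed matching relation (not stated in this section): labels agree;
    # a pattern leaf matches any node with that label; otherwise the child
    # counts agree and the children match pairwise, left to right.
    if pattern.label != tree.label:
        return False
    if not pattern.children:
        return True
    if len(pattern.children) != len(tree.children):
        return False
    return all(matches(p, t) for p, t in zip(pattern.children, tree.children))


def periodic_nodes(q):
    # Condition (i): only proper descendants of the root are candidates.
    # Condition (ii): the suffix of q at p must match q at its root.
    found = []
    stack = list(q.children)
    while stack:
        p = stack.pop()
        if matches(p, q):
            found.append(p)
        stack.extend(p.children)
    return found


q = Node("A", [Node("B"), Node("A")])
print([p.label for p in periodic_nodes(q)])  # prints ['A']
```

Under these assumptions, for Q = A(B, A) the second child is periodic because its suffix (the single leaf A) matches Q at the root, whereas Q = A(B, C) has no periodic node.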
It is easy to see that the existence of some periodic node in $\mathrm{lhs}(r)$ is a necessary condition for $r$ to be critical. Let the root of $S$ be the $i$-th child of a node $n_f$ in $Q$, and let $Q_c$ be a copy of $Q$. We write $Q_p$ to denote the tree obtained starting from $Q$ by excising $S$ and by letting the root of $Q_c$ be the new $i$-th child of $n_f$.</Paragraph>
<Paragraph position="4"> Finally, call $n_1$ the root of $Q_p$ and $n_2$ the root of $Q$. Example 5 Figure 6 depicts trees $Q$ and $Q_p$. The periodic node $p$ of $Q$ under consideration is indicated by underlining its label. $\Box$ Let us assume that rule $r$ is critical and that $p$ is the only periodic node in $Q$. We add $Q_p$ to set $\mathrm{lhs}(R)$ and construct $A_G$ accordingly. Algorithm 1 should then be modified as follows. We call $p$-chain any sequence of one or more subtrees of $C$, all matched by $Q$, that partially overlap in $C$. Let $n$ be a node of $C$ and let $q = \mathrm{state}(n)$. Assume that $n_2 \in q$ and call $S$ the subtree of $C$ at $n$ matched by $Q$ ($S$ exists by Lemma 1). We distinguish two possible cases.</Paragraph>
<Paragraph position="5"> Case 1: If $n_1 \in q$, then we know that $Q$ also matches some portion of $C$ that overlaps with $S$ (at the node matched by the periodic node $p$ of $Q$). In this case $S$ belongs to a $p$-chain consisting of at least two subtrees and $S$ is not the bottom-most subtree in the $p$-chain.</Paragraph>
<Paragraph position="6"> Case 2: If $n_1 \notin q$, then we know that $S$ is the bottom-most subtree in a $p$-chain.</Paragraph>
<Paragraph position="7"> Let $i$ be the index of rule $r$ under consideration.</Paragraph>
<Paragraph position="8"> We use an additional set $\mathrm{chain}(i)$. Each node $n$ of $C$ such that $n_2 \in \mathrm{state}(n)$ is then inserted in $\mathrm{chain}(i)$ if $\mathrm{state}(n)$ satisfies Case 1 above, and is inserted in $\mathrm{rule}(i)$ otherwise. Note that $\mathrm{chain}(i)$ is non-empty only in case $\mathrm{rule}(i)$ is such. Whenever $i$ is retrieved from $H$, we process each node $n$ in $\mathrm{rule}(i)$, as usual. But when we update our data structures with the procedure update, we also look for matchings of $\mathrm{lhs}(r_i)$ at nodes of $C$ in $\mathrm{chain}(i)$. 
The overall effect of this is that each $p$-chain is considered in a bottom-up fashion in the application of $r$. This is compatible with the post-order application requirement. The above technique can be applied for each periodic node in a critical rule, and for each critical rule of $G$. This only affects the size of $A_G$, not the time requirements of Algorithm 1. In fact, the proposed preprocessing can at worst double $h_R$.</Paragraph> </Section> </Section> </Paper>