<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0402"> <Title>Control Strategies for Parsing with Freer Word-Order Languages</Title> <Section position="5" start_page="10" end_page="10" type="metho"> <SectionTitle> ReqBV = {1...n}. 2.2.2 Active Edge Subsumption </SectionTitle> <Paragraph position="0"> The first step is to check the current state against states that have already been considered. For expository reasons, this will be presented below. Let us assume for now that this step always fails to produce a matching edge. We must then predict using the rules of the FWO grammar.</Paragraph> <Paragraph position="1"> As outlined in Penn and Haji-Abdolhosseini (2003), the predictive step from a state consisting of ⟨N, C, R⟩ using an immediate dominance rule, N_0 → N_1 ... N_k</Paragraph> <Paragraph position="3"> , with k &gt; 1 and no linear precedence constraints, transits to a state ⟨N_1, C, ∅⟩,</Paragraph> </Section> <Section position="6" start_page="10" end_page="10" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> provided that N is compatible with N_0.</Paragraph> </Section> <Section position="7" start_page="10" end_page="10" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> In the case of a classical set of atomic non-terminals, compatibility should be interpreted as equality. (Footnote: Actually, Penn and Haji-Abdolhosseini (2003) use CanBV and OptBV, the latter of which can be defined as CanBV ∩ ¬ReqBV.)</Paragraph> <Paragraph position="1"> In the case of Prolog terms, as in definite clause grammars, or typed feature structures, as in head-driven phrase structure grammar, compatibility can be interpreted as either unifiability or the asymmetric subsumption of N by N_0.</Paragraph> </Section> <Section position="8" start_page="10" end_page="11" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> Without loss of generality, we will assume unifiability here. 
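The predictive step just described can be sketched in a few lines. This is only a minimal illustration, assuming atomic non-terminals (so that compatibility is equality) and representing a state as a triple of category, CanBV and ReqBV held as Python frozensets; the names State and predict, and the example rule and source state, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class State:
    category: str      # N: the category being sought
    can: frozenset     # C (CanBV): word positions that may be consumed
    req: frozenset     # R (ReqBV): word positions that must be consumed


def predict(state: State, lhs: str, rhs: Sequence[str]) -> Optional[State]:
    """Predict the first daughter of the rule lhs -> rhs from `state`.

    Compatibility is plain equality here (atomic non-terminals); a
    unifiability test would take its place for terms or feature structures.
    """
    if len(rhs) > 1 and state.category == lhs:
        # Same candidate words, but nothing is required of the daughter yet.
        return State(rhs[0], state.can, frozenset())
    return None


words = frozenset(range(1, 7))
start = State("S", words, words)   # illustrative source state (an assumption)
print(predict(start, "S", ["NP", "VP"]))
```

The predicted state keeps the candidate set of the source state and starts with an empty required set, which is the behaviour described in the text; the source state's own required set is left for the completion step to enforce.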
This initial predictive step says that there are, in general, no restrictions on which word must be consumed (ReqBV = ∅). Depending on the language chosen for expressing linear precedence restrictions, this set may be non-empty, and in fact, the definition of state used here may need to be generalized to something more complicated than a single set to express the required consumption constraints.</Paragraph> <Paragraph position="1"> The completion step then involves recognizing the last RHS category (although this is no longer rightmost in terms of linear precedence). Here, the major difference from subsequent prediction is that there is now a potentially non-empty ReqBV. Only with the last RHS category are we actually in a position to enforce R from the source state.</Paragraph> <Section position="1" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 2.3 Active Edge Subsumption Revisited </SectionTitle> <Paragraph position="0"> So far, this is very similar to the strategy outlined in Penn and Haji-Abdolhosseini (2003). If we were to add active edges in a manner similar to standard chart parsing, we would tabulate states ⟨N_a, C_a, R_a⟩ and then compare them in step 2.2.2 to current states ⟨N, C, R⟩ by determining whether (classically) N = N_a</Paragraph> <Paragraph position="2"> . This might catch some redundant search, but just as we can do better in the case of non-atomic categories by checking for subsumption</Paragraph> <Paragraph position="4"> , we can do better on C and R as well because these are sets that come with a natural notion of containment. Figure 1 shows an example of how this containment can be used. Rather than comparing edges annotated with linear subspans, as in the case of CFG chart parsing, here we are comparing edges annotated with sublattices of the powerset lattice on n elements, each of which has a top element (its CanBV) and a bottom element (its ReqBV). 
Everything in between this top and bottom is a subset of words that has been (or will be) tried if that combination has been tabled as an active edge.</Paragraph> <Paragraph position="5"> Figure 1 assumes that n = 6, and that we have tabled an active edge (dashed lines) with C_a</Paragraph> <Paragraph position="7"> later that we decide to search for the same category in C = {1,2,3,4,5,6}, R = {1,2} (dotted lines). Here, C ≠ C_a</Paragraph> </Section> </Section> <Section position="9" start_page="11" end_page="11" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> , so an equality-based comparison would fail, but a better strategy would be to reallocate the one extra bit in C (3) to R, and then search C′ = {1,2,3,4,5,6}, R′ = {1,2,3}</Paragraph> </Section> <Section position="10" start_page="11" end_page="12" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> (solid lines). As shown in Figure 1, this solid region fills in all and only the region left unsearched by the active edge.</Paragraph> <Paragraph position="1"> This is actually just one of five possible cases that can arise during the comparison. The complete algorithm is given in Figure 2. This algorithm works as a filter, which either blocks the current state from further exploration, allows it to be further explored, or breaks it into several other states that can be concurrently explored. Step 1(a) deals with category unifiability. If the current category, N, is unifiable with the tabled active category, N_a</Paragraph> <Paragraph position="3"> , then 1(a) breaks N into more specific pieces that are either incompatible with N [...] i. Let O := O ∩ Z, C := C ∩ Z, ii. continue [to next active edge]. (g) Fail -- this state is subsumed by an active edge.</Paragraph> <Paragraph position="4"> 2. else continue [to next active edge]. 
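The Figure 1 example can be checked by brute force. The sketch below enumerates the sublattice between a ReqBV and a CanBV as explicit sets and confirms that the reallocated search C′ = {1,2,3,4,5,6}, R′ = {1,2,3} covers all and only the part of the current state that the active edge left unsearched. The active edge's own values are not legible in the text above, so C_a = {1,2,4,5,6} and R_a = {1,2} are assumed here, chosen to agree with the description of 3 as the one extra bit; the function name sublattice is hypothetical.

```python
from itertools import combinations


def sublattice(can, req):
    """All word sets S with req ⊆ S ⊆ can (the interval from ReqBV to CanBV)."""
    optional = sorted(can - req)
    return {
        req | frozenset(extra)
        for size in range(len(optional) + 1)
        for extra in combinations(optional, size)
    }


# Assumed active edge: C_a = {1,2,4,5,6}, R_a = {1,2} (not legible in the source).
active = sublattice(frozenset({1, 2, 4, 5, 6}), frozenset({1, 2}))
# Current state from the text: C = {1,...,6}, R = {1,2}.
current = sublattice(frozenset({1, 2, 3, 4, 5, 6}), frozenset({1, 2}))
# Reallocated state from the text: C' = {1,...,6}, R' = {1,2,3}.
reallocated = sublattice(frozenset({1, 2, 3, 4, 5, 6}), frozenset({1, 2, 3}))

unsearched = current - active
print(len(active), len(current), len(reallocated))
print(reallocated == unsearched)
```

Explicit enumeration is exponential in the number of optional words and is used here only to make the containment argument concrete; the comparison itself needs only set (or bit vector) operations on the tops and bottoms of the sublattices.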
Only one of 1(g) or the bodies of 1(c), 1(d), 1(e) or 1(f) is ever executed in a single pass through the loop. These are the five cases that can arise during subset/bit vector comparison, and they must be tried in the order given. Viewing the current state's CanBV and ReqBV as a modification of the active edge's, the first four cases correspond to: the removal of required words (1(c)), the addition of required words (1(d)), the addition of optional (non-required) words (1(e)), and the reallocation of required words to optional words (1(f)). Unless one of these four cases has happened, the current sublattice has already been searched in its entirety (1(g)).</Paragraph> <Section position="1" start_page="11" end_page="12" type="sub_section"> <SectionTitle> 2.4 Linear Precedence Constraints </SectionTitle> <Paragraph position="0"> The elaboration above has assumed the absence of any linear precedence constraints. This is the</Paragraph> <Paragraph position="2"> worst case, from a complexity perspective. The propagation rules of section 2.2 can remain unchanged in a concurrent constraint-based framework in which other linear precedence constraints observe the resulting algebraic closure and fail when violated, but it is possible to integrate these into the propagators for efficiency. In either case, the active edge subsumption procedure remains unchanged.</Paragraph> <Paragraph position="3"> For lack of space, we do not consider the characterization of linear precedence constraints in terms of CanBV and ReqBV further here.</Paragraph> </Section> </Section> <Section position="11" start_page="12" end_page="14" type="metho"> <SectionTitle> 3 Category Graphs and Iteratively </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="12" end_page="13" type="sub_section"> <SectionTitle> Computed Yields </SectionTitle> <Paragraph position="0"> Whereas in the last section we trivialized linear precedence, the constraints of this section simply do not use them. 
Given a FWO grammar, G, with immediate dominance rules, R, over a set of non-terminals, N, we define the category graph of G to be the smallest directed bipartite graph, C(G) = ⟨V, E⟩, such that:</Paragraph> <Paragraph position="2"> rule.</Paragraph> <Paragraph position="3"> We will call the vertices of C(G) either category nodes or rule nodes. Lex and Empty are considered category nodes. The category graph of the grammar in Figure 3, for example, is shown in Figure 4. We draw category nodes with circles, and rule nodes with boxes, and we label rule nodes by the LHS categories of the rules they correspond to plus an index. For brevity, we will assume a normal form for our grammars here, in which the RHS of every rule is either a string of non-terminals or a single terminal.</Paragraph> <Paragraph position="4"> Category graphs are a minor variation of the &quot;grammar graphs&quot; of Moencke and Wilhelm (1982), but we will use them for a very different purpose. For brevity, we will consider only atomic non-terminals in the remainder of this section. Category graphs can be constructed for partially ordered sets of non-terminals, but in this case, they can only be used to approximate the values of the functions that they exactly compute in the atomic case.</Paragraph> <Paragraph position="5"> Restricting search to unexplored sublattices helps us with recursion in a grammar in that it stops redundant search, but in some cases, recursion can be additionally bounded (above and below) not because it is redundant but because it cannot possibly yield a string as short or as long as the current input string. Inputs are unbounded in size across parses, but within a single parse, the input is fixed to a constant size. Category graphs can be used to calculate bounds as a function of this size. We will refer below to the length of an input string below a particular non-terminal in a parse tree as the yield of that non-terminal instance. 
The height of a non-terminal instance in a parse tree is 1 if it is pre-terminal, and 1 plus the maximum height of any of its daughter non-terminals otherwise. Non-terminal categories can have a range of possible yields and heights.</Paragraph> </Section> <Section position="2" start_page="13" end_page="14" type="sub_section"> <SectionTitle> 3.1 Parse Tree Height </SectionTitle> <Paragraph position="0"> These functions compute yields as a function of height. We know the yield, however, and want bounds on height. Given a grammar in which the non-pre-terminal rules have a constant branching factor, we also know that X_max(h) and X_min(h) are monotonically non-decreasing in h, where they are defined. This means that we can iteratively compute X for all values of h out to the first h″</Paragraph> </Section> </Section> <Section position="12" start_page="14" end_page="14" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> that is equal to or greater than the current yield. The height of the resulting parse tree, h, can then be bounded as h′ − 1 ≤ h ≤ h″.</Paragraph> </Section> <Section position="13" start_page="14" end_page="14" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> These iterative computations can be cached and reused across different inputs. In general, in the absence of a constant branching factor, we still have a finite maximum branching factor, from which an upper bound on any potential decrease in X [...] enough intervals, additionally define a finite domain constraint that excludes these.</Paragraph> <Paragraph position="1"> These recursive definitions are well-founded when there is at least one finite string derivable by every non-terminal in the grammar. The X_min functions converge in the presence of unit production cycles in C(G); the X_max functions can also converge in this case. Convergence restricts our ability to constrain search with yields. 
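The idea of turning a known yield into bounds on height can be illustrated with a small tabulation. The paper's own recursive equations for X_max and X_min are not reproduced in this text, so the sketch below is not those equations: it is a direct dynamic-programming tabulation that follows the definition of height given above, assuming atomic categories, the normal form of this section, and a caller-supplied cap on height (max_height). The function names and the toy grammar are hypothetical.

```python
def yield_tables(rules, preterminals, max_height):
    """Tabulate (min, max) yields of trees of exactly height h, h = 1..max_height.

    rules: list of (lhs, [rhs categories]); preterminals: categories that can
    rewrite to a single terminal (height 1, yield 1).
    Returns {category: {h: (lo, hi)}}.
    """
    cats = set(preterminals) | {c for lhs, rhs in rules for c in [lhs] + list(rhs)}
    exact = {c: {} for c in cats}   # yields at exactly height h
    upto = {c: {} for c in cats}    # yields at any height up to h
    for h in range(1, max_height + 1):
        for c in cats:
            spans = [(1, 1)] if (h == 1 and c in preterminals) else []
            for lhs, rhs in rules:
                if lhs != c or h == 1:
                    continue
                # One daughter has height exactly h-1; the rest have at most h-1.
                for i, tall in enumerate(rhs):
                    rest = [upto[d].get(h - 1) for j, d in enumerate(rhs) if j != i]
                    if exact[tall].get(h - 1) is None or None in rest:
                        continue
                    parts = [exact[tall][h - 1]] + rest
                    spans.append((sum(p[0] for p in parts), sum(p[1] for p in parts)))
            if spans:
                exact[c][h] = (min(s[0] for s in spans), max(s[1] for s in spans))
        for c in cats:
            seen = [exact[c][k] for k in range(1, h + 1) if k in exact[c]]
            if seen:
                upto[c][h] = (min(s[0] for s in seen), max(s[1] for s in seen))
    return exact


def height_bounds(table, n):
    """Smallest and largest tabulated height whose yield range contains n."""
    ok = [h for h, (lo, hi) in sorted(table.items()) if n >= lo and hi >= n]
    return (ok[0], ok[-1]) if ok else None


# Toy grammar (an assumption): S -> S S, and S is also pre-terminal.
table = yield_tables([("S", ["S", "S"])], {"S"}, max_height=8)["S"]
print(table[4], height_bounds(table, 6))
```

For this toy grammar the minimum yield grows linearly with height and the maximum doubles at each level, so an input of length 6 confines the height of S to a small interval. The tables depend only on the grammar, which is why they can be cached across inputs.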
A proper empirical test of the efficacy of these constraints requires large-scale phrase structure grammars with weakened word-order constraints, which are very difficult to come by. On the other hand, our preliminary experiments with simple top-down parsing on the Penn Treebank II suggest that even in the case of classical context-free grammars, yield constraints can improve the efficiency of parsing. The latency of constraint enforcement has proven to be a real issue in this case (weaker bounds that are faster to enforce can produce better results), but the fact that yield constraints produce any benefit whatsoever with CFGs is very promising, since the search space is so much smaller than in the FWO case, and edge indexing is so much easier.</Paragraph> <Section position="1" start_page="14" end_page="14" type="sub_section"> <SectionTitle> 3.2 Cycle Variables </SectionTitle> <Paragraph position="0"> The heights of non-terminals from whose category nodes the cycles of C(G) are not path-accessible can easily be bounded. Using the above height-dependent yield equations, the heights of the other non-terminals can also be bounded, because any input string fixes the yield to a finite value, and thus the height to a finite range (in the absence of converging X_min sequences). But we can do better. We can condition these bounds not only upon height but upon the individual rules used. We could even make them depend upon sequences of rules, or on vertical chains of non-terminals within trees. If C(G) contains cycles, however, there are infinitely many such chains (although finitely many of any given length), but trips around cycles themselves can also be counted.</Paragraph> <Paragraph position="1"> Let us formally specify that a cycle refers to a unique path from some category node to itself, such that every node along the path except the last is unique. 
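As a rough illustration of this definition, the sketch below builds a category graph for a small grammar and enumerates its cycles. Several things here are assumptions made for the example: the grammar itself (Figure 3 is not reproduced in this text), the naming of rule nodes as the LHS category plus `#` and an index, and the edge directions (from a category node to the rule nodes of its rules, and from a rule node to its RHS categories). The grammar was chosen so that it has two cycles through NP and S, one of which also passes through VP.

```python
from collections import defaultdict


def category_graph(rules):
    """Directed bipartite graph: category -> rule node -> RHS categories."""
    graph = defaultdict(set)
    counts = defaultdict(int)
    for lhs, rhs in rules:
        counts[lhs] += 1
        rule_node = f"{lhs}#{counts[lhs]}"   # rule node: LHS plus an index
        graph[lhs].add(rule_node)
        graph[rule_node].update(rhs)
        for cat in rhs:
            graph[cat]                        # register every category node
    return dict(graph)


def simple_cycles(graph):
    """Enumerate each simple cycle once, rooted at its smallest node."""
    cycles = []

    def extend(start, node, path):
        for nxt in sorted(graph[node]):
            if nxt == start:
                cycles.append(tuple(path))
            elif nxt > start and nxt not in path:
                extend(start, nxt, path + [nxt])

    for start in sorted(graph):
        extend(start, start, [start])
    return cycles


# Hypothetical grammar; Lex stands for the lexical category node.
rules = [
    ("S", ["NP", "VP"]),
    ("NP", ["N", "S"]),
    ("NP", ["N"]),
    ("VP", ["V", "NP"]),
    ("N", ["Lex"]),
    ("V", ["Lex"]),
]
cycles = simple_cycles(category_graph(rules))
on_cycle = [sorted(n for n in c if "#" not in n) for c in cycles]
print(on_cycle)
```

Rooting each cycle at its smallest node and only extending through larger nodes reports every cycle exactly once, whichever of its nodes is later chosen as the index node. The search is exponential in the worst case, which matters for treebank-sized grammars.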
Note that because C(G) is bipartite, paths alternate between category nodes and rule nodes.</Paragraph> <Paragraph position="2"> Now we can enumerate the distinct cycles of any category graph. In Figure 4, there are two, both passing through NP and S, with one passing through VP in addition. Note that cycles, even though they are unique, may share nodes as these two do. For each cycle, we will arbitrarily choose an index node for it, and call the unique edge along the cycle leading into that node its index link. It will be convenient to choose the distinguished non-terminal, S, as the index node when it appears in a cycle, and in other cases, to choose a node with a minimal path-distance to S in the category graph.</Paragraph> <Paragraph position="3"> To each cycle, we will also assign a unique cycle variable (written n, m, etc.). The domain of this variable is the natural numbers and it counts the number of times in a parse that we traverse this cycle as we search top-down for a tree. 
When an index link is traversed, the corresponding cycle variable must be incremented.</Paragraph> <Paragraph position="4"> For each category node X in C(G), we can define the maximum and minimum yield as before, but now instead of height being the only independent parameter, we also make these functions depend on the cycle variables of all of the cycles that pass through X. If X has no cycles passing through it, then its only parameter is still h. We can also easily extend the definition of these functions to rule nodes.</Paragraph> <Paragraph position="5"> Rather than provide the general definitions here, we simply give some of the equations for Figure 4. We think of functions in which overscores are written over some parameters as entirely different functions that have witnessed partial traversals through the cycles corresponding to the overscored parameters, beginning at the respective index nodes of those cycles.</Paragraph> <Paragraph position="6"> Cycle variables are a local measure of non-terminal instances in that they do not depend on the absolute height of the tree -- only on a fixed range of nodes above and below them in the tree.</Paragraph> <Paragraph position="7"> This makes them more suitable for the iterative computation of yields that we are interested in. Because X_max and X_min are now multi-variate functions in general, we must tabulate an entire table out to some bound in each dimension, from which we obtain an entire frontier of acceptable values for the height and each cycle variable. Again, these can be posed either as interval constraints or finite domain constraints.</Paragraph> <Paragraph position="8"> In the case of grammars over atomic categories, using a single cycle variable for every distinct cycle is generally not an option. The grammar induced from the local trees of the 35-sentence section wsj_0105 of the Penn Treebank II, for example, has 49 non-terminals and 258 rules, with 153,026 cycles. 
Grouping together cycles that differ only in their rule nodes, we are left with 204 groupings, and in fact, they pass through only 12 category nodes. Yet the category node with the largest number of incident cycles (NP) would still require 163 cycle (grouping) variables -- too many to iteratively compute these functions efficiently. Naturally, it would be possible to conflate more cycles to obtain cruder but more efficient bounds.</Paragraph> </Section> </Section> </Paper>