File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/c96-1028_metho.xml
Size: 10,376 bytes
Last Modified: 2025-10-06 14:14:08
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1028"> <Title>Cross-Serial Dependencies Are Not Hard to Process</Title> <Section position="4" start_page="158" end_page="159" type="metho"> <SectionTitle> 3 Cross-Serial Dependencies Are Not Hard to Process </SectionTitle> <Paragraph position="0"> It is always possible to compile less restrictive grammar formalisms into more restrictive covering formalisms, allowing different constituent analyses and potential stringset overgeneration. Meta-grammatical techniques give an alternative that preserves coverage but uses special-purpose processing. We suggest a parsing method for languages that rely on ww which does not cost a greater complexity fee than the worst case for parsing context free grammars. The method is metagrammatical and therefore akin to proposals put forward previously for handling coordination (Dahl and McCord, 1983) with logic grammars and TAGs (Shieber, 1995) or for extraposition (Milward, 1994). The method is constrained enough not to augment overall processing complexity, implying that ww does not require the worst case recognition complexity for its characteristic class, the mildly context-sensitive (MCS) languages.</Paragraph> <Section position="1" start_page="158" end_page="159" type="sub_section"> <SectionTitle> 3.1 Why not? </SectionTitle> <Paragraph position="0"> Trivially, the string duplication languages can be recognized with time complexity proportional to the length of the string: if the string is of even length, and its first half is identical to the second half, then this can be established in just linear time. Though trivial in the sense of being about mere recognition, this is nonetheless interesting. In particular, under the reasonable hypothesis that humans are not in general reverse-wired [3], it is easier to process serial orders than their reverse. 
In this trivial recognition model we could take the serial ordering as primitive, but to use the same model as a recognizer for the context free string reversal languages would require an additional step of reversing the second half of the string before checking equivalence, which means the recognition complexity is n log n. Thus, for trivial recognition the string duplication languages are easier to process than the string reversal languages. This is a concrete illustration that not every language costs the worst case recognition complexity for its expressivity class.</Paragraph> <Paragraph position="1"> However, in the case of natural languages, parsing is of greater interest than mere recognition.</Paragraph> <Paragraph position="2"> A generalization of the recognizer method can be used inside a parsing approach as well. Suppose some i such that i ≥ 2; suppose we want a recognizer for {ww | w ∈ {a,b}*} where w ∈ PSi; then we can use a parser that is no worse than cubic (if i = 2) and which can be linear (if i = 3) to determine if w ∈ PSi. Thus, if we parse exactly half of the string using a processor designed for languages in PSi, and then ascertain whether the remaining half is identical, then we remain in the same processing complexity class, since the identity check occurs after the parse and only requires linear time, but we also have structural information about the sentence as a whole. [Footnote 3: While there actually is structural reverse wiring, psychological effects, like child learning of the distinction between left and right hands on themselves and on a person facing them, suggest that there is a difference in processing time required between recognizing a copy and an inverse copy. Another example comes from the recognition of rotated objects. There is a robust effect for which, given a reference object and a rotated object-in-question, it takes time linear in the amount of rotation to recognize the objects as copies. Mirror-image objects are isomorphic, yet it takes strictly more time to recognize reflected copies than to recognize nonreflected copies (Cooper, 1975).]</Paragraph> <Paragraph position="3"> We know the structure of the first half of the string, and we know the second half of the string but not its structure (the grammar for w could be ambiguous), although we can assume that the second w was licensed by exactly the same tree structure as the first. This method also preserves a relative difference between parsing ww and ww^R, at least for PS3. Since ww^R can be represented directly within PS2 it can be argued that we should not be required to use the metagrammatical method of parsing it, just to keep symmetry with the duplication languages. Interestingly, if w is in PS2 and we use the metagrammatical parsing method, then ww^R also requires more processing time than ww for the same reason as the trivial case. Suppose instead that we allow ww^R to be parsed without using the metagrammatical method. In that case ww is relatively even easier to process since it costs |w|^3 to parse with the metagrammatical approach but ww^R will cost (2|w|)^3 in the direct approach. It might be claimed that just as we argue ww not to require the worst case complexity for its language class (PS1.5), neither need ww^R for PS2; but the reversal language is a canonical example of a language that makes maximal use of the stack in the PDA. In any case, the metagrammatical method for parsing ww costs no more than just parsing strings in the characteristic language class of w.</Paragraph> <Paragraph position="4"> If this were the complete story then we could only recognize languages homomorphic to the duplication languages. 
Clearly even the Zürich dialect of Swiss-German allows other constructions, all of which we can assume are context free (Pullum and Gazdar, 1982). Essentially we want to be able to write arbitrary PS3 or PS2 grammars and also be able to parse the string duplication language for whichever PSi we choose. The language defined by such a union is no longer PSi, but will not contain arbitrary PS1.5 strings, and if i = 3 then the union will not even contain arbitrary context free strings. However, the situation is more involved than the basic approach since there needs to be a way to indicate where the metagrammatical approach is to be invoked. Add a single feature to the grammar interpreted by the processor as 'expect a copy'. [4]</Paragraph> </Section> </Section> <Section position="5" start_page="159" end_page="160" type="metho"> <SectionTitle> 1. A → W B^c Y </SectionTitle> <Paragraph position="0"> We allow context free productions of the form shown in (1), where A and B are nonterminals and W, Y are (possibly empty) sequences of terminals and nonterminals, B possibly occurring among the nonterminals of Y. [Footnote 4: Once we admit 'interpretability by the processor' we in principle have TM power. However, we make quite restricted use of such interpretation. The rule format makes clear that it is less expressive than indexed grammars when interpreted directly.]</Paragraph> <Paragraph position="1"> For an ambiguous CFG, there is no guarantee that multiple instances of a nonterminal will rewrite through the same sequence of productions to yield the same string.</Paragraph> <Paragraph position="2"> There are any number of ways that this basic notation can be used in a metagrammatical approach. 
In the first instance, we take c to be a signal to the processor to generate an expectation for a duplicate of the terminal sequence that the nonterminal it is attached to gets rewritten to; this expectation must be satisfied by the next nonterminal of the same name in the same local domain. [5] This approach will require that the sequence of terminals rewritten from the first B in (1) will be duplicated by the terminal sequence rewritten from the first instance of B (if any) that occurs in Y. The restriction will not hold of subsequent instances of the nonterminal marked for copying in the same local domain nor at different levels in the analysis. A stronger interpretation could require an expectation for the same constituent analysis of the nonterminal as well. Since we do not allow the feature to stack, the string-based method does not yield the full expressive power of indexed languages. The point is just that it's possible to keep a CF (or regular) grammar, and supplement the processor with a string-duplication operator which can be invoked at the subsentence level. This is sufficient to yield languages that more closely resemble the Zürich dialect in having other constructions besides the duplication construction, yet remaining efficiently processable. [6] We have implemented the interpreter in a chart parser that can be used in either top-down or bottom-up fashion. Edges in the chart are marked with a category (some nonterminal or preterminal symbol from the grammar), constituents, substring span and expectations (along with a unique identifier for each edge). This is modified to include a list of constraints, which for the present purposes is presumed to be just duplication checks. An edge with no expectations is inactive (saturated) and one with expectations is active. 
In the completer step, when active edges combine with adjacent inactive edges whose category satisfies the current expectation of the active edge, the usual process of creating a new edge with one less expectation is augmented with another: if the current expectation has an associated copy feature, then the new edge is marked with a constraint interpreted by the parser as indicated above. The nonterminal symbol and the string spanned by the inactive edge are noted so that the next inactive edge of the same category (if one is expected) will have to span an identical string. [Footnote 5: We take a local domain, in tree terms, as a node and the set of nodes that it immediately dominates.]</Paragraph> <Paragraph position="3"> [Footnote 6: To get closer still to the Zürich dialect, we require that the duplication operator be applied at the level of preterminals, with complementation, to get the pairings of case-marked NPs and Vs.]</Paragraph> <Paragraph position="4"> Constraints of this form are not passed on after being satisfied once, and are not passed out of the local domain. Within the same set of restrictions the implemented constraint could have been 'expect a reversed copy'. This would require computing the string's reverse before annotating the constraint list.</Paragraph> </Section></Paper>