<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1022"> <Title>Chart Parsing of Robust Grammars *</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Basic algorithm </SectionTitle> <Paragraph position="0"> As the parsing algorithm to start from, we chose Earley's (1971) chart parser, which has a top-down component adaptable to the top-down percolation of index information, and which guarantees a worst-case complexity of O(n³) even for maximal ambiguity. We use the declarative Earley variant in Dörre (1987). For a CFG G = ⟨Cat, Lex, P, Sset⟩, where Cat is a set of non-terminals, Lex a set of terminals, P a set of rules and Sset a set of start symbols, it is characterized by the following predictor concept: * the predictor is a relation D(i, A) over vertices and non-terminals, holding between a vertex i &lt; n and a non-terminal A. It is integrated into the completer and scanner components (see below). This has the advantage that no cyclic items, i.e. items with an empty string of parsed symbols, have to be asserted to the chart.</Paragraph> <Paragraph position="1"> * initialization is the special predictor case D(0, S), where S is a start symbol.</Paragraph> <Paragraph position="2"> Let V = Cat ∪ Lex, A → αβ ∈ P and 0 ≤ i ≤ j ≤ n. Let Chart[i,j] be the set of arcs between vertices i and j, and let ⇒* be the transitive closure of the derivation relation. Then every item in the chart may be characterized by the following membership condition⁶, which respects both top-down (TD) and bottom-up (BU) information. 
Note that for the (basic variant of the) Earley algorithm, while item membership depends on top-down predictor information, the acceptance of input strings is independent of the predictor (Kilbury 1985).</Paragraph> <Paragraph position="4"/> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The RPSG variant </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Item Concept </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"> where the item number, the (possibly indexed) left-hand symbol, the list of parsed symbols and the list of symbols yet to parse are well-known item parts. The variables Lind and Rind represent the status of substring generation to the left and to the right of the Parsed string, respectively. Lind ≠ Rind is possible even for the SUB index, since items represent prefix information on a constituent, whereas a PAR index always enforces Lind = Rind. Partial string information from higher nodes, which is justified only within the appropriate derivation, must be distinguished from SUB or PAR indexing of an item's LHS symbol, which always licenses arbitrary substrings. To allow reconstruction of a derivation, RefList records the pairs of items (or pairs of rule and item, see below) an item is completed from, or it equals lex for lexical items⁷. To state the chart membership condition of the RPSG variant, we generalize the function gen to an argument pair of strings of terminals and possibly indexed non-terminals:</Paragraph> <Paragraph position="4"> The RPSG membership condition, then, is: A → α.β ∈ Chart[i,j] iff [...] ...tion, see e.g. 
Dörre (1987) for a discussion</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 The Predictor </SectionTitle> <Paragraph position="0"> The predictor of the RPSG variant (see Appendix A for a complete formal characterization of the RPSG chart parser) is, again, a relation over vertices and non-terminals. In contrast to the basic variant, however, a null predictor would be incorrect for the RPSG variant, since the acceptance of a string now depends on the substring information percolated by the predictor. The first predictor clause allows an &quot;initialisation&quot; for every vertex. The second clause formulates the expectation of a non-terminal A_I by an active item, i.e. an item with a non-empty list To-Parse, and the third the expectation by passive items with a SET index. Clause 4 expects a start symbol on the basis of left adjunction to a SET-indexed symbol. The following proposition, a proof of which is available from the author, states the correctness of this predictor formalization.</Paragraph> <Paragraph position="1"> gen*(S, [...] A_I [...]) = 1 iff D(i, A_I) for an S ∈ Sset</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4.3 The Completer </SectionTitle> <Paragraph position="0"> The completer component integrates the predictor relation and the substring generation function and has two rules for right-side and left-side adjunction under a SET-indexed symbol. Provided that the conditions in the if-clause (and the lookahead condition, see below) hold, the completer adds new items to the chart.⁹ Clause 1 of the RPSG completer is, apart from using the generation function instead of derivation, equivalent to the completer of the basic variant: given a right-side passive item, it adds a new item both for a matching active item and for the prediction of an appropriate rule's LHS symbol. 
Thus, no cyclic items have to be created. Furthermore, since RPSGs do not have ε-productions, there is no need to handle cyclic items at all. Clause 2 performs right-side adjunction of a start symbol item to a passive SET-indexed item. In left adjunction according to clause 3, the adjoined (passive) item can again be licensed either by another (active or passive) SET-indexed item or by the predictor relation.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Scanner and Lookahead </SectionTitle> <Paragraph position="0"> Since the scanner component may be seen as a lexical case of the completer, the RPSG algorithm could be reduced to a single active completer component and the controlling relation D (Kilbury 1985). Note that the scanner allows for RPSG rules with RHS strings of terminals and non-terminals. A partial lookahead of 1, applied to active items only, has proven advantageous in the basic variant (Dörre 1987). In the RPSG variant, the length of the lookahead must take into account that zero or more non-derived but generated words may follow a given vertex. The lookahead fails if, for the first To-Parse symbol, there is no first derivable lexical item that is accessible given the current substring information. (Footnote 9: The relation F includes the operation ⊕, which procedurally asserts new items to the chart.) (Actes de COLING-92, Nantes, 23-28 août 1992, p. 123; Proc. of COLING-92, Nantes, Aug. 23-28, 1992.) Unfortunately, the scanner is not independent of this lookahead, since, in many cases, the item licensed by a lookahead operation onto a lexical item i is exactly the item licensing i within the predictor relation. That is, from the procedural viewpoint of entering items into the chart, the lookahead condition and the predictor block each other for certain lexical items. 
In this situation we decided to use a scanner without a predictor relation, thus paying for lookahead with increased local lexical ambiguity.</Paragraph> </Section> </Section> </Paper>