File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/88/c88-1075_concl.xml

Size: 10,893 bytes

Last Modified: 2025-10-06 13:56:15

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1075">
  <Title>Parsing Incomplete Sentences</Title>
  <Section position="6" start_page="366" end_page="371" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have shown that Earley's construction, when correctly accepting cyclic grammars, may be used to parse incomplete sentences.</Paragraph>
    <Paragraph position="1"> The generality of the construction presented allows its adaptation to any of the classical parsing schemata [16], and the use of well established parser construction techniques to achieve efficiency. The formal setting we have chosen is to our knowledge the only one that has ever been used to prove the correctness of the constructed parse forest as well as that of the recognizer itself. We believe it to be a good framework to study the structure of parse forests [4], and to develop optimization strategies.</Paragraph>
    <Paragraph position="2"> Footnote 8: Note that in such a situation, a rule X → aX of the language grammar G behaves as if it were a cyclic rule X → X, since the parsing proceeds as if it were ignoring terminal symbols. This does not lead to an infinite computation since only a finite number (proportional to i) of distinct items can be built in S_i.</Paragraph>
    <Paragraph position="3"> Footnote 9: We assume, only for simplicity of exposition, that * is followed by a normal input word symbol. Note also that S_{i+1} is not built.</Paragraph>
    <Paragraph position="4"> Footnote 10: If the input were reduced to the unknown subsequence alone, the output grammar would be equivalent to the original grammar G of the input language (up to simple transformation). The output parse sequences would then simplify into a single occurrence of the symbol * qualified by the initial nonterminal of the language grammar G.</Paragraph>
    <Paragraph position="5"> Recent extensions of our approach to recursive queries in Datalog [19] and to Horn clauses [20] are an indication that these techniques may be applied effectively to more complex grammatical settings, including unification-based grammars and logic-based semantics processing. More generally, dynamic programming approaches such as the one presented here should be a privileged way of dealing with ill-formed input, since the variety of possible errors is the source of even more combinatorial problems than the natural ambiguity or non-determinism already present in many &quot;correct&quot; sentences.</Paragraph>
    <Paragraph position="6"> Acknowledgements: Sylvie Billot is currently studying the implementation technology for the algorithms described here [3,4]. The examples in appendices A &amp; B were produced with her prototype implementation. The author gratefully acknowledges her commitment to have this implementation running in time, as well as numerous discussions with her, Véronique Donzeau-Gouge, and Anne-Marie Vercoustre.</Paragraph>
    <Paragraph position="7"> A Simple example without unknown input subsequence
This first simple example, without unknown input, is intended to familiarize the reader with our constructions.</Paragraph>
    <Paragraph position="8"> A.1 Grammar of the analyzed language
This grammar is taken from [28].</Paragraph>
    <Paragraph position="9"> Nonterminals are in capital letters, and terminals are in lower case. The first rule is used for initialization and handling of the delimiter symbol $. The $ delimiters are implicit in the actual input sentence.</Paragraph>
    <Paragraph position="10">  (4) NP ::= det n
 (5) NP ::= NP PP
 (6) PP ::= prep NP
 (7) VP ::= v NP
A.2 Input sentence
This input corresponds (for example) to the sentence: &quot;I saw a man with a mirror&quot;
ANALYSIS OF: (n v det n prep det n)
A.3 Output grammar produced by the parser
The grammar output by the parser is given in figure 2. The initial nonterminal is the left-hand side of the first rule. For readability, the nonterminal/items have been given computer generated names, of the form ntx, where x is an integer. At this point we have forgotten the internal structure of the items corresponding to their role in the parsing process. All other symbols are terminal. Integer terminals correspond to rule numbers of the input language grammar G (see section A.1 above), and the other terminals are symbols of the parsed language, i.e. symbols in Σ. Note the ambiguity for nonterminal nt3.</Paragraph>
    <Paragraph position="12"> A.4 Simplified output grammar
This is a simplified form of the grammar in which some of the structure that makes it readable as a shared-forest has been lost (though it could be retrieved). However it preserves all sharing of common subparses. This is the justification for having so many rules, while only 2 parse sequences may be generated by that grammar.</Paragraph>
    <Paragraph position="14"> The 2 parses of the input, which are defined by this grammar, are:
 $ n 3 v det n 4 7 1 prep det n 4 6 2 $
 $ n 3 v det n 4 prep det n 4 6 5 7 1 $
Here again the 2 symbols $ must be read as delimiters.
A.5 Parse forest built from that grammar
To explain the construction of the shared forest, we first build in figure 3 a graph from the grammar of section A.3. Here the graph is acyclic, but with an incomplete input, it could have cycles. Each node corresponds to one terminal or nonterminal of the grammar in section A.3, and is labeled by it. The labels at the right of small dashes are input grammar rule numbers (cf. section A.1). Note the ambiguity of node nt3 represented by an ellipse joining the two possible parses.</Paragraph>
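    <Paragraph> The parse sequences of section A.4 can be read as postfix (reverse Polish) derivations: terminals are pushed on a stack, and each rule number reduces the topmost symbols to the left-hand side of that rule. The following Python sketch illustrates this reading. Rules 4 to 7 are those listed in section A.1; rules 1 to 3 are assumptions inferred from the two sequences, and the names RULES, build_tree and show are hypothetical, introduced only for this illustration.

```python
# Rule number -> (left-hand side, right-hand side).
# Rules 4-7 come from section A.1. Rules 1-3 are ASSUMED (inferred from the
# two parse sequences); they are not taken from the text.
RULES = {
    1: ("S", ("NP", "VP")),
    2: ("S", ("S", "PP")),
    3: ("NP", ("n",)),
    4: ("NP", ("det", "n")),
    5: ("NP", ("NP", "PP")),
    6: ("PP", ("prep", "NP")),
    7: ("VP", ("v", "NP")),
}


def build_tree(parse_sequence):
    """Turn a postfix parse sequence into a nested (label, children) tree."""
    stack = []
    for symbol in parse_sequence.split():
        if symbol == "$":
            continue  # delimiters carry no structure
        if symbol.isdigit():
            lhs, rhs = RULES[int(symbol)]
            children = stack[-len(rhs):]
            labels = tuple(c if isinstance(c, str) else c[0] for c in children)
            if labels != rhs:
                raise ValueError(f"rule {symbol} does not fit {labels}")
            del stack[-len(rhs):]
            stack.append((lhs, children))
        else:
            stack.append(symbol)  # terminal of the parsed language
    if len(stack) != 1:
        raise ValueError("sequence does not reduce to a single tree")
    return stack[0]


def show(tree):
    """Render a tree in bracketed form."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(show(c) for c in tree[1]) + ")"


print(show(build_tree("$ n 3 v det n 4 7 1 prep det n 4 6 2 $")))
print(show(build_tree("$ n 3 v det n 4 prep det n 4 6 5 7 1 $")))
```

    Under these assumptions the first sequence attaches the prepositional phrase to the sentence, and the second attaches it to the object noun phrase, which is the ambiguity discussed in this appendix.</Paragraph>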
    <Paragraph position="15"> From the graph of figure 3, we can trivially derive the shared-forest given in figure 4.</Paragraph>
    <Paragraph position="16"> For readability, we present this shared-forest in a simplified form. Actually the sons of a node sometimes need to be represented as a binary Lisp-like list, so as to allow proper sharing of some of the sons. Each node includes a label which is a non-terminal of the grammar G, and for each possible derivation (several in case of ambiguity, e.g. the top node of figure 4) there is the number of the grammar rule used for that derivation.</Paragraph>
    <Paragraph position="17"> The constructions in this section are purely virtual, and are not actually necessary in an implementation. The data-structure representing the grammar of section A.3 may be directly interpreted and used as a shared-forest.</Paragraph>
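    <Paragraph> As an illustration of interpreting a grammar directly as a shared forest, the sketch below stores a small grammar as a Python dictionary and enumerates the parse sequences it generates. The grammar shown is hypothetical: figure 2 is not reproduced in this text, so the nonterminal names nt0 to nt9 and their rules were chosen by hand so that the two parse sequences of section A.4 come out, and the numbering does not claim to match the figure. FOREST and sequences are illustrative names.

```python
from itertools import product

# HYPOTHETICAL output grammar (not the one of figure 2).
# Keys are nonterminals; every other symbol is a terminal, i.e. either a
# word category or the number of a rule of the language grammar.
FOREST = {
    "nt0": [["$", "nt1", "$"]],
    "nt1": [["nt2", "nt6", "2"], ["nt3", "nt7", "1"]],  # ambiguous node
    "nt2": [["nt3", "nt8", "1"]],
    "nt3": [["n", "3"]],             # shared by both parses
    "nt4": [["det", "n", "4"]],      # shared by both parses
    "nt5": [["det", "n", "4"]],
    "nt6": [["prep", "nt5", "6"]],   # shared by both parses
    "nt7": [["v", "nt9", "7"]],
    "nt8": [["v", "nt4", "7"]],
    "nt9": [["nt4", "nt6", "5"]],
}


def sequences(symbol, forest):
    """List every terminal sequence derivable from symbol (acyclic grammars only)."""
    if symbol not in forest:
        return [[symbol]]
    results = []
    for alternative in forest[symbol]:
        parts = [sequences(s, forest) for s in alternative]
        for combination in product(*parts):
            results.append([tok for part in combination for tok in part])
    return results


for seq in sequences("nt0", FOREST):
    print(" ".join(seq))
```

    The enumeration terminates only because this grammar is acyclic. With an incomplete input the grammar may contain cycles and define infinitely many parses, so a practical traversal would have to bound or detect the cycles.</Paragraph>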
    <Paragraph position="18"> B Example with an unknown input subsequence
B.1 Grammar of the analyzed language
The grammar is the same as in appendix A.</Paragraph>
    <Paragraph position="19"> B.2 Input sentence
This input corresponds (for example) to the sentence: &quot;... saw ... mirror&quot; where the first &quot;...&quot; is known to be one word, and the last &quot;...&quot; may be any number of words, i.e.:
ANALYSIS OF: (? v * n)
B.3 Output grammar produced by the parser
Note that the nodes that derive on (several) symbol(s) * have been replaced by * for simplification as indicated at the end of
A parse of the input, chosen in the infinite set of possible parses defined by this grammar, is the following (see figure 6):
 $ ? 8 v* 7 1. 2 ** ** a46 5 62 $
This is not really a complete parse since, due to the first simplification of the grammar, some * symbols stand for a missing nonterminal, i.e. for any parse of a string derived from this nonterminal. For example the first * stands for the nonterminal NP and could be replaced by &quot;* 3&quot; or by &quot;* * 4 * * 3 6 5&quot;.
B.5 Parse shared-forest built from that grammar
The output grammars given above are not optimal with respect to sharing. Mainly the nonterminals nt27 and nt36 should be the same (they do generate the same parse fragments). Also the terminal n should appear only once. We give in figure 5 a shared-forest corresponding to this grammar, built as in the previous example of appendix A, where we have improved the sharing by merging nt27 and nt36 so as to improve readability. We do not give the intermediate graph representing the output grammar as we did in appendix A.</Paragraph>
    <Paragraph position="20"> Our implementation is currently being improved to directly achieve better sharing.</Paragraph>
    <Paragraph position="21"> In figure 6 we give one parse-tree extracted from the shared-forest of figure 5. It corresponds to the parse sequence given as example in section B.4 above. Note that, like the corresponding parse sequence, this is not a complete parse tree, since it has nonterminals labeling its leaves. A complete parse tree may be obtained by arbitrarily completing these leaves according to the original grammar of the language as defined in section A.1.
C The algorithm
The length of this algorithm is due to its generality. Fewer types of transitions are usually needed with specific implementations, typically only one for scanning transitions.</Paragraph>
    <Paragraph position="22"> Comments are prefixed with &quot;--&quot;.</Paragraph>
    <Paragraph position="23"> -- Begin parse with input sequence x of length n
 -- input-scanner index is set
 -- before the first input symbol
 loop -- while i &lt; n (cf. exit in step-B.2)
 if x_{i+1} ≠ *
 step-B.1: -- Normal completion of item-set S_i
 -- with non-scanning transitions.</Paragraph>
    <Paragraph position="24"> for every item U = ((p A i) (q B j)) in S_i do
 for every non-scanning transition τ in δ do
 -- we distinguish five cases, according to τ:
 -- Other non-scanning transitions are ignored
 else</Paragraph>
    <Paragraph position="25"> -- i.e. the next input symbol
 -- is the unknown subsequence:
 step-B*.1: -- Completion of item-set S_i
 -- with non-scanning transitions
 -- and with dummy scanning transitions.</Paragraph>
    <Paragraph position="26"> -- This step is similar to step-B.1,
 -- but considering all transitions as non-scanning.</Paragraph>
    <Paragraph position="27">  for every item U = ((p A i) (q B j)) in S_i do
 for every transition τ in δ do
 -- we distinguish five cases, according to τ:
 case-B*.1.1:</Paragraph>
    <Paragraph position="29"> -- and so on as in step-B.1
 step-B.2: -- Exit for main loop
 if i = n then exit loop; -- go to step-C</Paragraph>
    <Paragraph position="31"> while x_h = * do h := h+1;
 step-B.3: -- Initialization of item-set S_h
 S_h := ∅;
 for every item U = ((p A i) (q B j)) in S_i do
 for every scanning transition τ in δ do
 -- Proceed by cases as in step-B.1,
 -- but with scanning transitions, and
 -- adding the new items to S_h instead of S_i.
 -- See for example the following case:
 case-B.3.2: if τ = (p ε a ↦ r C z) with x_h = a or x_h = ? then V := ((r C h) (p A i));</Paragraph>
    <Paragraph position="33"> step-C: -- Termination
 for every item U = ((f t n) (q $ 0)) in S_n such that f ∈ F do</Paragraph>
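    <Paragraph> The treatment of the unknown symbols can be illustrated by a much smaller program than the algorithm above. The following Python sketch is a plain Earley-style recognizer over dotted rules, not the push-down transition formulation of this appendix, and it builds no parse forest. It assumes that "?" stands for exactly one unknown word and "*" for any number of unknown words. When the next input symbol is "*", the item set is completed with every terminal treated as a dummy (non-reading) scan, in the spirit of step-B*.1. As a simplification that differs from the text, the completed set is then copied to the next position instead of being skipped. The S rules and the rule NP ::= n are assumptions made for this illustration, and the names GRAMMAR, close and recognize are hypothetical.

```python
# Grammar of appendix A as nonterminal -> alternatives. Rules 4-7 are listed
# in section A.1; the S rules and NP ::= n are ASSUMED for illustration.
GRAMMAR = {
    "S": [("NP", "VP"), ("S", "PP")],
    "NP": [("n",), ("det", "n"), ("NP", "PP")],
    "PP": [("prep", "NP")],
    "VP": [("v", "NP")],
}


def close(chart, i, grammar, dummy_scan):
    """Saturate item set i. With dummy_scan, terminals are passed without reading input."""
    changed = True
    while changed:  # naive fixpoint, so cyclic derivations are handled safely
        changed = False
        for lhs, rhs, dot, origin in list(chart[i]):
            if dot == len(rhs):
                # Completion: advance the items of the origin set that wait for lhs.
                new = [(l, r, d + 1, o) for (l, r, d, o) in list(chart[origin])
                       if d != len(r) and r[d] == lhs]
            elif rhs[dot] in grammar:
                # Prediction of the nonterminal after the dot.
                new = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dummy_scan:
                # Dummy scan: the terminal is assumed to lie inside the unknown part.
                new = [(lhs, rhs, dot + 1, origin)]
            else:
                new = []
            for item in new:
                if item not in chart[i]:
                    chart[i].add(item)
                    changed = True


def recognize(grammar, start, tokens):
    """Return True if tokens (possibly containing '?' and '*') can be completed."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, alt, 0, 0) for alt in grammar[start]}
    for i, tok in enumerate(tokens):
        close(chart, i, grammar, dummy_scan=(tok == "*"))
        for lhs, rhs, dot, origin in chart[i]:
            if tok == "*":
                chart[i + 1].add((lhs, rhs, dot, origin))  # unknown part consumed
            elif dot != len(rhs) and rhs[dot] not in grammar and tok in (rhs[dot], "?"):
                chart[i + 1].add((lhs, rhs, dot + 1, origin))  # ordinary scan
    close(chart, len(tokens), grammar, dummy_scan=False)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])


print(recognize(GRAMMAR, "S", "n v det n prep det n".split()))
print(recognize(GRAMMAR, "S", ["?", "v", "*", "n"]))
print(recognize(GRAMMAR, "S", ["v", "n"]))
```

    The fixpoint loop in close is deliberately naive. It trades efficiency for the guarantee that rules which behave cyclically under dummy scanning (such as X → aX in footnote 8) produce only finitely many items, since items are stored in a set.</Paragraph>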
  </Section>
</Paper>