<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1022">
  <Title>Chart Parsing of Robust Grammars *</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Robustness is a formal behaviour of natural language grammars: assigning a best partial description to linguistic events whose strong description is inconsistent or cannot be constructed. Events of this sort may be called defective with respect to a grammar fragment.</Paragraph>
    <Paragraph position="1"> Defectiveness arises from the performance use that human beings make of language. Since defectiveness can be seen as failure of linguistic description, the principal way to robustness is a method to weaken these descriptions.</Paragraph>
    <Paragraph position="2"> Robust parsing, then, is parsing of robust grammars: a parser is robust iff it has the capability to interpret weak grammar fragments correctly. In this paper, I shall try to substantiate this claim by motivating a grammar dependent approach to robust parsing and then describing a chart parsing algorithm for robust grammars. Though only c(ontext) f(ree) grammars will be addressed, there is an obvious extension of the algorithm to annotated (unification-) grammars (WACSG formalism, see Goeser 1990) along the lines of (Shieber 198~). Grammar based robustness tools have been explored in a variety of formalisms, e.g. the metarule device within the ATN formalism (Weischedel and Sondheimer 1983), entity data structures in a case frame approach (Hayes 1984) or the weak description approach in unification based grammars (Kudo et al. 1988, Goeser 1990). Parsing of grammars with robustness features competes with algorithmic approaches to robustness where parsing algorithms (usually chart parsers, except in Tomabechi and Tomita (1988) where LR(k) parsing is advocated) are extended to include robustness features (Mellish 1989, Long 1988) and/or heuristics to handle defect cases (Langer 1990, Stock et al. 1988).
* The work reported has been done while the author received an LGF grant at the University of Stuttgart.</Paragraph>
    <Paragraph position="3"> Perhaps the most critical issue in robust parsing is ambiguity, which emerges when constituency is loosened to some cf substring analysis. E.g.</Paragraph>
    <Paragraph position="4"> Mellish (1989) parses for a cfg G the (cf) set PAR(G), which is the set of all strings containing a sequence of nonempty substrings which is in the cf language L(G). In the worst case scenario, where all these sequences are in L(G), we get for a w ∈ L(G) with an ambiguity k (in G) an exponential ambiguity of k × 2^|w| as an upper bound. Even in a non-worst case, which should be the case of realistic cfgs, local ambiguities from substring analysis massively increase parsing time. E.g. in the (non-defective) example 1, the arcs a, b, c are empirically valid while the arcs d, e are artefacts of an algorithm parsing PAR(G).</Paragraph>
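    <Paragraph> The worst-case bound can be made concrete with a small sketch. It assumes, for illustration only, that an analysis in PAR(G) is identified with the set of token positions it keeps, and that L(G) is given as a membership predicate rather than as a cfg. The function names and both toy languages are hypothetical and are not taken from Mellish (1989).</Paragraph>
    <Paragraph><![CDATA[
```python
from itertools import combinations


def kept_position_sets(tokens):
    """Yield every nonempty, ordered choice of token positions with its tokens."""
    n = len(tokens)
    for size in range(1, n + 1):
        for positions in combinations(range(n), size):
            yield positions, [tokens[i] for i in positions]


def count_par_analyses(tokens, in_language):
    """Count the position sets whose kept tokens form a string of L(G)."""
    return sum(1 for _, kept in kept_position_sets(tokens) if in_language(kept))


def a_plus(seq):
    # Hypothetical worst case: every nonempty sequence of "a" is in L(G).
    return len(seq) > 0 and all(tok == "a" for tok in seq)


def a_n_b_n(seq):
    # Hypothetical milder case: L(G) = { a^n b^n | n >= 1 }.
    half, odd = divmod(len(seq), 2)
    return half > 0 and odd == 0 and seq == ["a"] * half + ["b"] * half


worst = count_par_analyses(list("aaaa"), a_plus)    # 15
milder = count_par_analyses(list("aabb"), a_n_b_n)  # 5
print(worst, milder, 2 ** 4)
```
]]></Paragraph>
    <Paragraph> For w = a a a a and the toy language a+, every nonempty choice of positions is in L(G), so 2^4 - 1 = 15 analyses arise, just under the bound 2^|w| = 16 for k = 1. The second toy language yields only 5 analyses for a a b b, which illustrates the non-worst case.</Paragraph>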
    <Paragraph position="5"> Reflecting syntactic defectiveness in a cfg means to assign it a configurational regularity. Obviously, there is syntactic defectivity which is syntactically nonregular, such as corrupted output from a speech recognition device (Tomabechi and Tomita 1988)² or global constituent breaks (Goeser 1991), which can be subjected to syntactic prefix analysis only.</Paragraph>
    <Paragraph position="6"> On the other hand, there are spoken language constructions (Lindgren 1987, Goeser 1991, Langer 1990) and various kinds of &quot;fragmentary utterances&quot; (Carbonell and Hayes 1983) that definitively show configurational properties. Let us look at a frequent spoken language construction called restart, as in the German corpus example (2)³. Restarts follow a pattern &lt; α β A β' γ &gt; where the strings α and γ, but not β and β', may be empty. The restart marker A is optional: in 67 of 96 restart samples, β, which mostly ends in a constituent break, and β' were separated phonologically by tone constancy or a short pause, or without any marking at all⁴. Restarts are a kind of constituent co-ordination not allowing for ellipsis phenomena such as gapping, left deletion, split coordination or sluicing. The β substring is usually defective and may indeed contain arbitrary noise (see e.g. example (3))⁵.
² This material may follow phonological regularities, of course.
³ All corpus evidence reported here is psychotherapeutic discourse from the ULMER TEXTBANK.
⁴ Therefore, Langer's (1990) restart heuristics seems empirically inadequate insofar as it postulates a syntactic restart marker.</Paragraph>
    <Paragraph position="7"> (2) da [is es d ....... dt ein A
there [is it then still a A
kommt noch ein anderes Problem hinzu]
comes yet another problem to-that]
RPSGs are cfgs with a set of start symbols and with rules whose left hand side may be indexed with the keyword SET, SUB, or PAR. The SET index on a rule's LHS licenses the adjunction of any start symbol to the right or left of its RHS string. The SUB index licenses arbitrary terminal strings to the right or left of the indexed symbol's lexical projection. The PAR index includes SUB and additionally licenses any terminal strings within this lexical projection. (Left and right sided indices SETL, SUBL and SETR, SUBR, respectively, are also in use.) In a derivation relation ⇒ for RPSGs, an indexed symbol A_x unifies with category A to give A_x. Formally, SET adjunction participates in the cf derivation relation, while SUB and PAR are interpreted by a recursive generation function gen operating on derivations, where ω is a derivation, t its tree structure, Cat_ind the set of indexed or non-indexed nonterminals and Lex the set of terminals. The example derivation tree (4) shows SET adjunction (dotted lines) and areas where arbitrary substrings are licensed by an indexed node.
⁵ For a more thorough discussion of restart syntax, see Goeser (1991).</Paragraph>
    <Paragraph position="8"> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 121 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992</Paragraph>
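    <Paragraph> One possible reading of the SET index is as a rule schema that expands into plain cf rules. The sketch below assumes a single adjunction step per rule application, and assumes that SETL and SETR restrict adjunction to the left and to the right side, respectively. The function name and the toy rule are hypothetical, and this is not the paper's derivation relation.</Paragraph>
    <Paragraph><![CDATA[
```python
def expand_set_rule(lhs, rhs, index, start_symbols):
    """Return the plain cf rules licensed by one (possibly SET-indexed) rule."""
    rules = [(lhs, tuple(rhs))]          # the unindexed rule itself
    for start in sorted(start_symbols):  # sorted only to get a stable order
        if index in ("SET", "SETL"):
            rules.append((lhs, (start, *rhs)))  # adjoin a start symbol on the left
        if index in ("SET", "SETR"):
            rules.append((lhs, (*rhs, start)))  # adjoin a start symbol on the right
    return rules


# Toy rule S -> NP VP with the (assumed) start symbols S and NP.
for rule in expand_set_rule("S", ["NP", "VP"], "SET", {"S", "NP"}):
    print(rule)
```
]]></Paragraph>
    <Paragraph> With two start symbols, the SET-indexed toy rule yields five plain rules: the original one plus a left and a right adjunction for each start symbol. An unindexed rule is returned unchanged.</Paragraph>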
    <Paragraph position="9"> Generally, local arbitrariness within a string may be fully modelled with an RPSG. Though finite cfls are turned into infinite ones through RPSG indexing, the syntactic description with RPSG is still configurational up to certain local adjunctions.</Paragraph>
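    <Paragraph> The difference between SUB and PAR can be illustrated on token lists. The sketch assumes that the lexical projection of an indexed symbol is already known as a list of terminals, and that SUBL and SUBR allow extra terminals only to the left or only to the right. The function licenses and the sample tokens are hypothetical, and the check stands in for the generation function gen rather than reproducing it.</Paragraph>
    <Paragraph><![CDATA[
```python
def is_subsequence(needle, haystack):
    """True if needle's tokens occur in haystack in order, gaps allowed."""
    rest = iter(haystack)
    return all(tok in rest for tok in needle)


def licenses(index, projection, observed):
    """Check whether an index licenses observed around a lexical projection."""
    n, m = len(projection), len(observed)
    if index is None:    # unindexed symbol: no extra terminals at all
        return observed == projection
    if index == "PAR":   # extra terminals to the left, right, and inside
        return is_subsequence(projection, observed)
    if index == "SUB":   # extra terminals to the left and right only
        return any(observed[i:i + n] == projection for i in range(m - n + 1))
    if index == "SUBL":  # assumed: extra terminals to the left only
        return m >= n and observed[m - n:] == projection
    if index == "SUBR":  # assumed: extra terminals to the right only
        return observed[:n] == projection
    raise ValueError(f"unknown index: {index}")


projection = ["ein", "Problem"]
print(licenses("SUB", projection, ["uh", "ein", "Problem", "ja"]))  # True
print(licenses("SUB", projection, ["ein", "uh", "Problem"]))        # False
print(licenses("PAR", projection, ["ein", "uh", "Problem"]))        # True
```
]]></Paragraph>
    <Paragraph> Under SUB the projection must survive as one contiguous block, so noise inside it is rejected. Under PAR the same noisy string is accepted, which is the local arbitrariness the index is meant to capture.</Paragraph>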
  </Section>
</Paper>