<?xml version="1.0" standalone="yes"?>
<Paper uid="E83-1020"> <Title>A FLEXIBLE NATURAL LANGUAGE PARSER BASED ON A TWO-LEVEL REPRESENTATION OF SYNTAX</Title> <Section position="2" start_page="0" end_page="114" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle>
<Paragraph position="0"> Performing an accurate syntactic analysis of natural language sentences is still a challenge for A.I. researchers working in the field of N.L. interpretation (Charniak 81, Kaplan 82). The points that have attracted the most attention recently are: the need for a strong connection between syntactic processing and semantic interpretation, in order to reduce the space of alternative syntactic analyses (Konolige 80, Sidner et al. 81, Milne 82); the advantage of a quasi-deterministic syntactic analysis, in order to reduce the computational overhead associated with heavy use of backup (Marcus 80); and the advantage of an approach that also tolerates (partially) incorrect sentences, at least when a meaningful interpretation can be obtained (Weischedel & Black 80, Kwasny & Sondheimer 81, Hayes 81).</Paragraph>
<Paragraph position="1"> The first two of these points guided the design and implementation of a system devoted to the interpretation of N.L. (Italian) commands (Lesmo, Magnani & Torasso 81a and 81b). In that system, however, as in most N.L. interpreters, the analysis of the input sentence is mainly syntax-driven; for this reason, the input sentence can be interpreted only if it respects the constraints imposed by the syntactic knowledge.</Paragraph>
<Paragraph position="2"> The problem of analyzing ill-formed sentences has received a great deal of attention recently. However, most studies (Weischedel & Black 80, Kwasny & Sondheimer 81) are based on standard syntactic analyzers (ATNs)
which have been further augmented to handle sentences lacking some required constituents (ellipsis) or violating some syntactic constraints (e.g. number agreement between subject and verb).</Paragraph>
<Paragraph position="3"> There are two problems with this approach, both of which stem from the choice of a syntax-based analysis. The first problem is the need to extend the grammar. Of course, it is necessary in general to specify what is grammatical and what is not, but this specification should not interfere too heavily with the interpretation of the sentence. In fact, if every deviation had to be accounted for in the grammar, an unforeseen structure would block the analysis even when the sentence is understandable. Consider, for instance, the following sentence:
Mary drove the car and John the truck (S1)
The absence of the verb in the second clause can be considered an acceptable form of ellipsis and, consequently, the sentence can be interpreted correctly. On the other hand, it is very unlikely that an extension of the grammar would cover the following ungrammatical sentence (see Winograd 83, p. 480):
* The book that for John to read would be difficult is beautiful (S2)
However, although some effort is required, this sentence can be considered understandable. As stated above, a comprehensive system must be able to detect the ungrammaticality of S2, but this detection should not prevent the construction of a structure to pass to the semantic analyzer. Moreover, a subtle grammaticality test of this kind seems easier to perform (and to express) on a structured representation of the sentence (e.g. a tree) than on the input sentence as such.</Paragraph>
<Paragraph position="4"> The second problem that must be faced when an ATN is extended to handle ill-formed sentences is word ordering.
ATNs are powerful formal tools, able to analyze type-0 languages; in formal language theory a language is defined as a set of strings, so ATNs must recognize &quot;ordered sequences&quot; of symbols (or words). Of course, natural languages also have fixed rules defining the admissible orderings of words and constituents; but if those constraints have to be relaxed to accept ill-formed inputs, extensions are needed that are less straightforward than those used to handle a missing constituent. For example, the sentence
Ate the apple John (S3)
is ungrammatical but easily understandable; yet in an ATN it seems to require extending the S network so that the constituents can be traversed in a different (even if syntactically wrong) order. In this case, too, building a structured representation of the sentence could be the first step of the analysis; once this is done, the ordering constraints can easily be verified and, if they are not respected, either an alternative analysis is tried or, as in the case of S3, the sentence is passed to the semantic analyzer and the parser possibly signals the presence of a syntactic error.</Paragraph>
<Paragraph position="5"> In this paper we present a parser that makes explicit the interconnections between syntax and semantics, analyzes sentences in a quasi-deterministic fashion and, in many cases, identifies the roles of the various constituents even when the sentence is ill-formed.</Paragraph>
<Paragraph position="6"> The main feature of the approach underlying the parser is a two-level representation of syntactic knowledge: a first set of rules emits hypotheses about the constituents of the sentence and their functional roles, and a second set of rules verifies whether a hypothesis satisfies the constraints on the well-formedness of sentences.
However, the second set of rules is applied only after the semantic knowledge has confirmed that the hypothesis is acceptable. If semantics rejects the current hypothesis, an alternative one is tested: this control structure guarantees that all hypotheses satisfying both the weak syntactic constraints (which govern the emission of hypotheses) and the semantic constraints are tried before the input sentence is considered uninterpretable.</Paragraph>
<Paragraph position="7"> The claim that the parser operates in a quasi-deterministic fashion is justified by the kind of processing the system performs when a hypothesis is rejected: in most cases a new hypothesis is obtained by applying a simple and relatively inexpensive &quot;natural&quot; modification. A set of such modifications is predefined, and a real backup is performed only when none of them is applicable; in most cases this situation corresponds to one in which people would normally be led down a garden path.</Paragraph>
<Paragraph position="8"> The decision to pay particular attention to the analysis of ill-formed sentences is motivated by the intended application of the parser: it is part of a larger system that allows the user to interact in natural language with a relational data base (Siklossy, Lesmo & Torasso 83; Lesmo, Siklossy & Torasso 83).</Paragraph>
<Paragraph position="9"> Various systems acting as N.L. interfaces to data bases have been developed in recent years (Harris 77, Waltz 78, Konolige 80), and all of them have pointed out the need for mechanisms to handle ill-formed inputs (mainly ellipsis).</Paragraph>
<Paragraph position="10"> In the rest of the paper, some example sentences will be discussed; they refer both to the implemented system and to more general sentences.
This is justified because the linguistic coverage of the parser is wider than that required by a data base interface, even though the data base, the semantic knowledge and the lexicon are restricted to a particular domain.</Paragraph> </Section> </Paper>
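<!--
A minimal Python sketch of the control strategy described in this introduction, under stated assumptions: the paper does not specify data structures, so every name here (Hypothesis, Result, parse, semantics_ok, check_syntax, natural_modifications, swap_subject_object) is hypothetical. A preference-ordered list of candidate role assignments stands in for the hypotheses emitted by the first set of rules; each hypothesis is checked by semantics before the second set of (well-formedness) rules is applied; a rejected hypothesis is first repaired with predefined "natural" modifications, and a real backup to the next candidate happens only when none applies. As a simplification, a semantically accepted hypothesis is returned with any well-formedness violations attached, as in the treatment of S3, instead of also trying alternative analyses.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical encoding: a hypothesis assigns a functional role to each constituent.
Roles = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Hypothesis:
    roles: Roles

    def role_of(self, role: str) -> Optional[str]:
        """Return the constituent filling `role`, or None."""
        return next((c for c, r in self.roles if r == role), None)


@dataclass
class Result:
    hypothesis: Optional[Hypothesis]            # None: sentence uninterpretable
    syntax_errors: List[str] = field(default_factory=list)
    backups: int = 0                            # real backups performed


def parse(candidates: Iterable[Hypothesis],
          semantics_ok: Callable[[Hypothesis], bool],
          check_syntax: Callable[[Hypothesis], List[str]],
          natural_modifications: Callable[[Hypothesis], Iterable[Hypothesis]]) -> Result:
    agenda = list(candidates)   # first-level hypotheses, most preferred first
    tried = set()
    backups = 0
    while agenda:
        hyp = agenda.pop(0)
        if hyp in tried:
            continue
        tried.add(hyp)
        if semantics_ok(hyp):
            # Second-level (well-formedness) rules run only after semantic
            # confirmation; violations are reported, not treated as fatal.
            return Result(hyp, check_syntax(hyp), backups)
        # Rejected: first try the predefined, cheap "natural" modifications.
        fixes = [m for m in natural_modifications(hyp) if m not in tried]
        if fixes:
            agenda = fixes + agenda
        elif agenda:
            backups += 1        # none applies: real backup to the next candidate
    return Result(None, [], backups)


# Toy run on S3, "Ate the apple John" (hypothetical semantic knowledge).
WORDS = ["ate", "the apple", "John"]
ANIMATE = {"John"}


def swap_subject_object(h: Hypothesis) -> List[Hypothesis]:
    swap = {"subject": "object", "object": "subject"}
    return [Hypothesis(tuple((c, swap.get(r, r)) for c, r in h.roles))]


def semantics_ok(h: Hypothesis) -> bool:
    return h.role_of("subject") in ANIMATE      # the eater must be animate


def check_syntax(h: Hypothesis) -> List[str]:
    subj = h.role_of("subject")
    if WORDS.index(subj) > WORDS.index("ate"):
        return [f"subject {subj!r} follows the verb"]
    return []


first = Hypothesis((("the apple", "subject"), ("John", "object")))
result = parse([first], semantics_ok, check_syntax, swap_subject_object)
print(result.hypothesis.role_of("subject"), result.syntax_errors, result.backups)
```

Here the semantic check rejects the first hypothesis (an inanimate subject), the single subject/object swap repairs it without any real backup, and the word-order violation of S3 is reported alongside the accepted hypothesis rather than blocking its interpretation.
-->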