File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/88/c88-1048_abstr.xml
Size: 9,886 bytes
Last Modified: 2025-10-06 13:46:35
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1048"> <Title>Improving Search Strategies: An Experiment in Best-First Parsing Hans HAUGENEDER</Title> <Section position="1" start_page="0" end_page="238" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> When the syntactic analysis of natural language is viewed as a search problem, the right choice of parsing strategy plays an important role in the performance of natural language parsers. After a motivation of the use of various heuristic criteria, a framework for defining and testing parsing strategies is presented. On this basis systematic tests on different parsing strategies have been performed, the results of which are discussed.</Paragraph> <Paragraph position="1"> Generally these tests show that a &quot;guided&quot; depth-oriented strategy gives a considerable reduction of search effort compared to the classical depth-first strategy.</Paragraph> <Paragraph position="2"> 1. Introduction Parsing natural language utterances can be considered a search problem, which is characterized by the application of a set of operators (i.e. the grammar rules) onto the input data (phrase to be processed) in order to yield a final state (derivation tree). In practical applications which are characterized by grammars with a large coverage and a non-trivial complexity of the input (measured e.g. in sentence length and lexical ambiguity) one is confronted with difficulties that seem quite common to various search problems, namely the size of the search space and the selection among multiple solutions. Two quite opposite approaches to these problems have been proposed. In the one approach, the brute force of exhaustive search has been used, possibly augmented with some ranking scheme for the set of parses. 
In the other approach, the parsing of natural language utterances is considered a deterministic process [Mar80], where a &quot;wait and see&quot; strategy makes the flavour of searching through the alternative application of different grammar rules disappear, at least for grammars with limited coverage.</Paragraph> <Paragraph position="3"> The approach we are taking to this problem lies between these two extremes: conceptually, it takes the first view, considering natural language parsing a nondeterministic process; from a performance point of view, it is directed towards the approximation of deterministic behaviour. Thus our aim is to develop a best-first parsing strategy which enables the parser, by means of heuristic criteria and information, both to limit the overall search space as much as possible in order to arrive at the first parse at low cost, and to achieve the most plausible analysis as the first one.</Paragraph> <Paragraph position="4"> With these aims in mind - at present mainly concentrating on the first one - we still want to maintain the ability of our mechanism to find further solutions, since we do not assume the order of the analyses to be correct all the time. Thus &quot;heuristics&quot; is understood as improving the problem solving performance without affecting the competence [Min63].</Paragraph> <Paragraph position="5"> What we propose is a practically oriented approach to these problems; it is practical in the sense that our primary focus is not to model the human sentence processing mechanism or specify the human parsing strategy. 
We are rather aiming towards the development of parsing strategies that are based on heuristic information, enabling the parser to choose the right paths in the search space most of the time.</Paragraph> <Paragraph position="6"> Although psychological results on human sentence processing strategies may be incorporated in the heuristics to be developed - at least as far as they fit in our framework and do not assume special properties of the underlying processing scheme - we do not understand our work as contributing to the characterization of inherent structures of the human sentence processor.</Paragraph> <Paragraph position="7"> Thus our goal is not of an &quot;all or nothing&quot; character; we do not expect our parser to make the right choice all the time. What we do want, however, is to develop a more pragmatic strategy, which, when applied to major samples of sentences, is able to give us the first reading with a minimal overall search effort.</Paragraph> <Paragraph position="8"> Tests of some strategies that give the parser more guidance by increasing the information available at the choice points have yielded some promising results.</Paragraph> <Paragraph position="9"> Work in a similar direction on the MCC Lingo project In a number of natural language parsers - especially in those with practical orientation and grammars with comprehensive coverage - the problem of dealing with alternative parses has been handled by some sort of scoring measures for sets of alternative parses already produced by breadth-first enumeration.</Paragraph> <Paragraph position="10"> This is the case in the DIAGRAM parser, where arbitrary sub-procedures (so-called factors) assign likelihood scores to syntactic analyses [Rob82]. In the EPISTLE system, a numerical metric is used for ranking multiple parses which is defined on the form of the phrase structure being built up [Hei82]. 
And as a last example of that type, the METAL parser performs a scoring of the analyses found, which is based on both grammatical and lexical phenomena [Slo83]. In all these examples, the criteria on which the scoring is based do not influence the parser's behaviour but act as some sort of filter on the parser's results. The major challenge in our approach, however, is the application of such and similar scoring criteria on the fly during the parsing process instead of applying them after the parser has performed a blind all-paths analysis.</Paragraph> <Paragraph position="11"> If one thinks of more search-intensive applications, like speech understanding with the high degree of ambiguity in the input in the form of numerous word hypotheses, the application of such heuristic criteria during the parsing process seems to have an even larger advantage over the filter approach.</Paragraph> <Paragraph position="12"> 3. A Testbed for Modelling Parsing Strategies In order to be able to model heuristic parsing strategies, one needs a suitable parsing mechanism which has enough flexibility for such a task. The most obvious choice for doing this is active chart parsing [Kap73], [Kay80], which is a highly general framework for constructing parsers. It combines the concept of an active chart, as an extensive bookkeeping mechanism preventing the parser from performing identical processing steps twice, with an agenda-driven control mechanism which enables a very elegant and highly modularized simulation of different control structures. And it is exactly this second feature that is central for our strategy modelling task (for details see [Hau87]). Since we view the development of a best-first parsing strategy as an empirical task, i.e. 
as the result of going through a number of define-test-modify cycles to build up the &quot;final&quot; heuristics, it is necessary (or at least useful from a practical point of view) to have available an environment that enables the user to define and modify the heuristic function easily and supports him in seeing and checking immediately without much effort the effects of a modification.</Paragraph> <Paragraph position="13"> The APE system, in which this work is embedded, is an ATN grammar development environment which (among other things) offers the functionality needed. By means of a highly interactive, graphically-oriented user interface it offers operational facilities that give the user a number of possibilities for inspecting and debugging the parser's behaviour under a given strategy, as for example an agenda editor, the possibility to specify strategies and change them during parsing, and a chart-based fully graphical parser stepper. A heuristics editor is integrated into APE's user interface in a straightforward way: in addition to the possibility of choosing between several predefined uniform and heuristic strategies, the user can define his own strategies. The specification of the intended heuristic function is performed by giving appropriate weighting factors wf_i to the various heuristic dimensions in a template-based manner.</Paragraph> <Paragraph position="14"> After the specification of the values for the various weighting factors, each expressing the relevance of the corresponding criterion, the user is presented with the arithmetic expression associated with the corresponding heuristic function (in standard infix notation), which he can modify further if he finds the system-defined composition of the weighted criteria unsatisfactory. This obviously can lead to modifications of the heuristic function's range definition, the consequences of which the user must be aware of when using this option (cf. 4.2). 
Details of the heuristics specification and manipulation facility are described elsewhere ([Hau87], [Geh88]).</Paragraph> <Paragraph position="15"> Although the APE system is based on an ATN framework, the characteristics concerning heuristic information for scheduling are independent of the underlying ATN approach; the only critical point is the assumption of an active chart parsing processing scheme. Thus these considerations can be applied to a number of other grammar formalisms as well, especially to those belonging to the paradigms of (procedurally and descriptively) augmented phrase structure grammars.</Paragraph> <Paragraph position="16"> The implementation of the APE system and the work described here have been performed in Interlisp-D on a Siemens EMS 5822 workstation.</Paragraph> </Section> </Paper>