<?xml version="1.0" standalone="yes"?>
<Paper uid="P81-1032">
  <Title>Dynamic Strategy Selection in Flexible Parsing</Title>
  <Section position="2" start_page="0" end_page="143" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> When people use language spontaneously, they o~ten do not respect grammatical niceties. Instead of producing sequences of grammatically well-formed and complete sentences, they often miss out or repeat words or phrases, break off what they are .saying and rephrase or replace it, speak in fragments, or use otherwise incorrect grammar. While other people generally have little trouble co'reprehending ungrammatical utterances, most' natural language computer systems are unable to process errorful input at all. Such inflexibility in parsing is a serious impediment to the use of natural language in interactive computer systems. Accordingly, we \[6\] and other researchers including Wemchedel and Black \[14\], and Kwasny and Sondhelmer \[9\], have attempted to produce flexible parsers, i.e. parsers that can accept ungrammatical input, correcting the errors whan possible, and generating several alternative interpretations if appropriate.</Paragraph>
    <Paragraph position="1"> While different in many ways, all these approaches to flexible parsing operate by applying a uniform parsing process to a uniformly represented grammar. Because of the linguistic performance problems involved, this uniform procedure cannot be as simple and elegant as the procedures followed by parsers based on a pure linguistic competence model, such as Parsifal \[10\]. Indeed, their parsing procedures may involve several strategies that are applied in a predetermined order when the input deviates from the grammar, but the choice of strategy never depends on the specific type of construction being parsed. In light of experience with our own flexible parser, we have come to believe that such uniformity is not conducive to good flexible parsing. Rather, the strategies used should be dynamically selected according to the type of construction being parsed. For instance, partial.linear pattern matching may be well suited to the flexible parsing of idiomatic phrases, or specialized noun phrases such as names, dates, or addresses (see also \[5\]), but case constructions, such as noun phrases with trailing prepositional phrases, or imperative phrases, require case-oriented parsing strategies. The undedying principle is simple: The ap~rol~riate knowledge must be brought to bear at the right time -- and it must not interfere at other times. Though the initial motivation for this approach sprang from the r~eeds of flexible parsing, such construction.specific techniques can provide important benefits even when no grammatical deviations are encountered, as we will show. This observation may be related to the current absence of any single universal parsing strategy capable of exploiting all knowledge sources (although ELI \[12\] and its offspring \[2\] are efforts in this direction).</Paragraph>
    <Paragraph position="2"> Our objective here is not to create the ultimate parser, but to build a very flexible and robust taak.oriented parser capable of exploiting all relevant domain knowledge as well as more general syntax and semantics. The initial application domain for the parser is the central component of an interface to various computer subsystems (or tools).</Paragraph>
    <Paragraph position="3"> This interface and, therefore the parser, should be adaptable to new tools by substituting domain-specific data bases (called &amp;quot;tool descriptions&amp;quot;) that govern the behaviorof the interface, including the invocation of parsing strategies, dictionanes and concepts, rather than requiring any domain adaptations by the interface system itself.</Paragraph>
    <Paragraph position="4"> With these goals in mind, we proceed to give details of the kinds of difficulties that a uniform parsing strategy can lead to, and show how dynamically-selected construction.specific techniques can help. We list a number of such specific strategies, then we focus on our initial implementation of two of these strategies and the mechanism that dynamically selects between them while pm'alng task-oriented natural language imperative constructions. Imperatives were chosen largely because commands and queries given to a task-oriented natural language front end often take that form \[6\].</Paragraph>
    <Paragraph position="5"> 2. Problems with a Uniform Parsing Strategy Our present flexible parser, which we call RexP, is intended to parse correctly input that correaponds to a fixed grammar, and also to deal with input that deviates from that grammar by erring along certain classes of common ungrammaticalities. Because of these goals, the parser is based on the combination of two uniform parsing strategies: bottom-up parsing and pattern.matching. The choice of a bottom.up rather then a top-down strategy was based on our need to recognize isolated sentence fragments, rather than complete sentences, and to detect restarts and continuations after interjections. However, since completely bottom-up strategies lead to the consideration of an unnecessary number of alternatives in correct input, the algorithm used allowed some of the economies of top-dOwn parsing for non-deviant input. Technically speaking, this made the parser left-corner rather than bottom-up. We chose to use a grammar of linear patterns rather than, say, a transition network because pattern.matching meshes well with bottom-up parsing by allowing lookup of a pattern from the presence in the input of any of its constituents; because pattern-matching facilitates recognition of utterances with omissions and substitutions when patterns are recognized on the basis of partial matches; and because pattern.</Paragraph>
    <Paragraph position="6"> matching is necessary for the recognition of idiomatic phrases. More details of the iustifications for these choices can be found in \[6\].</Paragraph>
    <Paragraph position="7">  Research under Contract F49620-79-C-0143. The views aria cor, clusm.s C/ontmneO in this document are those Of the authors and shou~ not be inte.rDreleo as tepreser~ting the official DOhCle~ C/qther exl)resse0 or ,replied. o! DARPA, Ihe Air Force Office ol Scisn,fic Research or the US government.</Paragraph>
    <Paragraph position="8"> FlexP has been tested extensively in conjunction with a gracefully interacting interface to an electronic mail system \[1\]. &amp;quot;Gracefully interacting&amp;quot; means that the interface appears friendly, supportive, and robust to its user. In particular, graceful interaction requires the system to tolerate minor input errors and typos, so a flexible parser is an imbortant component of such an interface. While FlexP performed this task adeduately, the experience turned up some problems related to the  major theme of this paper. These problems are all derived from the incomparability between the uniform nature of The grammar representation and the kinds of flexible parsing strategies required to deal with the inherently non-uniform nature of some language constructions. In particular:.</Paragraph>
    <Paragraph position="9"> *Oifferent elements in the pattern of a single grammar rule can serve raclically different functions and/or exhibit different ease of recognition. Hence, an efficient parsing strategy should react to their apparent absence, for instance, in quite different ways.</Paragraph>
    <Paragraph position="10"> * The representation of a single unified construction at the language level may require several linear patterns at the grammar level, making it impossible to treat that construction * with the integrity required for adecluate flexible parsing.</Paragraph>
    <Paragraph position="11"> The second problem is directly related to the use of a pattern-matching grammar, but the first would arise with any uniformly represented grammar applied by a uniform parsing strategy.</Paragraph>
    <Paragraph position="12"> For our application, these problems manifested themselves most markedly by the presence of case constructions in the input language.</Paragraph>
    <Paragraph position="13"> Thus. our examples and solution methOds will be in terms of integrating case-frame instantiat=on with other parsing strategies. Consider, for example, the following noun phrase with a typical postnominal case frame: &amp;quot;the messages from Smith aDout ADA pragmas dated later than Saturday&amp;quot;.</Paragraph>
    <Paragraph position="14"> The phrase has three cases marked by &amp;quot;from&amp;quot;, &amp;quot;about&amp;quot;, and &amp;quot;dated later than&amp;quot;. This Wpe of phrase is actually used in FlexP's current grammar, and the basic pattern used to recognize descriptions of messages is: &lt;?determiner eMassageAd,1 ~4essagoHoad *NOlsageC8$o) which says that a message description iS an optional (?) determiner.</Paragraph>
    <Paragraph position="15"> followed by an arbitrary number (') of message adjectives followed by a message head word (i.e. a word meaning &amp;quot;r~essage&amp;quot;). followed by an arbitrary number of message cases, in the example. &amp;quot;the&amp;quot; is the determiner, there are no message adjectives. &amp;quot;messages&amp;quot; is the message head word. and there are three message cases: &amp;quot;from Smith&amp;quot;. * 'about ADA pragmas&amp;quot;, end &amp;quot;dated later than&amp;quot;. (~=cause each case has more than one component, each must be recognized by a separate pattern:  Here % means anything in the same word class, &amp;quot;dated later than&amp;quot;, for instance, is eauivalent to &amp;quot;since&amp;quot; for this purpOSe.</Paragraph>
    <Paragraph position="16"> These patterns for message descr~tions illustrate the two problems mentioned above: the elementS of the .case patterns have radically different functions - The first elements are case markers, and the second elements are the actual subconcepts for the case. Since case indicators are typically much more restriCted in expression, and therefore much easier to recognize than Their corresponding subconc~ts, a plausible strategy for a parser that &amp;quot;knows&amp;quot; about case constructions is to scan input for the case indicators, and then parse the associated subconcepts top-down. This strategy is particularly valuable if one of the subconcepts is malformed or of uncertain form, such as the subject case in our example. Neither &amp;quot;ADA&amp;quot; nor &amp;quot;pragmas&amp;quot; is likely to be in the vocabulary of our system, so the only way the end of the subject field can be detected is by the presence of the case indicator &amp;quot;from&amp;quot; which follows iL However, the present parser cannot distinguish case indicators from case fillers - both are just elements in a pattern with exactly the same computational status, and hence it cannot use this strategy.</Paragraph>
    <Paragraph position="17"> The next section describes an algorithm for flexibly parsing case constructions. At the moment, the algorithm works only on a mixture of case constructions and linear patterns, but eventually we envisage a number of specific parsing algorithms, one for each of a number of construction types, all working together to provide a more complete flexible parser.</Paragraph>
    <Paragraph position="18"> Below, we list a number of the parsing strategies that we envisage might be used. Most of these strategies exploit the constrained task.oriented nature of the input language: * Case-Frame Instantiation is necessary to parse general imperative constructs and noun phrases with posThominal modifiers. This method has been applied before with some success to linguistic or conceptual cases \[12\] in more general parsing tasks. However, it becomes much more powerful and robust if domain-dependent constraints among the cases can be exploited. For instance, in a filemanagement system, the command &amp;quot;Transfer UPDATE.FOR to the accounts directory&amp;quot; can be easily parsed if the information in the unmarked case of transfer (&amp;quot;ulXlate.for&amp;quot; in our example) is parsed by a file-name expert, and the destination case (flagged by &amp;quot;to&amp;quot;) is parsed not as a physical location, but a logical entity ins=de a machine. The latter constraint enables one to interpret &amp;quot;directory&amp;quot; not as a phonebook or bureaucratic agency, but as a reasonable destination for a file in a computer.</Paragraph>
    <Paragraph position="19"> * Semantic Grammars \[8\] prove useful when there are ways of hierarchically clustering domain concepts into functionally useful categories for user interaction. Semantic grammars, like case systems, can bring domain knowledge to bear in dissmbiguatmg word meaningS. However, the central problem of semantic grammars is non-transferability to other domains, stemming from the specificity of the semantic categorization hierarchy built into the grammar rules. This problem is somewhat ameliorated if this technique is applied only tO parsing selected individual phrases \[13\], rather than being res0onsible for the entire parse. Individual constituents, such as those recognizing the initial segment of factual queries, apply in may domains, whereas a constituent recognizing a clause about file transfer is totally domain specific. Of course, This restriction&amp;quot; calls for a different parsing strategy at the clause and sentence level.</Paragraph>
    <Paragraph position="20"> * (Partial) Pattern Matching on strings, using non.terminal semantic.grammar constituents in the patterns, proves to be an interesting generalization of semantic grammars. This method is particularly useful when the patterns and semantic grammar non-terminal nodes interleave in a hierarchical fashion.</Paragraph>
    <Paragraph position="21"> e Transformations to Canonical Form prove useful both for domain-dependent and domain.independent constructs.</Paragraph>
    <Paragraph position="22"> For instance, the following rule transforms possessives into &amp;quot;of&amp;quot; phrases, which we chose as canonical: \['&lt;ATTRZBUTE&gt; tn possessive form.</Paragraph>
    <Paragraph position="23"> &lt;VALUE&gt; lagltfmate for attribute\] -&gt; \[&lt;VALUE&gt; &amp;quot;OF&amp;quot; &lt;ATTRZBUTE&gt; In stipple forll\] Hence, the parser need only consider &amp;quot;of&amp;quot; constructions (&amp;quot;file's destination&amp;quot; =&gt; &amp;quot;destinaUon of file&amp;quot;). These transforms simplify the pattern matcher and semantic grammar application process, especially when transformed constructions occur in many different contextS. A rudimentary form of string transformation was present in</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML