File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/81/p81-1032_metho.xml
Size: 14,984 bytes
Last Modified: 2025-10-06 14:11:24
<?xml version="1.0" standalone="yes"?> <Paper uid="P81-1032"> <Title>Dynamic Strategy Selection in Flexible Parsing</Title> <Section position="3" start_page="143" end_page="144" type="metho"> <SectionTitle> PARRY [11]. </SectionTitle> <Paragraph position="0"> * Target-specific methods may be invoked to parse portions of sentences not easily handled by the more general methods. For instance, if a case-grammar determines that the case just signaled is a proper name, a special name-expert strategy may be called. This expert knows that names can contain unknown words (e.g., Mr. Joe Gallen D'Aguila is obviously a name with D'Aguila as the surname) but are subject to ordering constraints and morphological preferences.</Paragraph> <Paragraph position="1"> When unknown words are encountered in other positions in a sentence, the parser may try morphological decomposition, spelling correction, querying the user, or more complex processes to induce the probable meaning of unknown words, such as the project-and-integrate technique described in [3]. Clearly these unknown-word strategies ought to be suppressed in parsing person names.</Paragraph> </Section> <Section position="4" start_page="144" end_page="145" type="metho"> <SectionTitle> 3. A Case-Oriented Parsing Strategy </SectionTitle> <Paragraph position="0"> As part of our investigations in task-oriented parsing, we have implemented (in addition to FlexP) a pure case-frame parser exploiting domain-specific case constraints stored in a declarative data structure, and a combination pattern-match, semantic grammar, canonical-transform parser. All three parsers have exhibited a measure of success, but more interestingly, the strengths of one method appear to overlap with the weaknesses of a different method. 
Hence, we are working towards a single parser that dynamically selects its parsing strategy to suit the task demands.</Paragraph> <Paragraph position="1"> Our new parser is designed primarily for task domains where the prevalent forms of user input are commands and queries, both expressed in imperative or pseudo-imperative constructs. Since in imperative constructs the initial word (or phrase) establishes the case frame for the entire utterance, we chose the case-frame parsing strategy as primary.</Paragraph> <Paragraph position="2"> In order to recognize an imperative command, and to instantiate each case, other parsing strategies are invoked. Since the parser knows what can fill a particular case, it can choose the parsing strategy best suited for linguistic constructions expressing that type of information.</Paragraph> <Paragraph position="3"> Moreover, it can pass any global constraints from the case frame or from other instantiated cases to the subsidiary parsers, thus reducing potential ambiguity, speeding the parse, and enhancing robustness.</Paragraph> <Paragraph position="4"> Consider our multi-strategy parsing algorithm as described below.</Paragraph> <Paragraph position="5"> Input is assumed to be in the imperative form: 1. Apply string PATTERN-MATCH to the initial segment of the input using only the patterns previously indexed as corresponding to command words/phrases in imperative constructions. Patterns contain both optional constituents and non-terminal symbols that expand according to a semantic grammar. (E.g., &quot;copy&quot; and &quot;do a file transfer&quot; are synonyms for the same command in a file management system.) 2. Access the CASE-FRAME associated with the command just recognized, and push it onto the context stack. 
In the above example, the case frame is indexed under the token &lt;COPY&gt;, which was output by the pattern matcher. The case frame consists of a list of pairs ([case-marker] [case-filler-information],</Paragraph> <Paragraph position="6"> ...).</Paragraph> <Paragraph position="7"> 3. Match the input with the case markers using the PATTERN-MATCH system described above. If no match occurs, assume the input corresponds to the unmarked case (or the first unmarked case, if more than one is present), and proceed to the next step.</Paragraph> <Paragraph position="8"> 4. Apply the parsing strategy indicated by the type of construct expected as a case filler. Pass any available case constraints to the sub-parser. A partial list of parsing strategies indicated by expected fillers is: * Sub-imperative -- Case-frame parser, starting with the command-identification pattern match above.</Paragraph> <Paragraph position="9"> * Structured-object (e.g., a concept with subattributes) -- Case-frame parser, starting with the pattern-matcher invoked on the list of patterns corresponding to the names (or compound names) of the semantically permissible structured objects, followed by case-frame parsing of any present subattributes.</Paragraph> <Paragraph position="10"> * Simple Object -- Apply the pattern matcher, using only the patterns indexed as relevant in the case-filler-information field.</Paragraph> <Paragraph position="11"> * Special Object -- Apply the parsing strategy applicable to that type of special object (e.g., proper names, dates, quoted strings, stylized technical jargon, etc.) * None of the above -- (Errorful input or parser deficiency) Apply the graceful recovery techniques discussed below.</Paragraph> <Paragraph position="12"> 5. If an embedded case frame is activated, push it onto the context stack.</Paragraph> <Paragraph position="13"> 6. 
When a case filler is instantiated, remove the &lt;case-marker&gt;, &lt;case-filler-information&gt; pair from the list of active cases in the appropriate case frame, proceed to the next case marker, and repeat the process above until the input terminates.</Paragraph> <Paragraph position="14"> 7. If all the cases in a case frame have been instantiated, pop the context stack until that case frame is no longer in it. (Completed frames typically reside at the top of the stack.) 8. If there is more than one case frame on the stack when trying to parse additional input, apply the following procedure: * If the input only matches a case marker in one frame, proceed to instantiate the corresponding case filler as outlined above. Also, if the matched case marker is not on the most embedded case frame (i.e., at the top of the context stack), pop the stack until the frame whose case marker was matched appears at the top of the stack.</Paragraph> <Paragraph position="15"> * If no case markers are matched, attempt to parse unmarked cases, starting with the most deeply embedded case frame (the top of the context stack) and proceeding outwards. If one is matched, pop the context stack until the corresponding case frame is at the top. Then, instantiate the case filler, remove the case from the active case frame, and proceed to parse additional input. If more than one unmarked case matches the input, choose the most embedded one (i.e., the most recent context) and save the state of the parse on the global history stack. (This suggests an ambiguity that cannot be resolved with the information at hand.) * If the input matches more than one case marker in the context stack, try to parse the case filler via the indexed parsing strategy for each filler-information slot corresponding to a matched case marker. 
If more than one case filler parses (this is a somewhat rare situation indicating underconstrained case frames or truly ambiguous input), save the state in the global history stack and pursue the parse assuming the most deeply embedded constituent. [Our case-frame attachment heuristic favors the most local attachment permitted by semantic case constraints.] 9. If a conjunction or disjunction occurs in the input, cycle through the context stack trying to parse the right-hand side of the conjunction as filling the same case as the left-hand side. If no such parse is feasible, interpret the conjunction as top-level, e.g., as two instances of the same imperative, or two different imperatives. If more than one parse results, interact with the user to disambiguate. To illustrate this simple process, consider:</Paragraph> <Paragraph position="16"> &quot;Transfer the programs written by Smith and Jones to ...&quot; &quot;Transfer the programs written in Fortran and the census data files to ...&quot; &quot;Transfer the programs written in Fortran and delete ...&quot; The scope of the first conjunction is the &quot;author&quot; subattribute of program, whereas the scope of the second conjunction is the unmarked &quot;object&quot; case of the transfer action. Domain knowledge in the case-filler information of the &quot;object&quot; case in the &quot;transfer&quot; imperative inhibits &quot;Jones&quot; from matching a potential object for electronic file transfer. Similarly, &quot;Census data files&quot; are inhibited from matching the &quot;author&quot; subattribute of a program. Thus conjunctions in the two syntactically comparable examples are scoped differently by our semantic-scoping rule relying on domain-specific case information. &quot;Delete&quot; matches no active case filler, and hence it is parsed as the initial segment of a second conjoined utterance. Since &quot;delete&quot; is a known imperative, this parse succeeds.</Paragraph> <Paragraph position="17"> 10. 
If the parser fails to parse additional input, pop the global history stack and pursue an alternate parse. If the stack is empty, invoke the graceful recovery heuristics. Here the DELTA-MIN method [4] can be applied to improve upon depth-first unwinding of the stack in the backtracking process. 11. If the end of the input is reached, and the global history stack is not empty, pursue the alternate parses. If any survive to the end of the input (this should not be the case unless true ambiguity exists), interact with the user to select the appropriate parse (see [7]). The need for embedded case structures and ambiguity resolution based on domain-dependent semantic expectations of the case fillers is illustrated by the following pair of sentences: &quot;Edit the programs in Fortran&quot; &quot;Edit the programs in Teco&quot; &quot;Fortran&quot; fills the language attribute of &quot;program&quot;, but cannot fill either the location or instrument case of Edit (both of which can be signaled by &quot;in&quot;). In the second sentence, however, &quot;Teco&quot; fills the instrument case of the verb &quot;edit&quot; and none of the attributes of &quot;program&quot;. This disambiguation is significant because in the first example the user specified which programs (s)he wants to edit, whereas in the second example (s)he specified how (s)he wants to edit them.</Paragraph> <Paragraph position="18"> The algorithm presented is sufficient to parse grammatical input. In addition, since it operates in a manner specifically tailored to case constructions, it is easy to add modifications dealing with deviant input. Currently, the algorithm includes the following steps that deal with ungrammaticality: 12. If step 4 fails, i.e., a filler of appropriate type cannot be parsed at that position in the input, then repeat step 3 at successive points in the input until it produces a match, and continue the regular algorithm from there. Save all words not matched on a SKIPPED list. 
This step takes advantage of the fact that case markers are often much easier to recognize than case fillers to realign the parser if it gets out of step with the input (because of unexpected interjections, or other spurious or missing words).</Paragraph> <Paragraph position="19"> 13. If words are on SKIPPED at the end of the parse, and cases remain unfilled in the case frames that were on the context stack at the time the words were skipped, then try to parse each of the case fillers against successive positions of the skipped sequences. This step picks up cases for which the marker was incorrect or garbled.</Paragraph> <Paragraph position="20"> 14. If words are still on SKIPPED, attempt the same matches, but relax the pattern matching procedures involved.</Paragraph> <Paragraph position="21"> 15. If this still does not account for all the input, interact with the user by asking questions focused on the uninterpreted part of the input. The same focused interaction technique (discussed in [7]) is used to resolve semantic ambiguities in the input. 16. If user interaction proves impractical, apply the project-and-integrate method [3] to narrow down the meanings of unknown words by exploiting syntactic, semantic and contextual cues.</Paragraph> <Paragraph position="22"> These flexible parsing steps rely on the construction-specific aspects of the basic algorithm, and would not be easy to emulate in either a syntactic ATN parser or one based on a pure semantic grammar. A further advantage of our mixed-strategy approach is that the top-level case structure, in essence, partitions the semantic world dynamically into categories according to the semantic constraints on the active case fillers. Thus, when a pattern matcher is invoked to parse the recipient case of a file-transfer case frame, it need only consider patterns (and semantic-grammar constructs) that correspond to logical locations inside a computer. 
This form of expectation-driven parsing in restricted domains adds a two-fold effect to its robustness: * Many spurious parses are never generated (because patterns yielding potentially spurious matches are never applied in inappropriate contexts). * Additional knowledge (such as additional grammar rules, etc.) can be added without a corresponding linear increase in parse time since the case frames focus only upon the relevant subset of patterns and rules. Thus, the efficiency of the system may actually increase with the addition of more domain knowledge (in effect shaping the case frames to further restrict context). This behavior makes it possible to incrementally build the parser without the ever-present fear that a new extension may make the entire parser fail due to an unexpected application of that extension in the wrong context.</Paragraph> <Paragraph position="23"> In closing, we note that the algorithm described above does not mention interaction with morphological decomposition or spelling correction. Lexical processing is particularly important for robust parsing; indeed, based on our limited experience, lexical-level errors are a significant source of deviant input. The recognition and handling of lexical-deviation phenomena, such as abbreviations and misspellings, must be integrated with the more usual morphological analysis. Some of these topics are discussed independently in [6]. However, integrating resilient morphological analysis with the algorithm we have outlined is a problem we consider very important and urgent if we are to construct a practical flexible parser.</Paragraph> </Section> </Paper>
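The attachment heuristic illustrated by the pair "Edit the programs in Fortran" / "Edit the programs in Teco" can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `CaseFrame`, `LEXICON`, `attach`, and `parse_in_phrase` are hypothetical, the lexicon is a toy stand-in for the case-filler information, and only the instrument case of "edit" is modeled (the location case is omitted for brevity).

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon: words accepted as fillers of each semantic type.
LEXICON = {
    "language": {"fortran", "lisp"},
    "editor": {"teco", "emacs"},
}


@dataclass
class CaseFrame:
    name: str
    # case name -> (case marker, semantic type expected of the filler)
    cases: dict
    filled: dict = field(default_factory=dict)


def attach(marker, word, context_stack):
    """Attach `word` to the most deeply embedded frame (the top of the
    stack, i.e. the end of the list) that has an unfilled case signaled
    by `marker` whose semantic constraint accepts `word`."""
    for frame in reversed(context_stack):
        for case, (case_marker, sem_type) in frame.cases.items():
            if (case_marker == marker
                    and case not in frame.filled
                    and word.lower() in LEXICON[sem_type]):
                frame.filled[case] = word
                return frame.name, case
    return None  # no frame accepts the filler; recovery would start here


def parse_in_phrase(word):
    """Resolve the phrase 'in <word>' after 'Edit the programs'."""
    edit = CaseFrame("edit", {"instrument": ("in", "editor")})
    program = CaseFrame("program", {"language": ("in", "language")})
    # "edit" was pushed first; the embedded "program" frame is on top.
    return attach("in", word, [edit, program])
```

Under these assumptions, `parse_in_phrase("Fortran")` attaches the filler to the embedded "program" frame, while `parse_in_phrase("Teco")` falls through to the instrument case of "edit", because the semantic constraint on the language attribute rejects it. The design choice of scanning from the top of the stack outward mirrors the stated preference for the most local attachment permitted by the case constraints.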