<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1047"> <Title>Entity-Oriented Parsing</Title> <Section position="2" start_page="0" end_page="214" type="intro"> <SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> The task of typical natural language interface systems is much simpler than the general problem of natural language understanding. The simplifications arise because: 1. the systems operate within a highly restricted domain of discourse, so that a precise set of object types can be established, and many of the ambiguities that come up in more general natural language processing can be ignored or constrained away; 2. even within the restricted domain of discourse, a natural language interface system only needs to recognize a limited subset of all the things that could be said -- the subset that its back-end can respond to.</Paragraph>
<Paragraph position="1"> The most commonly used technique to exploit these limited domain constraints is semantic grammar [1, 2, 9], in which semantically defined categories (such as <ship> or <ship-attribute>) are used in a grammar (usually ATN-based) in place of syntactic categories (such as <noun> or <adjective>). While semantic grammar has been very successful in exploiting limited domain constraints to reduce ambiguities and eliminate spurious parses of grammatical input, it still suffers from the fragility in the face of extragrammatical input that is characteristic of parsing based on transition nets [4]. Also, the task of restricted-domain language definition is typically difficult in interfaces based on semantic grammar, in part because the grammar definition formalism is not well integrated with the method of defining the objects and actions of the domain of discourse (though see [6]).</Paragraph>
<Paragraph position="2"> This paper proposes an alternative approach to restricted-domain language recognition called entity-oriented parsing.</Paragraph>
<Paragraph position="3"> Entity-oriented parsing uses the same notion of semantically defined categories as semantic grammar, but does not embed these categories in a grammatical structure designed for syntactic recognition. Instead, a scheme more reminiscent of conceptual or case-frame parsers [3, 10, 11] is employed. An entity-oriented parser operates from a collection of definitions of the various entities (objects, events, commands, states, etc.) that a particular interface system needs to recognize. These definitions contain information about the internal structure of the entities, about the way the entities will be manifested in the natural language input, and about the correspondence between the internal structure and the surface representation. This arrangement provides a good framework for exploiting the simplifications possible in restricted-domain natural language recognition because: 1. the entities form a natural set of types through which to constrain the recognition semantically. The types also form a natural basis for the structural definitions of entities.</Paragraph>
<Paragraph position="4"> 2. the set of things that the back-end can respond to corresponds to a subset of the domain entities (remember that entities can be events or commands as well as objects).</Paragraph>
<Paragraph position="5"> 3. the goal of an entity-oriented system will normally be to recognize one of a "top-level" class of entities.
This is analogous to the set of basic message patterns that the machine translation system of Wilks [11] aimed to recognize in any input.</Paragraph>
<Paragraph position="6"> In addition to providing a good general basis for restricted-domain natural language recognition, we claim that the entity-oriented approach also facilitates robustness in the face of extragrammatical input and ease of language definition for restricted-domain languages. Entity-oriented parsing has the potential to provide better parsing robustness than more traditional semantic grammar techniques for two major reasons: * The individual definition of all domain entities facilitates their independent recognition. Assuming there is appropriate indexing of entities through lexical atoms that might appear in a surface description of them, this recognition can be done bottom-up, thus making possible the recognition of elliptical, fragmentary, or partially incomprehensible input (a minimal sketch of such indexing appears below). The same definitions can also be used in a more efficient top-down fashion when the input conforms to the system's expectations.</Paragraph>
<Paragraph position="7"> * Recent work [5, 8] has suggested the usefulness of multiple construction-specific recognition strategies for restricted-domain parsing, particularly for dealing with extragrammatical input. The individual entity definitions form an ideal framework around which to organize the multiple strategies. In particular, each definition can specify which strategies are applicable to recognizing it. Of course, this only provides a framework for robust recognition; the robustness achieved still depends on the quality of the actual recognition strategies used.</Paragraph>
<Paragraph position="8"> The advantages of entity-oriented parsing for language definition include: * All information relating to an entity is grouped in one place, so that a language definer will be able to see more clearly whether a definition is complete and what the consequences of any addition or change to the definition would be.</Paragraph>
<Paragraph position="9"> * Since surface (syntactic) and structural information about an entity is grouped together, the surface information can refer to the structure in a clear and coherent way. In particular, this allows hierarchical surface information to use the natural hierarchy defined by the structural information, leading to greater consistency of coverage in the surface language.</Paragraph>
<Paragraph position="10"> * Since entity definitions are independent, the information necessary to drive recognition by the multiple construction-specific strategies mentioned above can be represented directly in the form most useful to each strategy, thus removing the need for any kind of "grammar compilation" step and allowing more rapid grammar development.</Paragraph>
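The bottom-up indexing assumed in the first robustness point above can be pictured as an inverted index from lexical atoms to the entity definitions in whose surface descriptions they might appear. The following is a minimal illustrative sketch in Python (the parser described later in the paper is a Lisp program, and all names here, such as ENTITY_LEXICON and candidate_entities, are hypothetical), not the authors' actual mechanism:

```python
# Illustrative sketch only: a simple inverted index from lexical atoms to the
# entity definitions they might signal, supporting bottom-up activation of
# entity recognition on fragmentary input.  All names are hypothetical.

from collections import defaultdict

ENTITY_LEXICON = {
    # entity name          lexical atoms that may appear in a surface description
    "CollegeCourse":       ["course", "cs", "101", "seminar"],
    "CollegeDepartment":   ["department", "computer", "science", "history"],
    "EnrolCommand":        ["enrol", "enroll", "register", "add"],
}

def build_index(entity_lexicon):
    """Invert the per-entity word lists into a word -> {entity, ...} index."""
    index = defaultdict(set)
    for entity, words in entity_lexicon.items():
        for word in words:
            index[word.lower()].add(entity)
    return index

def candidate_entities(tokens, index):
    """Return the entity definitions indexed by any token of the input; these
    can then be recognized bottom-up even when the input is elliptical or
    only partially comprehensible."""
    candidates = set()
    for token in tokens:
        candidates |= index.get(token.lower(), set())
    return candidates

# Example: the fragment "cs 101 Smith" activates CollegeCourse bottom-up.
print(candidate_entities("cs 101 Smith".split(), build_index(ENTITY_LEXICON)))
```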
<Paragraph position="11"> In the remainder of the paper, we make these arguments more concrete by looking at some fragments of an entity-oriented language definition, by outlining the control structure of a robust restricted-domain parser driven by such definitions, and by tracing through some worked examples of the parser in operation. These examples also describe some specific parsing strategies that exploit the control structure. A parser incorporating the control structure and the parsing strategies is currently under implementation. Its design embodies our experience with a pilot entity-oriented parser that has already been implemented, but is not described here.</Paragraph>
<Paragraph position="12"> 2. Example Entity Definitions
This section presents some example entity and language definitions suitable for use in entity-oriented parsing. The examples are drawn from the domain of an interface to a database of college courses. Here is the (partial) definition of a course. For reasons of space, we cannot explain all the details of this language. In essence, a course is defined as a structured object with components: number, department, instructor, etc. (square brackets denote attribute/value lists, and round brackets ordinary lists). This definition is kept separate from the surface representation of a course, which is defined to be a noun phrase with adjectives, postnominal cases, etc. At a more detailed level, note that the special-purpose way of specifying a course by its department juxtaposed with its number (e.g. Computer Science 101) is handled by an alternate pattern for the head of the noun phrase (dollar signs refer back to the components). This allows the user to say (redundantly) phrases like "CS 101 taught by Smith". Note also that the way the department of a course can appear in the surface representation of a course is specified in terms of the CourseDepartment component (and hence in terms of its type, CollegeDepartment) rather than directly as an explicit surface representation. This ensures consistency throughout the language in what will be recognized as a description of a department. Coupled with the ability to use general syntactic descriptors (like NounPhrase in the description of a SurfaceRepresentation), this can prevent the kind of patchy coverage prevalent with standard semantic grammar language definitions.</Paragraph>
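The (partial) course definition referred to above is not reproduced in this extracted text. Purely as an illustration of the kind of information such a definition groups together, here is a hypothetical Python analogue (the paper's own notation uses attribute/value lists in square brackets and ordinary lists in round brackets, not Python, and every field name and example value below is an assumption rather than the actual formalism):

```python
# Hypothetical Python analogue of an entity-oriented definition; field names
# and example values are illustrative guesses, not the paper's notation.

COLLEGE_COURSE = {
    "EntityName": "CollegeCourse",
    # Internal structure: the components of a course.
    "Components": {
        "CourseNumber":     {"Type": "Integer"},
        "CourseDepartment": {"Type": "CollegeDepartment"},
        "CourseInstructor": {"Type": "CollegeInstructor"},
    },
    # Surface representation: how a course may be manifested in the input.
    "SurfaceRepresentation": {
        "SyntaxType": "NounPhrase",
        "Adjectives": ["new", "introductory"],
        # Alternate pattern for the head of the noun phrase: department name
        # juxtaposed with the number, e.g. "Computer Science 101".  The "$"
        # names refer back to the components, as in the paper's notation.
        "Head": ["course", ("$CourseDepartment", "$CourseNumber")],
        # Postnominal cases, e.g. "... taught by Smith".
        "PostnominalCases": [
            {"Marker": "taught by", "Filler": "$CourseInstructor"},
        ],
    },
}

# The department's own surface forms live in the CollegeDepartment definition
# (reached through the CourseDepartment component), not here, which is what
# keeps coverage of department descriptions consistent across the language.
```

The point of the sketch is the grouping: internal structure (the components) and surface information (noun-phrase patterns, the alternate department-plus-number head, the postnominal case) sit in one definition, with the dollar-sign names referring back to the components.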
<Paragraph position="13"> Subsidiary objects like CollegeDepartment are defined in similar fashion.</Paragraph>
<Paragraph position="14"> CollegeCourse will also be involved in higher-level entities of our restricted domain, such as a command to the database system to enrol a student in a course.</Paragraph>
<Paragraph position="15"> These examples also show how all information about an entity, concerning both fundamental structure and surface representation, is grouped together and integrated. This supports the claim that entity-oriented language definition makes it easier to determine whether a language definition is complete.</Paragraph>
<Section position="1" start_page="213" end_page="214" type="sub_section"> <SectionTitle> 3. Control Structure for a Robust Entity-Oriented Parser </SectionTitle>
<Paragraph position="0"> The potential advantages of an entity-oriented approach from the point of view of robustness in the face of ungrammatical input were outlined in the introduction. To exploit this potential while maintaining efficiency in parsing grammatical input, special attention must be paid to the control structure of the parser used.</Paragraph>
<Paragraph position="1"> Desirable characteristics for the control structure of any parser capable of handling ungrammatical as well as grammatical input include: * the control structure allows grammatical input to be parsed straightforwardly, without considering any of the possible grammatical deviations that could occur; * the control structure enables progressively higher degrees of grammatical deviation to be considered when the input does not satisfy grammatical expectations; * the control structure allows simpler deviations to be considered before more complex deviations.</Paragraph>
<Paragraph position="2"> The first two points are self-evident, but the third may require some explanation. The problem it addresses arises particularly when there are several alternative parses under consideration. In such cases, it is important to prevent the parser from considering drastic deviations in one branch of the parse before considering simple ones in the other. For instance, the parser should not start hypothesizing missing words in one branch when a simple spelling correction in another branch would allow the parse to go through. We have designed a parser control structure for use in entity-oriented parsing which has all of the characteristics listed above. This control structure operates through an agenda mechanism.</Paragraph>
<Paragraph position="3"> Each item of the agenda represents a different continuation of the parse, i.e. a partial parse plus a specification of what to do next to continue that partial parse. With each continuation is associated an integer flexibility level that represents the degree of grammatical deviation implied by the continuation. That is, the flexibility level represents the degree of grammatical deviation in the input if the continuation were to produce a complete parse without finding any more deviation. Continuations with a lower flexibility level are run before continuations with a higher flexibility level. Once a complete parse has been obtained, continuations with a flexibility level higher than that of the continuation which resulted in the parse are abandoned. This means that the agenda mechanism never activates any continuations with a flexibility level higher than the lowest level of grammatical deviation necessary to account for the input. Thus effort is not wasted exploring more exotic grammatical deviations when the input can be accounted for by simpler ones. This shows that the parser has the first two of the characteristics listed above. In addition to taking care of alternatives at different flexibility levels, this control structure also handles the more usual kind of alternatives faced by parsers -- those representing alternative parses due to local ambiguity in the input. Whenever such an ambiguity arises, the control structure duplicates the relevant continuation as many times as there are ambiguous alternatives, giving each of the duplicated continuations the same flexibility level. From there on, the same agenda mechanism used for the various flexibility levels will keep each of the ambiguous alternatives separate and ensure that all are investigated (as long as their flexibility level is not too high). Integrating the treatment of the normal kind of ambiguities with the treatment of alternative ways of handling grammatical deviations ensures that the level of grammatical deviation under consideration can be kept the same in locally ambiguous branches of a parse. This fulfills the third characteristic listed above.</Paragraph>
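To make the agenda mechanism concrete, here is a minimal Python sketch (the actual parser is implemented in Lisp; the class, method, and field names below are hypothetical). Continuations are run in order of increasing flexibility level; locally ambiguous alternatives are re-scheduled at the same level, hypothesized deviations at a higher one; and once a complete parse is found, continuations at strictly higher levels are abandoned:

```python
# Minimal illustrative sketch of the agenda mechanism described above
# (hypothetical names; the actual parser is a Lisp program).  Each agenda
# item is a continuation: a step to run on a partial-parse state, tagged
# with an integer flexibility level.

import heapq
import itertools

class Agenda:
    def __init__(self):
        self._heap = []
        self._tie = itertools.count()   # stable ordering within one level
        self.best_parse_level = None    # level of the first complete parse

    def add(self, level, step, state):
        """Schedule a continuation at the given flexibility level."""
        heapq.heappush(self._heap, (level, next(self._tie), step, state))

    def run(self):
        parses = []
        while self._heap:
            level, _, step, state = heapq.heappop(self._heap)
            # Abandon continuations more deviant than the cheapest parse found.
            if self.best_parse_level is not None and level > self.best_parse_level:
                break
            for outcome in step(state):          # a step may return several outcomes
                if outcome.get("complete"):      # a complete parse at this level
                    parses.append(outcome["parse"])
                    if self.best_parse_level is None:
                        self.best_parse_level = level
                else:
                    # Local ambiguity keeps the same level (increment 0);
                    # hypothesized grammatical deviations add a positive increment.
                    self.add(level + outcome.get("increment", 0),
                             outcome["step"], outcome["state"])
        return parses
```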
<Paragraph position="4"> Flexibility levels are additive, i.e. if some grammatical deviation has already been found in the input, then finding a new one will raise the flexibility level of the continuation concerned to the sum of the flexibility levels involved. This ensures a relatively high flexibility level, and thus a relatively low likelihood of activation, for continuations in which combinations of deviations are being postulated to account for the input. Since space is limited, we cannot go into the implementation of this control structure. However, it is possible to give a brief description of the control structure primitives used in programming the parser. Recall first that the kind of entity-oriented parser we have been discussing consists of a collection of recognition strategies. The more specific strategies exploit the idiosyncratic features of the entities or construction types they are specific to, while the more general strategies apply to wider classes of entities and depend on more universal characteristics.</Paragraph>
<Paragraph position="5"> In either case, the strategies are pieces of (Lisp) program rather than more abstract rules or networks. Integration of such strategies with the general scheme of flexibility levels described above is made straightforward through a special split function which the control structure supports as a primitive. This split function allows the programmer of a strategy to specify one or more alternative continuations from any point in the strategy and to associate a different flexibility increment with each of them.</Paragraph>
<Paragraph position="6"> The implementation of this statement takes care of restarting each of the alternative continuations at the appropriate time and with the appropriate local context.</Paragraph>
<Paragraph position="7"> Some examples should make this account of the control structure much clearer. The examples will also present some specific parsing strategies and show how they use the split function described above. These strategies are designed to effect robust recognition of extragrammatical input and efficient recognition of grammatical input by exploiting entity-oriented language definitions like those in the previous section.</Paragraph> </Section> </Section> </Paper>
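As a rough illustration of how a strategy might use the split primitive, here is a hedged Python analogue (the real primitive is a Lisp construct integrated with the agenda mechanism; the function names and encoding below are assumptions, not the paper's API):

```python
# Hypothetical analogue of the "split" control-structure primitive (the real
# primitive is a Lisp construct not reproduced in this text; names and
# signatures here are assumptions, not the paper's API).

def split(*alternatives):
    """Propose alternative continuations from a choice point in a strategy.
    Each alternative is (flexibility_increment, thunk); the agenda is expected
    to restart every thunk, adding its increment to the current level."""
    return list(alternatives)

def recognize_instructor(word):
    """Toy strategy with one choice point: accept the word as written at no
    extra cost, or hypothesize a spelling correction at a higher cost."""
    return split(
        (0, lambda: ("exact", word)),
        (2, lambda: ("spelling-corrected", word.replace("1", "i"))),
    )

# Example: "Sm1th" yields an exact alternative (increment 0) and a corrected
# alternative "Smith" (increment 2); the agenda runs the cheaper one first.
for increment, thunk in recognize_instructor("Sm1th"):
    print(increment, thunk())
```

Here the exact-match alternative carries a zero increment and the spelling-correction alternative a positive one, so the agenda pursues the correction only if no complete parse has been found at the lower level.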