File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/86/c86-1108_metho.xml
Size: 26,142 bytes
Last Modified: 2025-10-06 14:11:54
<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1108"> <Title>User Specification of Syntactic Case Frames in TELI, A Transportable, User-Customized Natural Language Processor</Title> <Section position="2" start_page="0" end_page="456" type="metho"> <SectionTitle> 2. The Importance of Case Frame Information </SectionTitle> <Paragraph position="0"> Following Ballard and Tinkham (1984), TELI seeks to enable domain-independent English processing by maintaining detailed case frame information about the phrase types provided for by the system. For example, when accessing a restaurant database, the system would know not only that &quot;serve&quot; is a transitive verb but also that it requires objects of type Restaurant as subject and either Food or Meal as object. Thus, if &quot;Japanese&quot; is known to be a type of Food, and &quot;lunch&quot; a Meal, then the system would accept &quot;Which restaurants serve lunch?&quot; &quot;How many restaurants serve Japanese food?&quot; and reject * &quot;Which meals serve Japanese food?&quot; * &quot;How many meals serve a restaurant?&quot; As a more elaborate example, suppose we are accessing information about researchers at Bell Labs, and we ask &quot;Which manager does the newest speech employee not in building two report to?&quot; The exact phrase types involved in the above input are Verb Phrase: (employee report nil nil to manager) Prepositional Phrase: (employee in building) Noun-Noun Phrase: (project employee) where &quot;nil&quot; denotes unfilled optional slots for direct object and particle. Adverbials (&quot;not&quot;) and inflections of single-word modifiers (&quot;newest&quot;) are handled by mechanisms separate from those associated with what we are calling phrase types (see Section 9).</Paragraph> <Paragraph position="1"> As suggested above, we treat the noun being modified by a prepositional phrase as an argument of the modifier in question (e.g. &quot;in&quot;). 
Thus, departing from more conventional treatments, our &quot;head noun&quot; is part of the prepositional case frame, which therefore comprises three rather than two slots. Similarly, adjective phrase case frames comprise four rather than three slots (see Section 5). Our syntactic and semantic treatment of prepositional phrases is suggested by the &quot;Intermediate Representation&quot; shown in Figure 1. 3. Situations In Which Case Frames May Be Considered There are presently five situations in TELI where users are able to examine and possibly modify syntactic case frames. The first of these occurs during initial customization, when the system first confronts a new database. [Figure 1: Intermediate Representation]</Paragraph> <Paragraph position="2"> The remaining four, which concern the central English processing mode of TELI relevant to this paper, are as follows.</Paragraph> <Paragraph position="3"> When explicitly requested by the user. When logging on to the system, or at a later time, users can ask what words and phrases are associated with a particular domain object. This contributes to the habitability of the interface.</Paragraph> <Paragraph position="4"> When adding new vocabulary items. For example, if the user specifies &quot;open&quot; as an adjective, the system asks for its associated case frames.</Paragraph> <Paragraph position="5"> When attempting to recover from a parsing failure. For example, if a sentence that the system cannot parse contains the word &quot;with&quot;, the system will offer to show the user all existing prepositional triples of the form Entity-with-Entity. 
This permits the user to ascertain whether the parsing failure was caused by missing case frame information or by something else. If the problem is due to missing case frame information, the user can add it, then have the system retry the input. When semantic information is being considered. Users may ask to examine or modify current definitions of prepositional phrases, verb phrases that take &quot;up&quot; as a particle, and so forth. To do so, the user first specifies the syntactic relationships of the phrases of interest. As shown below, the manner in which the user specifies a phrase or range of phrases is independent of the reason the case frame information is being sought.</Paragraph> <Paragraph position="6"> 4. Principles Behind Case Frame Specifications The primary criteria our methods of case frame specification are designed to meet are: To be independent of the specific NLP that information is being supplied to. This permits us to alter or augment the underlying case frames used by the parser without having to change any of the actual code responsible for acquiring phrasal compatibilities from the user. For instance, we have made several changes in the way relational nouns like &quot;classmate&quot; are processed, without any changes to the customization modules.</Paragraph> <Paragraph position="7"> To be fully data-driven. Our knowledge acquisition modules provide general capabilities for a large class of phrase types, but they know nothing specific about verb phrases, etc. This is our principal method of achieving the previous criterion. At present, about two pages of specifications of a form shown below are used to drive TELI's knowledge acquisition component.</Paragraph> <Paragraph position="8"> To be driven by data which can in principle be inferred from the underlying grammar. This means that any changes to the grammar will be automatically reflected in the modules that acquire case frame information. 
At present, about half of the two pages of data that drive our syntactic knowledge acquisition module are taken directly from the grammar.</Paragraph> <Paragraph position="9"> In addition to the above criteria, which relate to automating the process of customizing an NLP, an additional human factors criterion is to have types of information which function similarly, from the user's standpoint, appear similar as presented by the system.</Paragraph> <Paragraph position="10"> Thus, output formats do not always reflect how information is stored and manipulated by the system.</Paragraph> <Paragraph position="11"> 5. Defining the User Interface to Case Frames At present, TELI provides for five phrase types: Adjective Phrase: e.g. researchers associated with TELI Noun-Modifier Phrase: e.g. the COLING presentations Verb Phrase: e.g. employees working with Brachman Prepositional Phrase: e.g. the researchers in Marcus' department Relational Noun Phrase: e.g. the associates of Litman, the salary of Smith In discussing how actual case frame acquisitions are done, we will find it convenient to give in detail all the information associated with one of the phrase types provided for by the system grammar. For this purpose, we have chosen to consider adjective phrases, since the situations they involve are fairly representative. The actual system provides somewhat more sophisticated capabilities than what we have space to describe here, especially in its treatment of verb phrases.</Paragraph> <Paragraph position="12"> Before proceeding, we note that the actual data structures used in TELI differ slightly from those presented here, although they contain precisely the same information. Also, we mention that our use of the term &quot;interface designer&quot; reflects our belief that most of the job about to be described can be done by a trained user of the system, as opposed to the actual system builders. 
Before TELI is supplied with phrase type information, it will have been given lexical information about each part of speech recognized by the underlying grammar. Parts of speech are also classified as either &quot;open&quot; or &quot;closed&quot;, the former enabling the user to supply new words of that type. For example, the system designer might have specified Open = (adjective, noun, verb, ...) Closed = (article, prep, ...) This information is used by the acquisition module in deciding which case frame slots may be filled with vocabulary items not already in the system lexicon.</Paragraph> <Paragraph position="13"> As a first step in telling the system about phrase types, the interface designer must indicate for each case frame slot (1) a name to be used inside the system to identify this slot, (2) an appropriate filler type, and (3) an external name to be used as a label in system output. For adjective phrases this might be given as</Paragraph> <Paragraph position="15"> where &quot;adjinfo&quot; is an arbitrary symbol used internally to reference adjective phrase case frames. Slot names (head, adj, prep, obj) are arbitrary; filler types (entity, adj, prep) generally correspond to parts of speech, although &quot;entity&quot; denotes the subset of nouns that comprise the primitive object types of the domain at hand. For example, in a building domain, Room might be a basic object type (entity), while &quot;office&quot; is merely a noun that refers to some of the objects of type Room.</Paragraph> <Paragraph position="16"> Finally, external names (&quot;Subject&quot; etc.) may be any string useful in identifying a case role.</Paragraph> <Paragraph position="17"> Next, the interface designer specifies an arbitrary number of templates which the system will seek to match against a user's English-like case frame specification. 
For example, (adjinfo (a Head can be Adj Prep an Obj)) enables the system to recognize a specification such as &quot;a room can be adjacent to a corridor&quot; as a reference to an adjective phrase case frame. Recall that this information is given by the interface designer and does not define, but merely reflects, the grammatical coverage provided by the underlying parser. Note that case frame templates are specified in terms of case frame labels rather than parts of speech.</Paragraph> <Paragraph position="18"> This allows transposing the elements of a case frame containing two or more elements of the same type.</Paragraph> <Paragraph position="19"> In the event that the interface designer wishes to specify optional items, (s)he can either give multiple specifications or denote optionality within parentheses.</Paragraph> <Paragraph position="20"> Thus, the verb phrase specification (subj verb (obj) (part) (prep obj)) will expand into eight patterns having from two to six elements each.</Paragraph> <Paragraph position="21"> Since the interface designer will have specified slot names for each type of case frame, the system can easily detect the presence of &quot;noise&quot; words. In addition, small matters such as the equivalence of &quot;a&quot; and &quot;an&quot; must be taken care of, and the interface designer does this by giving a translation map such as Noise-Translations = ((an a) (the a)) which instructs the system to make the indicated replacements in both an English-like specification to be matched and the internal patterns. It is not necessary that all noise words be present in the system dictionary. For example, &quot;can&quot; does not presently appear in the context of our question-answering applications.</Paragraph> </Section> <Section position="3" start_page="456" end_page="458" type="metho"> <SectionTitle> 6. 
A User's View of Case Frame Specification </SectionTitle> <Paragraph position="0"> There are two ways in which a user may designate which case frame information is of interest, namely (1) by menu, and (2) by English-like specification. The former is straightforward, while the latter is more convenient, and more interesting.</Paragraph> <Paragraph position="1"> In specification by menu, the user first indicates a phrase type to be inquired about, and is then instructed by the system to provide a filler for each slot in the associated case frames. For example, to find out what domain objects can be &quot;in&quot; a county, a user would make the selections indicated in Figure 2. Since our case frames allow both the head-noun and argument-of-preposition slots to be filled with any basic object type of the domain at hand, the second and fourth menus contain the same options. The internal list that results from these specifications is essentially As suggested in Figure 2, during menu specification, the system considers in turn each case slot of the phrase type in question and, for each of them, presents to the user for selection a list of current fillers, along with an option to &quot;look at all&quot;. For slots whose filler type is either an open category, or a closed category having possible fillers not presently being listed, an option to select some &quot;other&quot; filler is included. Finally, for optional phrase elements (e.g. direct object of a verb), an option appears that allows the user to select &quot;none&quot;. In English-like specification, the user types a phrase that indicates each desired slot value, not necessarily in the order in which they appear internally.</Paragraph> <Paragraph position="2"> Appropriate noise words may appear, and &quot;?&quot; may be used as a &quot;wildcard&quot; to indicate an interest in all possible values. For example, the sample specification given above by menu could be indicated by &quot;a ? 
can be in a county&quot; As with menu specification, it is possible for the user to introduce new vocabulary. For instance, if the italicized items were new in the specifications &quot;an employee can report to a manager&quot; &quot;an employee can be responsible for a project&quot; &quot;an employee can be the supervisor of a project&quot; the system would have sufficient information to find a unique match among the patterns stored. In these situations, the system will have automatically determined the part of speech of the new word.</Paragraph> <Paragraph position="3"> Although our use of &quot;?&quot; may seem artificial in the example above, when compared against a more fluent method of inquiry such as &quot;what can be in a county&quot;, it allows any case frame slot to be inspected, not just those slots that are filled with nominals. For example, a user might specify &quot;a city can be ? a county&quot; to find all prepositions linking &quot;city&quot; with &quot;county&quot;, or &quot;an employee can ? a project ?&quot; to find all verb-particle pairs connecting &quot;employee&quot; with &quot;project&quot;. We prefer to provide a small number of simple and powerful mechanisms, even though other methods might appear preferable in some situations.</Paragraph> <Paragraph position="4"> For the reader whose aesthetics differ from ours, we note that alternate phrasings can be provided for by simple modifications to the algorithm given in Section 7. Unlike menu specification, English-like specification allows certain ambiguities to arise, especially when the system designer has chosen to permit terse forms with few or no noise words. For example, the respective absence of the noise words &quot;can&quot; and &quot;can be&quot; in the specifications &quot;employee responsible for project&quot; &quot;employee report to manager&quot; makes it impossible for the system to decide whether the new word is an adjective or a verb. 
In such situations, the system constructs a suitable menu, which for the above specifications would be roughly: What type of information are you giving? Verb Phrase/Particle; Ordinary Verb Phrase; Adjective Phrase</Paragraph> <Paragraph position="5"> In the other extreme, it is possible that none of the stored patterns match the user's specification, in which case the system requires the user either to paraphrase or to resort to menu specification.</Paragraph> <Paragraph position="6"> In our experience, English-like specification yields a unique match about 80 percent of the time; more than one match about 15 percent of the time; and no matches about 5 percent of the time. The most frequent situation in which a multiple match occurs concerns the possibility that a preposition appearing in a verb phrase is a particle. For example, if the user types &quot;an employee can pick a project up&quot; then &quot;up&quot; is known to be a particle by its position. If instead the user were to type &quot;an employee can pick up a project&quot; then the system will need to determine whether &quot;up&quot; is a particle. Although we generally avoid yes-no questions, as discussed below, we decided to allow one in this frequent and predictable situation, as indicated by: Can &quot;an employee can work for a manager&quot; be paraphrased as &quot;an employee can work a manager for&quot;?</Paragraph> <Paragraph position="7"> Finally, it is useful to allow the system to present the user with relevant information that the system knows it will need, rather than wait (and hope) for the user to offer it. As a first example, suppose the system has failed to parse the input &quot;Which corridor is Stumberger's office adjacent to?&quot; and the user accepts the system's offer to provide help in tracking down the problem. 
Since the word &quot;adjacent&quot; is an adjective, and adjectives are known to have phrases associated with them, the system will supply all current information about those adjective phrases having &quot;adjacent&quot; in the adj slot and leaving the remaining slots unspecified. That is, the system will respond as though the user has specified 7. How English-Like Specifications are Processed When an English-Like Specification is received from the user, the system must (1) determine what phrase type is being dealt with; (2) detect any new words; and (3) account for any unspecified (wildcard) case slots. As an example, suppose a user wants to know what things can be &quot;associated with&quot; an employee, and suppose further that the word &quot;associated&quot; is not yet known to the system. In this case, the system will naturally know of nothing that can be &quot;associated with&quot; an employee, but will give the user an opportunity to add to its knowledge. If the user were to type an employee can be associated with a ? this specification is first scanned and turned into a employee can be ?? with a ? where &quot;??&quot; marks the position of an unknown word and &quot;?&quot; continues to denote a wildcard slot. Note that (1) a &quot;noise translation&quot; from Section 5 has been used for &quot;an&quot;, and (2) the noise words &quot;can&quot; and &quot;be&quot; have not yet been removed, since they may act as content words in a pattern for something other than an adjective phrase. The next step is to substitute part-of-speech labels for each word in the partially processed specification. Only those parts of speech that the system knows are relevant, as indicated by the information supplied by the interface designer as shown above, are included (e.g. &quot;a&quot; is not replaced by &quot;article&quot;). Thus, the system converts the structure shown above into a (noun entity) can be ?? (prep) a ? 
at which point an attempt can be made to match the internal patterns that represent the acceptable case frame specifications.</Paragraph> <Paragraph position="8"> The pattern matching that occurs at this point is simple, where ? matches any case slot ?? matches any &quot;open&quot; category case slot</Paragraph> <Paragraph position="10"> In particular, the single match found for the structure shown above is a entity can be adj prep a entity which is known to be associated with adjective phrases (since it was defined for that purpose).</Paragraph> <Paragraph position="11"> At this point, the intermediate structure containing the ?? marker is re-examined and compared with the original specification the user typed; the user is asked to confirm that &quot;associated&quot; is indeed a new adjective; and the lexicon update routine is invoked to insert &quot;associated&quot; into the lexicon as an adjective. Next, the system strips noise words and so the case frames to be examined are indicated by Finally, the system presents a 1-dimensional menu, similar to that shown below in Figure 3b, which allows the user to specify what things an employee can be associated with.</Paragraph> <Paragraph position="12"> 8. Display of Relevant Information The formats that we have chosen for TELI to display the current case frame information relevant to a user's specification are based on the desires 1. to allow information to be inspected and updated simultaneously, and 2. to minimize the number of specific menu types presented to the user.</Paragraph> <Paragraph position="13"> In particular, the system constructs, whenever possible, a menu in which each possible setting of unspecified case frame values may independently be turned &quot;on&quot; or &quot;off&quot; by a mouse click. In the current implementation, &quot;whenever possible&quot; amounts to precisely those situations in which no more than two case frame slots are left unspecified. 
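The display rule just stated can be illustrated with a minimal Python sketch. It is not TELI's implementation: the function name choice_boxes, the dictionary-based case frame layout, and the sample geography frames are all hypothetical, chosen only to show how the number of unspecified slots fixes the dimensionality of the menu.

```python
from itertools import product

WILDCARD = "?"  # the paper's marker for an unspecified case slot


def choice_boxes(spec, fillers, known_frames):
    """Build the on/off choice boxes for a partially specified case frame.

    spec         -- dict mapping slot name to a filler, or WILDCARD if unspecified
    fillers      -- dict mapping slot name to candidate fillers for that slot
    known_frames -- set of tuples (one filler per slot, in spec order) now allowed

    Returns a dict from each combination of values for the unspecified slots
    to True ("on") or False ("off"), or None when three or more slots are
    unspecified and a plain listing of existing frames would be used instead.
    """
    open_slots = [slot for slot, value in spec.items() if value == WILDCARD]
    if len(open_slots) not in (0, 1, 2):
        return None
    boxes = {}
    # One box per combination of fillers; zero open slots yields a single box.
    for combo in product(*(fillers[slot] for slot in open_slots)):
        frame = dict(spec)
        frame.update(zip(open_slots, combo))
        boxes[combo] = tuple(frame[slot] for slot in spec) in known_frames
    return boxes


# Hypothetical prepositional case frames (head, prep, obj) for a geography domain.
known = {("city", "in", "county"), ("county", "in", "state")}
fillers = {"head": ["city", "county", "state"], "obj": ["city", "county", "state"]}

one_dim = choice_boxes({"head": "?", "prep": "in", "obj": "county"}, fillers, known)
two_dim = choice_boxes({"head": "?", "prep": "in", "obj": "?"}, fillers, known)
```

Here one_dim corresponds to a single column of boxes and two_dim to a grid with row and column labels; toggling an entry of the returned dictionary would play the role of a mouse click on the corresponding box.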
Thus, a menu will contain choice boxes which have from zero through two dimensions, according to the number of unspecified case slots. Examples appear in Figures 3a through 3c. Note that appropriate row and column labels, and also a suitable menu label, must be constructed by the system. Since the system has no initial domain-specific vocabulary, these menus must be formulated at run-time.</Paragraph> <Paragraph position="14"> When more than two case slots are unspecified, the system simply prints all existing case frames that satisfy the indicated constraints, supplying an initially filled box for each, as indicated in Figure 3d. This allows the user to remove individual case frames, and the &quot;Add&quot; option allows information to be added. Although we have chosen to avoid asking literal yes-no questions whenever possible, largely because of the low information content they provide, the choice-box scheme we have adopted implicitly asks a number of simultaneous yes-no questions. Thus, when the user checks the box in a menu for the preposition &quot;in&quot; having City as row label and County as column label, (s)he is in effect answering &quot;yes&quot; to the implicit question &quot;can a city be in a county&quot;.</Paragraph> </Section> <Section position="4" start_page="458" end_page="459" type="metho"> <SectionTitle> 9. Discussion </SectionTitle> <Paragraph position="0"> We now consider (a) treatment of single-word modifiers, (b) planned enhancements to case frame capabilities, and (c) related acquisition modules.</Paragraph> <Paragraph position="1"> [Figure: sample menus, by Number of Queried Slots] The techniques presented in this paper, which are directed toward case frames for multiple-word phrases, are actually used for single-word modifiers as well. 
Internally, one important difference is that the associated modifier compatibility information is maintained in the lexicon rather than stored into auxiliary case frames. As an example, if the user says that the word &quot;large&quot; can modify objects of type Department and Office, one associated lexical entry is (larger compar large (nt department office)) As with case frames, the user may impart compatibility information for single-word modifiers by either menu or English-like specification. The latter is typified by a room can be large while an example of how the user may ask to see everything known about acceptable adjective modifications is shown in Figure 4.</Paragraph> <Paragraph position="2"> Several enhancements to our facilities for English-like capabilities are planned. For instance, we noted in Section 6 that whereas the use of &quot;?&quot; to denote an unspecified slot works for all parts of speech, it might be more natural to denote unspecified nouns by &quot;what&quot; and possibly transpose the specification accordingly. As noted previously, the question is one of generality versus naturalness in specific situations; simple modifications to the algorithm given in the preceding section would enable alternate forms. We are considering whether to alter our methods of inquiry, perhaps to provide for both forms. Another enhancement being considered is to permit inflected forms, as indicated by the italicized elements of students can be failed by an instructor Finally, we wish to give some feeling for the lexical and semantic acquisition facilities alluded to in the paper. Figure 4a gives the top-level menu pertaining to part-of-speech information. This menu enables the user to obtain output which, as with case frame information, allows simultaneous inspection and modification, as illustrated in Figure 4c. Word and phrase meanings are acquired similarly, and also involve either menu or English-like specification. 
As an example of the latter, if the user has said that an employee can work with an employee then the system will ask what &quot;work with&quot; means in this sense by selecting two example employees in terms of which the user is asked to define semantics. For example, the system will in effect ask What does it mean for Bob to work with Jill? at which point the user might say the dept of Bob is equal to the dept of Jill</Paragraph> </Section> </Paper>