<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1021"> <Title>DICTIONARY TEXT ENTRIES AS A SOURCE OF KNOWLEDGE FOR SYNTACTIC AND OTHER DISAMBIGUATIONS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 0. INTRODUCTION </SectionTitle> <Paragraph position="0"> Online reference books may be thought of as knowledge bases, with data structures encoded in natural language. We have developed a system that reasons heuristically about the comparative likelihood of various potential attachments for prepositional phrases in English sentences by analyzing relevant definitions in Webster's online dictionary (W7) in their original text form (Binot and Jensen 1987, Jensen and Binot forthcoming).</Paragraph> <Paragraph position="1"> This paper reviews that earlier work and then extends it by suggesting how additional information (particularly example sentences from another dictionary, the Longman Dictionary of Contemporary English (LDOCE)) might be used to cope with three additional problems: the attachment of relative clauses, the resolution of some cases of pronoun reference, and the interpretation of dangling modifiers. The earlier work on PP attachments has been implemented, but we have only begun work on the implementation of these additional disambiguation problems.</Paragraph> <Paragraph position="2"> Nevertheless, it seems like a good idea to indicate that this dictionary-based approach should be feasible for more than PP attachments.</Paragraph> <Paragraph position="3"> Our objective is to consult the dictionary to find the kind of information that has previously been supplied by means of scripts, frames, templates, and other hand-crafted devices. This approach offers hope for reducing time-consuming, and usually incomplete, hand-codings of semantic information; and it should be of particular interest for non-restricted text processing applications such as machine translation and critiquing. 
We are concerned here with emulating, in some sense, the way a person uses a dictionary: look up one entry, study the definitions and the examples, look up other entries, and so on. We feel that natural language itself can be a reasonable knowledge representation language. More needs to be learned about how to access and manipulate this knowledge; but the flexibility afforded by natural language is an advantage for the task, not a drawback.</Paragraph> <Paragraph position="4"> This research is related to other work being done with machine-readable dictionaries, e.g.</Paragraph> <Paragraph position="5"> Markowitz et al. 1986, in the sense that we all share the goal of automatically extracting semantic information from these rich sources.</Paragraph> <Paragraph position="6"> However, in other respects our approaches are quite different.</Paragraph> </Section> <Section position="4" start_page="0" end_page="153" type="metho"> <SectionTitle> 1. ATTACHMENT OF PREPOSITIONAL PHRASES </SectionTitle> <Paragraph position="0"> The relationships in which we are interested can be illustrated by the following sentences from Binot 1985: (1) I ate a fish with a fork.</Paragraph> <Paragraph position="1"> (2) I ate a fish with bones.</Paragraph> <Paragraph position="2"> (See Appendix A, Tree 1.) In both cases, the ambiguity resides in the placement of the &quot;with&quot; prepositional phrase, which might modify either &quot;fish&quot; or &quot;ate.&quot; 
The parse tree shows the PP attached to the closest possible head, &quot;fish,&quot; with a question mark showing that it could alternatively be attached to the verb &quot;ate.&quot;</Paragraph> <Paragraph position="3"> Focussing on (1), another way to phrase the key question is &quot;Is it more likely that a fork is associated with a fish or with an act of eating?&quot; To answer that question, the system evaluates separately the plausibility of the two proposed constructs: (1a) eat with a fork; (1b) a fish with a fork. It then orders the solutions and picks the one with the highest rating.</Paragraph> <Paragraph position="4"> In the heuristics we are currently using, the basic way to rate the likelihood of a construct is to try to establish, through the dictionary, some relevant semantic connection between the words of that construct. Easier (or shorter) connections yield better ratings. Long connections, or connections making use of approximate inferences, will lead to lower ratings. For example, the definition of &quot;fork&quot; contains the phrase &quot;used for taking up,&quot; and &quot;eating&quot; is defined as a kind of &quot;taking&quot; in the dictionary. By establishing these relationships, we see a plausible semantic connection between &quot;fork&quot; and &quot;eat,&quot; and (1a) receives a high rating.</Paragraph> <Paragraph position="5"> The relationships are established (a) by identifying equivalent function-word patterns in the definitions, such as the equivalence of &quot;used for&quot; and the instrumental &quot;with&quot;; (b) by linking important definition words (i.e., central terms in definitional phrases, such as heads of phrases, or else synonyms). 
This is done by parsing the definitions, identifying the central word(s), and then following hierarchical chains of definitions through the dictionary.</Paragraph> <Paragraph position="6"> Heuristic answers are expressed in terms of certainty factors which, as in the MYCIN system (Shortliffe 1976), take their values in the range (-1,+1): &quot;-1&quot; expresses absolute disbelief; &quot;0&quot; expresses complete uncertainty; &quot;1&quot; expresses absolute belief. Intermediate values express varying degrees of belief or disbelief.</Paragraph> <Paragraph position="7"> The two main heuristics that are used to evaluate the plausibility of (1a) against (1b) can be described in English as follows: H1- for checking for an INSTRUMENT relation between a head and a &quot;with&quot; complement: 1. if the head is not a verb, the relation doesn't hold (certainty factor = -1); 2. if some &quot;instrument pattern&quot; (see below) exists in the dictionary definition of the complement, and if this pattern points to a defining term that can be linked with the head, then the relation probably holds (certainty factor = 0.7); 3. else assume that there is more chance that the relation doesn't hold (certainty factor = -0.3).</Paragraph> <Paragraph position="8"> H2- for checking for a PARTOF relation between a head and a &quot;with&quot; complement: 1. if the head is not a noun, the relation doesn't hold (certainty factor = -1); 2. if some &quot;part-of pattern&quot; (see below) exists in the dictionary definition of the complement, and if this pattern points to a defining term that can be linked with the head, then the PARTOF relation probably holds (certainty factor = 0.7); 3. else assume that there is more chance that the relation doesn't hold (certainty factor = -0.3).</Paragraph> <Paragraph position="9"> Each certainty factor refers to the specific proposition (or goal) to which the heuristic is applied. 
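As an illustration only, the three-clause shape shared by H1 and H2 might be sketched in Python as follows. The hand-entered tables DEFINING_TERMS and HYPERNYMS and the helper names can_link, rate, h1_instrument, h2_partof and best_attachment are assumptions made for this sketch; the implemented system obtains the same information by parsing the W7 definitions themselves.

```python
# Toy stand-in for parsed W7 definitions (hypothetical, hand-entered data).
# For each complement word: relation name -> defining terms that the
# relation's pattern points to.
DEFINING_TERMS = {
    "fork": {"INSTRUMENT": ["take", "pitch", "dig"]},  # "used ... for taking up ..."
}

# Central word of each definition (direct hypernym), also hand-entered.
HYPERNYMS = {"eat": "take"}


def can_link(term, head, max_steps=3):
    """Follow the hypernym chain upward from head, looking for term."""
    word = head
    for _ in range(max_steps + 1):
        if word == term:
            return True
        if word not in HYPERNYMS:
            return False
        word = HYPERNYMS[word]
    return False


def rate(relation, required_pos, head, head_pos, complement):
    """Apply the three clauses common to H1 and H2."""
    if head_pos != required_pos:
        return -1.0                      # clause 1: wrong kind of head
    terms = DEFINING_TERMS.get(complement, {}).get(relation, [])
    if any(can_link(term, head) for term in terms):
        return 0.7                       # clause 2: pattern found and linked
    return -0.3                          # clause 3: no positive evidence


def h1_instrument(head, head_pos, complement):
    return rate("INSTRUMENT", "verb", head, head_pos, complement)


def h2_partof(head, head_pos, complement):
    return rate("PARTOF", "noun", head, head_pos, complement)


def best_attachment(verb, noun, complement):
    """Rate both constructs and return the better (site, factor) pair."""
    candidates = [
        ("verb", h1_instrument(verb, "verb", complement)),
        ("noun", h2_partof(noun, "noun", complement)),
    ]
    return max(candidates, key=lambda pair: pair[1])


print(best_attachment("eat", "fish", "fork"))
```

In this toy setting the verb attachment wins for sentence (1) because only the INSTRUMENT pattern of "fork" leads to a term that can be linked with the head.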
Thus, if clause 3 of heuristic H2 is used when applied to the proposition (1b), the resulting certainty factor -0.3 will indicate a relatively moderate disbelief in this proposition, stemming from the fact that the system has not been able to find any positive evidence in the dictionary to sustain it.</Paragraph> <Paragraph position="10"> The above heuristics make use of the fact that there are specific words and/or phrases in dictionary definitions, forming patterns, which are almost systematically used to express specific semantic relations (Markowitz et al. 1986). For the two relations considered here, some of these patterns are: INSTRUMENT: for, used for, used to, a means for, etc.</Paragraph> <Paragraph position="11"> PARTOF: part of, arises from, end of, member of, etc.</Paragraph> <Paragraph position="12"> These patterns generally take, as their objects, some central term (or terms) in the definition of the complement word. We can then try to link that term with the head of the construct that is being studied.</Paragraph> <Paragraph position="13"> Focussing again on example sentence (1), the system starts by examining the first construct, (1a). It parses the definition of the complement &quot;fork,&quot; and discovers at least one INSTRUMENT pattern, &quot;used for&quot;: fork: An implement with two or more prongs used esp for taking up (as in eating), pitching or digging.</Paragraph> <Paragraph position="14"> Taking the conjunction into account, the system finds three possible terms: &quot;taking up,&quot; &quot;pitching,&quot; and &quot;digging,&quot; which it tries to link with &quot;eat.&quot; (For the present, we deliberately avoid the phrase &quot;as in eating&quot; -- which offers a direct match -- in order to show that our approach does not rely on such lucky coincidences.) 
The system is able to establish that &quot;eat&quot; is a direct hyponym of &quot;take&quot; according to W7: eat: to take in through the mouth as food...</Paragraph> <Paragraph position="15"> to take food or a meal.</Paragraph> <Paragraph position="16"> The link is thus probably established, and the system moves on to consider (1b). Since no PARTOF pattern can be found in the definitions of &quot;fork,&quot; this second possible construct will be ranked as much less likely -- (1a) receives a certainty factor of +0.7, but (1b) gets a certainty factor of only -0.3. Therefore the system recommends attaching the PP to the main verb in (1). For sentence (2), the constructs to be compared are &quot;eat with bones&quot; and &quot;a fish with bones.&quot; In the definition of &quot;bone,&quot; no useful INSTRUMENT pattern is found; so &quot;eat with bones&quot; cannot be easily validated. But the first definition of &quot;bone&quot; gives the following PARTOF pattern: bone: One of the hard parts of the skeleton of a vertebrate.</Paragraph> <Paragraph position="17"> This yields two possible links for &quot;fish&quot;: &quot;skeleton&quot; and &quot;vertebrate.&quot; &quot;Fish&quot; can be identified as a direct hyponym of &quot;vertebrate&quot; according to W7.</Paragraph> <Paragraph position="18"> fish: Any of numerous cold-blooded strictly aquatic craniate vertebrates...</Paragraph> <Paragraph position="19"> Therefore, &quot;a fish with bones&quot; receives a higher certainty factor than &quot;eat with bones,&quot; and the system recommends attaching the prepositional phrase to the direct object in sentence (2).</Paragraph> <Paragraph position="20"> The above examples are among the simplest.</Paragraph> <Paragraph position="21"> In more difficult cases, heuristics may perform various kinds of inferences in order to establish connections. It is also possible for several heuristics to be applied to a given construct, with their results then being combined. 
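The combination rule itself is not specified here. One possibility, assuming the rule commonly used in MYCIN-derived systems (an assumption, not something this paper confirms), is sketched below in Python; the function names combine and combine_all are hypothetical.

```python
from functools import reduce


def combine(x, y):
    """Merge two certainty factors, each between -1 and 1.

    Assumption: this is the combination rule commonly used in
    MYCIN-derived systems, not a rule stated in the text.
    """
    if x >= 0 and y >= 0:
        return x + y - x * y             # two confirmations reinforce
    if 0 >= x and 0 >= y:
        return x + y + x * y             # two disconfirmations reinforce
    denom = 1 - min(abs(x), abs(y))      # mixed evidence partly cancels
    if denom == 0:
        return 0.0                       # +1 against -1: treat as unknown
    return (x + y) / denom


def combine_all(factors):
    """Fold a sequence of certainty factors into a single value."""
    return reduce(combine, factors, 0.0)


print(combine_all([0.7, -0.3]))
```

Under this rule a weak negative result lowers a positive rating without cancelling it, and two agreeing results give a stronger rating than either alone.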
The cumulative effect of many heuristics, and not the perfection of each one separately, does the job. The choice of certainty factors rests mainly on intuition. Some choices are easy; some inferences, for example, are obviously weaker than others. In other cases the values have to be adjusted by trial and error, by processing many examples. It is interesting to note that, as our corpus of examples increases, the certainty factors are converging toward apparently stable values. Our system currently includes about 20 heuristic rules and is able to handle the prepositions &quot;with,&quot; &quot;by,&quot; &quot;after,&quot; and &quot;in.&quot; It has been tested successfully on about 50 examples so far.</Paragraph> </Section> <Section position="5" start_page="153" end_page="155" type="metho"> <SectionTitle> 2. ATTACHMENT OF RELATIVE CLAUSES </SectionTitle> <Paragraph position="0"> A typical problem in attaching relative clauses occurs when the clause is separated from the noun it modifies by a prepositional phrase: (3) I want the book by my uncle that is on the shelf.</Paragraph> <Paragraph position="1"> In (3), the relative clause &quot;that is on the shelf&quot; probably modifies &quot;book&quot; and not &quot;uncle.&quot; A human reader assumes this because of knowing that a book is more likely to be on a shelf than an uncle is. However, syntax alone cannot tell us so. A syntactic parser will normally produce a tree which shows the relative clause modifying the closest noun, namely &quot;uncle.&quot; (See Appendix A, Tree 2.) Note that the parser attaches the relative clause (RELCL) node arbitrarily to the closest head noun &quot;uncle,&quot; but marks the other possible attachment site (&quot;book&quot;) with a question mark. 
The higher question mark in Tree 2 is for the PP attachment.</Paragraph> <Paragraph position="2"> The grammar that supports all of the parsing discussed here is the PLNLP English Grammar (Jensen in preparation, Heidorn 1976).</Paragraph> <Paragraph position="3"> We have implemented the solution to this kind of relative clause ambiguity. Our system starts by trying to solve the PP attachment problem: does &quot;by my uncle&quot; modify &quot;book&quot; or &quot;want&quot;? Of all possible relationships between the various word pairs, the AUTHOR relationship between &quot;book&quot; and &quot;uncle&quot; will receive by far the best ranking. This will happen because it can be established, by using the dictionary, that an uncle can be a human being (and thus able to author a work), and that a book is some kind of work.</Paragraph> <Paragraph position="4"> The processing of the RELCL attachment then begins. Syntax tells us that the relative pronoun &quot;that&quot; is the subject of the predicate &quot;be on the shelf.&quot; One of the properties of the verb &quot;to be&quot; is that a prepositional complement qualifying this verb really qualifies the subject of the verb. Applied to Tree 2 of Appendix A, this provides two possible interpretations: &quot;book on the shelf&quot; and &quot;uncle on the shelf.&quot; At this point we can see that the relative clause attachment in Tree 2 reduces to a prepositional phrase attachment, which can be solved easily by the PP attachment methods already described.</Paragraph> <Paragraph position="5"> Specifically, the dictionary definition for &quot;shelf&quot; will tell us that a shelf is &quot;to hold objects&quot; or &quot;for placing things on,&quot; and the word &quot;book&quot; can be related to &quot;object&quot; or &quot;thing&quot; much more easily than the word &quot;uncle&quot; can be so related. 
This will lead to the preference for &quot;book&quot; as the antecedent of the relative clause.</Paragraph> <Paragraph position="6"> However, most relative clause attachment problems cannot be reduced to PP attachments.</Paragraph> <Paragraph position="7"> Consider (4): (4) I know the actor in the movie that you met last month.</Paragraph> <Paragraph position="8"> The parse tree for this sentence (Tree 3 of Appendix A) shows question marks in the same positions as Tree 2. However, because of the syntactic structure of the RELCL in (4), we know that the relative pronoun this time refers to the object of its main verb &quot;met.&quot; Either &quot;movie&quot; or &quot;actor&quot; must be the object of &quot;met.&quot; No prepositional phrase is involved.</Paragraph> <Paragraph position="9"> Now we have to decide which is more likely: You met an actor.</Paragraph> <Paragraph position="10"> You met a movie.</Paragraph> <Paragraph position="11"> Although semantic codes are included in the on-line version of LDOCE (i.e., features like HUMAN and ABSTRACT are marked on nouns, and subcategorization codes using these features are marked on verbs), the codes do not help with problems like this one. According to the LDOCE codes, possible objects for the simple transitive verb &quot;meet,&quot; in its various sub-senses, are HUMAN, ABSTRACT, and (moveable) SOLID. No ranking of likelihood or preference is given, and of course a syntactic parser would not know which sub-sense it is dealing with.</Paragraph> <Paragraph position="12"> &quot;Actor&quot; is marked +HUMAN, and &quot;movie&quot; is marked +ABSTRACT. So either object noun is equally likely (Mary Neff, personal communication). Although we have not yet implemented this, we believe that the same &quot;approximate reasoning&quot; that we implemented for PP-attachments will work here, too. 
The strategy is to formulate heuristics that yield &quot;certainty factors,&quot; not categorical acceptance or rejection of an interpretation. These heuristics would propose a solution for the stated task by operating on the output of the syntactic parser. For the current example, the first step would be to parse the LDOCE entry for &quot;meet&quot; (shown in Figure 1), looking for direct objects.</Paragraph> <Paragraph position="13"> [Figure 1: LDOCE entry for the verb &quot;meet&quot;]</Paragraph> <Paragraph position="17"> The sub-definitions are no help, because no objects are shown. But the example sentences in the entry are a rich source of information about typical usage. There are eleven example object nouns: him, lot (of difficulties), people, face, speech, you (twice), train, amount, hopes, need. 
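One way such a list might be used is to count how many of the example objects can be related to each candidate noun. The Python sketch below is a minimal illustration under stated assumptions: the set PERSON_WORDS is a hand-made, hypothetical stand-in for what would be found by parsing dictionary definitions, and the function name support is likewise invented for this sketch.

```python
# Object nouns from the LDOCE example sentences for "meet", as listed
# in the text ("you" occurs twice).
EXAMPLE_OBJECTS = ["him", "lot", "people", "face", "speech", "you",
                   "you", "train", "amount", "hopes", "need"]

# Personal pronouns; "it" is deliberately excluded.
PERSONAL_PRONOUNS = {"him", "her", "you", "me", "us", "them"}

# Hypothetical stand-in for dictionary lookups: words that link easily
# to "person". A real system would derive this from parsed definitions.
PERSON_WORDS = {"people", "actor"}


def support(candidate, objects):
    """Fraction of example objects that can be related to candidate.

    Simplification: only person-like candidates can be related to
    anything here; every other candidate gets no support at all.
    """
    if candidate not in PERSON_WORDS:
        return 0.0
    hits = [w for w in objects
            if w in PERSONAL_PRONOUNS or w in PERSON_WORDS]
    return len(hits) / len(objects)


print(support("actor", EXAMPLE_OBJECTS), support("movie", EXAMPLE_OBJECTS))
```

A fraction of this kind could then be mapped to a certainty factor; how that mapping should be chosen is left open in this sketch.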
Over a third of them can be easily related to the word &quot;actor&quot;: the word &quot;people,&quot; and the three occurrences of personal pronouns. (The general rule here is that any personal pronoun except &quot;it&quot; can be substituted for any word that has &quot;person&quot; as the head of one of its definitions.) None of them can be so easily related to the word &quot;movie.&quot; Thus the system concludes that &quot;actor&quot; is a more likely object of the verb &quot;meet&quot; than is &quot;movie.&quot; This conclusion is no accident; lexicographers are experts on words, and they have incorporated their expertise, in ways both obvious and subtle, into standard dictionaries.</Paragraph> <Paragraph position="18"> Another interesting example of the relative clause attachment problem is found in the following sentence from a large data base of business letters: (5) There are no agencies within the country which would loan money to individuals for establishment of boarding homes.</Paragraph> <Paragraph position="19"> The choice here is between possible nouns to serve as the subject of the predicate &quot;would loan money&quot;: Agencies would loan money.</Paragraph> <Paragraph position="20"> Country would loan money.</Paragraph> <Paragraph position="21"> First, the LDOCE definition for &quot;loan&quot; refers us to the word &quot;lend.&quot; Moving to the entry for &quot;lend,&quot; we look for cited subjects. The example sentences, in this case, are no help: subject words are either personal pronouns or the word &quot;flags&quot;; and none of these helps us to choose between &quot;agencies&quot; and &quot;country.&quot; But one of the sub-definitions of &quot;lend&quot; is &quot;to give out (money) for profit, esp. as a business.&quot;</Paragraph> <Paragraph position="22"> The phrase &quot;as a&quot; is often used in definitions to signal the AGENT that does the action. 
Then we consult the dictionary to see which better qualifies as a &quot;business&quot;: &quot;agency&quot; or &quot;country.&quot; The answer comes easily; the first sub-definition of &quot;agency&quot; is &quot;a business that makes its money esp...&quot;. The two words &quot;country&quot; and &quot;business&quot; cannot be connected so easily as &quot;agency&quot; and &quot;business&quot; along any path of heuristic searching. Therefore we prefer to attach the relative clause to &quot;agencies&quot; rather than to &quot;country.&quot; It is important to realize that none of the information being cited here is manually coded; the English text of the LDOCE entries is being used. Our strategy can be considered to be making explicit a semantic network that exists implicitly in this text. The entry for &quot;lend&quot; shows &quot;business&quot; as an AGENT of &quot;lending&quot;; the entry for &quot;agency&quot; shows that &quot;agency&quot; is a kind of &quot;business.&quot;</Paragraph> </Section> <Section position="6" start_page="155" end_page="157" type="metho"> <SectionTitle> 3. RESOLUTION OF PRONOUN REFERENCE </SectionTitle> <Paragraph position="0"> Problems of pronoun reference are many and varied, and not all of them will yield to this same method of solution (Hobbs 1986, Sidner 1986).</Paragraph> <Paragraph position="1"> But for some, the information in dictionary definitions can give important clues. Consider (6) and (7): (6) We bought the boys apples because they were hungry.</Paragraph> <Paragraph position="2"> (7) We bought the boys apples because they were cheap.</Paragraph> <Paragraph position="3"> In the absence of other information, human readers assume that &quot;they&quot; probably refers to the boys in (6) and to the apples in (7). The computer needs to follow some inference path that will lead to the same tentative assumptions. 
For sentence (6), we need to choose a most likely subject noun for the predicate &quot;be hungry&quot; -- either: Boys were hungry.</Paragraph> <Paragraph position="4"> Apples were hungry.</Paragraph> <Paragraph position="5"> We would first parse the dictionary definition for &quot;hungry.&quot; In LDOCE, there are two example sentences with personal pronouns for subjects; the word &quot;boys&quot; can be quickly related to all personal pronouns. There are no example sentences with subjects that can be easily related to &quot;apples.&quot; Additional support can be found in two directions. The first definition for &quot;hungry&quot; in LDOCE is &quot;feeling or showing hunger.&quot; We want to find out what sort of entity can &quot;be hungry,&quot; so we ask what sort of entity can &quot;feel.&quot; Of about 30 example sentences for the verb &quot;feel,&quot; 26 have personal pronouns (excluding &quot;it&quot;) as subjects. Hence we prefer &quot;boys&quot; to &quot;apples&quot; as the subject of &quot;be hungry.&quot; A second direction of search also reinforces this interpretation. &quot;Hungry&quot; is defined as &quot;feeling or showing hunger,&quot; and &quot;hunger&quot; is defined as &quot;the wish or need for food.&quot; Briefly summarized, we conclude that &quot;food&quot; is the object (or goal) of hunger, hence of being hungry. LDOCE also tells us that an &quot;apple&quot; is &quot;a hard round fruit&quot; and &quot;fruit&quot; is &quot;used for food.&quot; Hence apples are (used for) food; hence apples can be the object of &quot;being hungry.&quot; Since the suggested object of &quot;being hungry&quot; is the same as the object of the main clause (see (6)), it stands to reason that &quot;they&quot; probably does not also refer to &quot;apples.&quot; The paths that we are tracing are delicate, but they exist. 
A computer program that follows these paths extracts, from existing text, some very interesting real-world relationships.</Paragraph> <Paragraph position="6"> In solving the pronoun reference task of sentence (7), the program must choose between: Boys were cheap.</Paragraph> <Paragraph position="7"> Apples were cheap.</Paragraph> <Paragraph position="8"> By following paths through the LDOCE entries, we find that &quot;apples were cheap&quot; appears more likely than &quot;boys were cheap&quot; (although the latter is certainly possible).</Paragraph> </Section> <Section position="7" start_page="157" end_page="157" type="metho"> <SectionTitle> 4. INTERPRETATION OF DANGLING MODIFIERS </SectionTitle> <Paragraph position="0"> English teachers have long objected to a potential awkwardness and lack of clarity in constructions with dangling modifiers: (8) (While) watching TV, the doorbell rang.</Paragraph> <Paragraph position="1"> In sentences like (8), the attachment problem appears in a different guise. There is only one noun given for the participial to modify, and that is &quot;doorbell.&quot; It is not possible to set up an obvious choice pair in the same manner as before.</Paragraph> <Paragraph position="2"> However, we do know that participial modifiers are a notorious source of confusion. So we can check the dictionary to find out how likely it is that a doorbell might watch TV.</Paragraph> <Paragraph position="3"> In LDOCE, the sub-definitions for &quot;watch&quot; are no help. But the example sentences, once again, offer strong hints. There are 16 such examples. Fifteen of them have personal pronouns as subjects for the verb &quot;watch.&quot; The first example is &quot;Do you often watch TV?&quot; (This situation was not contrived; sentence (8) was taken from a popular high school English grammar book, Warriner 1963, before the dictionary was consulted.) 
With this information in hand, we can say that &quot;doorbell&quot; is, at best, an unlikely subject for the verb &quot;watch.&quot;</Paragraph> </Section> </Paper>