<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1051"> <Title>Lexical Disambiguation Using Constraint Handling In Prolog (CHIP) *</Title> <Section position="2" start_page="0" end_page="431" type="metho"> <SectionTitle> 2 Background: LDOCE, Word Sense Disambiguation and related work </SectionTitle> <Paragraph position="0"> LDOCE's important feature is that its definitions (and examples) are written in a controlled vocabulary of 2187 words. A definition is therefore always written in simpler terms than the word it describes. These 2187 words effectively constitute semantic primitives, and any particular word sense is defined by a set of these primitives.</Paragraph> <Paragraph position="1"> Several researchers have experimented with lexical disambiguation using MRDs, including [Lesk, 1986; Wilks et al., 1989; McDonald et al., 1990; Veronis and Ide, 1990; Guthrie et al., 1991; Guthrie et al., 1992]. Lesk's technique decides the correct sense of a word by counting the overlap between a dictionary sense definition (of the word to be disambiguated) and the definitions of the nearby words in the phrase. Performance (based on brief experimentation) was reported at 50-70%, and the results were roughly comparable between Webster's 7th Collegiate, the Collins English Dictionary and the Oxford Advanced Learner's Dictionary of Current English.</Paragraph> <Paragraph position="2"> *This work was supported by the Greek Employment Manpower Organisation (OAED), Ministry of Labour, as part of a 1991-93 scholarship scheme.</Paragraph> <Paragraph position="3"> Methods based on co-occurrence statistics have been used by [Wilks et al., 1989; McDonald et al., 1990; Guthrie et al., 1991]. By co-occurrence is meant the preference for two words to appear together in the same context. [Wilks et al., 1989] computed lexical neighbourhoods for all the words of the controlled vocabulary of LDOCE. This neighbourhood information is used to partition the words according to the senses they correspond to, in order to produce a classification of the senses. Their results for occurrences of the word bank were about 53% for classifying each instance into one of the thirteen sense definitions of LDOCE and 85-90% for classifying it into one of the more general coarse meanings. Neighbourhoods were used by [McDonald et al., 1990] for expanding the word sense definitions. The union of neighbourhoods is then intersected with the local context, and the largest overlap gives the most likely sense. A similar technique is used by [Guthrie et al., 1991], except that they define neighbourhoods according to subject categories (i.e. engineering, economics, etc.) based on the subject code markings of the on-line version of LDOCE.</Paragraph> <Paragraph position="4"> Closer to the work we describe in this paper is that of [Guthrie et al., 1992]. They try to deal with disambiguation problems in large-scale text data. Their method is based on the idea that the correct meaning of a complete phrase should be extracted by concurrent evaluation of sets of senses for the words to be disambiguated. They count the overlap between sense definitions of the words of the sentence as they appear in the on-line version of LDOCE. 
The problem is that the number of sense combinations increases rapidly if the sentence contains ambiguous words with a considerable number of sense definitions in LDOCE (say word A has X different senses in LDOCE, B has Y and C has Z; then the number of possible sense combinations for the phrase ABC is X*Y*Z, e.g. if X=Y=Z=10 sense definitions per word there are already 1000 possible sense combinations). Simulated annealing is used by [Guthrie et al., 1992] to reduce the search space and find an optimal (or near-optimal) solution without generating and evaluating all possible solutions, or pruning the search space and testing a well-defined subspace of reasonable candidate solutions. The success of their algorithm is reported as 47% at the sense level and 72% at the homograph level on 50 example sentences from LDOCE.</Paragraph> </Section> <Section position="3" start_page="431" end_page="431" type="metho"> <SectionTitle> 3 CHIP: Constraint Handling In Prolog </SectionTitle> <Paragraph position="0"> We decided it was worthwhile investigating the use of a constraint handling language so that we could exhaustively search the space by applying CHIP's optimisation procedures. A CHIP compiler is available from International Computers Limited (ICL) as part of its Prolog-based DecisionPower toolkit. CHIP extends conventional Prolog-style logic programming by introducing three new computation domains: finite domain terms, boolean terms and linear rational terms. For each of these, CHIP uses specialised constraint-solving techniques: consistency techniques for finite domains, equation solving in Boolean algebra, and a symbolic simplex-like algorithm. Another feature offered by CHIP is the demon construct, used to implement local propagation for user-defined constraints. CHIP's declarations are used to define the domains of variables or to choose one of the specialised unification algorithms; they can be: (1) finite domain declarations (i.e. variables range over finite domains, and terms are constructed from natural numbers, domain variables over natural numbers and operators); (2) boolean declarations; or (3) demon declarations (for specifying data-driven behaviour; they consist of a set of rules that describe how a constraint can be satisfied). In addition, classes of built-in predicates over finite domain variables exist for: (1) arithmetic and symbolic constraints (basic constraints on domain variables); (2) choice predicates (which help in making choices); (3) higher order predicates (providing optimisation methods for combinatorial problems using depth-first and branch-and-bound strategies); and (4) extra-logical predicates (to help with debugging). Forward-checking and look-ahead inference rules provide the control mechanism for the computation of constraints over finite domains. Auxiliary predicates for monitoring or controlling the resolution process in the CHIP environment also exist.</Paragraph> <Paragraph position="1"> In our case we were particularly interested in transforming the general structure of our algorithm into a form usable by CHIP's choice and higher order built-in predicates. Choice predicates are used for the automatic generation of word-sense combinations, and higher order predicates facilitate the process of finding the most likely combination according to the overlap 'score'. 
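The following is a minimal illustrative sketch, not the actual program, of how these facilities combine: finite-domain variables are declared, enumerated by a choice predicate, and optimised with a higher order predicate. The domain syntax (::) and the built-in names indomain/1 and minimize/2 follow common CHIP-style finite-domain usage and may differ between releases; toy_optimise/3 and toy_cost/3 are names assumed here for illustration.

    % Illustrative CHIP-style sketch: choose values for two finite-domain
    % variables so that an assumed cost predicate is minimised by
    % branch and bound.
    toy_optimise(X, Y, Cost) :-
        X :: 1..3,                      % domain declarations
        Y :: 1..3,
        minimize((indomain(X),          % choice predicates enumerate values
                  indomain(Y),
                  toy_cost(X, Y, Cost)), Cost).

    toy_cost(X, Y, Cost) :-             % assumed cost: prefer large X+Y
        Cost is 10 - (X + Y).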
To give an idea of this kind of implementation, the main core of the optimisation part of our program looks like this:

    optimize(Words, Choice, Cost) :-
        minimize((makeChoice(Choice),
                  findCost(Choice, Cost)), Cost).

where minimize is one of CHIP's higher order built-in predicates. Words represents the list of ambiguous words submitted to the program and Choice a list of domain variables for the selection of sense definitions. Cost is a domain variable whose domain is constrained to an arithmetic term. For our purposes, Cost was Max-Overlap, where Max (a maximum possible score) is large enough that Overlap (the overlap score of a sense combination) can never exceed it. Any answer substitution that causes (makeChoice(Choice), findCost(Choice,Cost)) to be ground also causes Cost to be ground. The search then backtracks to the last choice point and continues along another branch. The cost of any other solution found in the sub-tree must necessarily be lower (i.e. Overlap must be higher) than the last one found, because Cost is constrained by that bound. This process of backtracking for better solutions and imposing constraints on Cost continues until the space has been searched implicitly. At the end, (makeChoice(Choice), findCost(Choice,Cost)) is bound to the last solution found, which is the optimal one.</Paragraph> </Section> <Section position="4" start_page="431" end_page="433" type="metho"> <SectionTitle> 4 Algorithm </SectionTitle> <Paragraph position="0"> Our method is based on the overlap between sense definitions of the words to be disambiguated. This is similar to [Guthrie et al., 1992], although there are distinct differences in the scoring method and the implementation. To illustrate our method we use the following example and describe each phase: The bank arranged for an overdraft on my account.</Paragraph> <Section position="1" start_page="431" end_page="432" type="sub_section"> <SectionTitle> 4.1 Step 1 </SectionTitle> <Paragraph position="0"> All the common function words (particles) belonging to our 'stop list' (a set of 38 very common words) should be removed; for our example these are the, for, an, on and my. Function words tend to appear very often both in context and in sense definitions, for syntactic and stylistic reasons rather than pure semantics. Since our algorithm is intended to maximise overlap, the participation of function words in a definition chain could lead to a false interpretation of the correct sense combination. Moreover, function words are usually much more ambiguous than content words (for example, there are 21 listed senses of the word the and 35 of for in LDOCE).</Paragraph> <Paragraph position="1"> Thus, the search process could be significantly lengthened without any obvious benefit to the resolution of ambiguity of the context words, as explained above. Words in the 'stop list' have also been removed from the sense definitions, and the remaining words are stemmed so that only their roots appear in the definition. In this way, derived (or inflected) forms of the same word can be matched together.</Paragraph> <Paragraph position="2"> For this reason, the program also uses the primitive or root forms of the input words. 
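A minimal sketch of this preprocessing in plain Prolog might look as follows; it is illustrative rather than the actual program, with stop_word/1 standing for the 38-word stop list and stem/2 for the stemmer, both only partially filled in here:

    % Remove stop-list words and reduce the remaining words to root forms.
    stop_word(the).  stop_word(for).  stop_word(an).
    stop_word(on).   stop_word(my).        % ... the real list has 38 entries

    stem(arranged, arrange).               % a few illustrative stemming facts;
    stem(arranges, arrange).               % a real system would use a stemmer
    stem(W, W).                            % fallback: leave the word unchanged

    content_words([], []).
    content_words([W|Ws], Roots) :-
        stop_word(W), !,                   % drop function words
        content_words(Ws, Roots).
    content_words([W|Ws], [R|Roots]) :-
        stem(W, R),                        % keep the stemmed content word
        content_words(Ws, Roots).

For the running example, content_words([the, bank, arranged, for, an, overdraft, on, my, account], Ws) would bind Ws to [bank, arrange, overdraft, account].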
After function-word deletion the program is given the following set of words: bank, arrange, overdraft, account. These are processed according to their stemmed sense definitions in LDOCE, represented as Prolog database structures such as:

    bank([[bank, land, along, side, river, lake],
          [bank, earth, heap, field, garden, make, border, division],
          [bank, mass, snow, cloud, mud],
          [bank, slope, make, bend, road, race, track, safer, car, go, round],
          [bank, sandbank],
          [bank, car, aircraft, move, side, higher, other, make, turn],
          [bank, row, oar, ancient, boat, key, typewriter],
          [bank, place, money, keep, pay, demand, relate, activity, go],
          [bank, place, something, hold, ready, use, organic, product, human, origin, medical, use],
          [bank, person, keep, supply, money, piece, payment, use, game, chance],
          [bank, win, money, game, chance],
          [bank, put, keep, money, bank],
          [keep, money, state, bank]]).</Paragraph> <Paragraph position="3"> The conventions we use are: a) Each word to be disambiguated is the functor of a predicate containing a list of its stemmed sense definitions (each itself a list). b) We do not put a subject code in each sense definition (as [Guthrie et al., 1992] do). Instead we put the word to be disambiguated as a member of the list for each of its sense definitions. The rationale behind this is that although a word placed in its own sense definition cannot help with the disambiguation of itself, it can help in the disambiguation of the other words if it appears in their sense definitions. c) Compound words of the form 'race-track' were treated as two words, 'race' and 'track'.</Paragraph> </Section> <Section position="2" start_page="432" end_page="432" type="sub_section"> <SectionTitle> 4.2 Step 2 </SectionTitle> <Paragraph position="0"> The algorithm generates sense combinations by going through the sense definitions for each word one by one. For example, a sense combination can be formed by taking the 8th sense of bank (call it b8, see above), the first sense of arrange (a1=[arrange, set, good, please, order]), the definition of overdraft (o1=[overdraft, sum, lend, person, bank, more, money, have, bank]), and the seventh sense of account (c7=[account, sum, money, keep, bank, add, take]).</Paragraph> <Paragraph position="1"> The score for this sense combination is obtained by taking the definitions pairwise and counting the overlap of words between them. Before the program proceeds to counting, duplicate words are removed from each sense definition so that no word is counted more than once.</Paragraph> <Paragraph position="2"> The algorithm checks for word overlap in advance and, in case this constraint is not satisfied, the combination is discarded and a new one generated, so that only overlapping combinations are considered.</Paragraph> <Paragraph position="3"> For each combination the total score is the sum of all the pairwise overlaps. This means that for n ambiguous words in the sentence the program counts the overlap for all n!/(2!(n-2)!) pairs and adds them together. For the above example, the total score is the sum of the six pairwise overlaps among b8, a1, o1 and c7 (a small worked sketch is given below).</Paragraph> <Paragraph position="5"> This scoring method is quite different from the one used by [Lesk, 1986]. Lesk simply counted overlaps by comparing each sense definition of a word with all the sense definitions of the other words. [Guthrie et al., 1992] use a similar method, differing in that if there is a subject (pragmatic) code for a sense definition they put this subject code as a single word in the definition list. 
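Before detailing that difference, the pairwise scoring of Step 2 can be sketched in plain Prolog as follows; this is an illustration rather than the actual program, it assumes a Prolog providing library(lists) (for intersection/3 and sum_list/2, as in SWI-Prolog), and the predicate names combination_score/2, pair_scores/2 and overlap/3 are our own.

    % Deduplicate each definition, then sum the word overlap of every
    % pair of definitions in the combination.
    :- use_module(library(lists)).

    combination_score(Defs, Score) :-
        maplist(sort, Defs, Unique),       % sort/2 also removes duplicates
        pair_scores(Unique, Scores),
        sum_list(Scores, Score).

    pair_scores([], []).
    pair_scores([D|Ds], Scores) :-
        maplist(overlap(D), Ds, S1),       % overlaps of D with later definitions
        pair_scores(Ds, S2),
        append(S1, S2, Scores).

    overlap(D1, D2, N) :-
        intersection(D1, D2, Common),
        length(Common, N).

With the stemmed definitions listed above for b8, a1, o1 and c7, the non-zero pairwise overlaps are b8/o1 (bank, money), b8/c7 (bank, money, keep) and o1/c7 (sum, bank, money), so this sketch scores the combination 2 + 3 + 3 = 8.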
[Guthrie et al., 1992] then go through each word's definition list, putting each word into an array and starting a counter at 0; if the word is already in the array, they increment its counter. So if, for example, three definitions share the same word, they count it as 2, whereas with our method it counts as 3 (one for each of the three pairs), so our method generally gives higher counts. Although no evidence about the best scoring scheme can be obtained without results, we think that our method may work better for chains in which all definitions share a common word (where this higher count grows further compared to [Guthrie et al., 1992]), since such a chain may indicate a strong preference for that combination.</Paragraph> </Section> <Section position="3" start_page="432" end_page="433" type="sub_section"> <SectionTitle> 4.3 Step 3 </SectionTitle> <Paragraph position="0"> If a newly generated combination has a higher score, it is considered a better solution. This new (temporary maximum) score acts as a constraint (a lower bound) on newly generated combinations. At the end, the most likely sense combination is the one with the highest score. The CHIP implementation is guaranteed to give one and only one solution (or no solution if no overlapping combination exists). Choices are generated by initially taking the first sense definition of each word in the sentence, because the most common or most typical meanings of a word are listed first in LDOCE. Subsequent choices replace the words' definitions one by one, in the order in which the words were submitted to the program. An example sentence and its output are illustrated next [Procter et al., 1978]: Sentence: a tight feeling in the chest.</Paragraph> <Paragraph position="1">
Total number of sense combinations: 392
Optimal solution found:
tight = [tight, have, produce, uncomfortable, feeling, closeness, part, body]
feeling = [feeling, consciousness, something, feel, mind, body]
chest = [chest, upper, front, part, body, enclose, heart, lung]
Its score is: 5</Paragraph> </Section> </Section> </Paper>