<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1016"> <Title>Redundancy: helping semantic disambiguation</Title>
<Section position="3" start_page="0" end_page="104" type="metho"> <SectionTitle> 2 Source of information and clustering technique </SectionTitle>
<Paragraph position="0"> To acquire and learn knowledge from text for building a lexical knowledge base, we need a source of information that states facts and repeats them a few times using slightly different sentence structures. We also need a technique for gathering information from that source and identifying the redundant information. These two aspects are discussed below: (1) the choice of a source of information, and (2) the information gathering technique.</Paragraph>
<Section position="1" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 2.1 Choice of source of information </SectionTitle>
<Paragraph position="0"> When we think of learning about words, we think of textbooks and dictionaries. Redundancy might be present in them, but not always simplicity. Any text is written at a level which assumes some common knowledge among its potential readers. In a science textbook, the author defines the scientific terms but not the general English vocabulary. In an adult's dictionary, all words are defined, but a certain knowledge of the "world" (common sense, typical situations) is assumed as common adult knowledge, so the emphasis of the definitions might not be on simple cases but on more ambiguous or infrequent ones. To learn the basic vocabulary of day-to-day life, a very simple children's first dictionary is a good place to start. In (Barrière, 1997), such a dictionary is used for an application of LKB construction in which no prior semantic knowledge is assumed; the same work explains the multi-stage process used to transform the sentences of the dictionary into conceptual graph representations. This dictionary, the American Heritage First Dictionary (AHFD; copyright ©1994 by Houghton Mifflin Company, reproduced by permission), is an example of a good source of knowledge in terms of simplicity, clarity and redundancy: some definitions introduce concepts that are mentioned again in other definitions.</Paragraph> </Section>
<Section position="2" start_page="103" end_page="104" type="sub_section"> <SectionTitle> 2.2 Gathering of information </SectionTitle>
<Paragraph position="0"> Barrière and Popowich (1996) presented the idea of concept clustering for knowledge integration. First, a Lexical Knowledge Base (LKB) is built automatically; it contains all the nouns and verbs of the AHFD, each word having its definition represented in the CG formalism.</Paragraph>
<Paragraph position="1"> Here is a brief summary of that clustering process. It is not a statistical clustering but rather a "graph matching" type of clustering. A trigger word is chosen, and the CG representations of its defining sentences make up the initial CCKG (Concept Clustering Knowledge Graph). The trigger word can be any word, but preferably it should be a semantically significant one. A word is semantically significant if it occurs less than a maximal number of times in the text, thereby excluding general words such as place or person.</Paragraph>
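The significance test can be read as a simple frequency filter. Below is a minimal sketch under that reading; the corpus format, the helper name, and the threshold value are illustrative assumptions, not details from the paper.

```python
from collections import Counter

# Hypothetical threshold: the paper chooses the maximum experimentally;
# 20 is only a placeholder.
MAX_OCCURRENCES = 20

def significant_words(definitions):
    """Return words rare enough to serve as trigger words.

    definitions: dict mapping each headword to a list of tokenized
    definition sentences (an assumed input format).
    """
    counts = Counter(
        token
        for sentences in definitions.values()
        for sentence in sentences
        for token in sentence
    )
    # General words such as "place" or "person" occur too often and are
    # filtered out; rarer content words remain eligible as triggers.
    return {word for word, n in counts.items() if n < MAX_OCCURRENCES}
```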
<Paragraph position="2"> The clustering is an iterative forward and backward search within the LKB for definitions of words that are "related" to the trigger word. A forward search looks at the definitions of the words used in the trigger word's definition. A backward search looks at the definitions of the words that use the trigger word in their own definitions. A word becomes part of the cluster if its CG representation shares a common subgraph of a minimal size with the CCKG. The process is then extended to perform forward and backward searches based on all the words in the cluster, not only on the trigger word (a sketch of this search loop is given at the end of this section).</Paragraph>
<Paragraph position="3"> The cluster becomes a set of words related to the trigger word, and the CCKG presents the trigger word within a large context by showing all the links between all the words of the cluster. The CCKG is a merge of all the individual CGs of the words in the cluster.</Paragraph>
<Paragraph position="4"> Table 1 shows examples of clusters found by applying the clustering technique to the AHFD. A word followed by _# denotes sense # of that word. The CCKGs corresponding to the clusters are not illustrated, as showing all the links between all the words in each cluster would require too much space.</Paragraph>
<Paragraph position="5"> Table 1: Examples of clusters found in the AHFD.
{sew, cloth, needle_1, needle_2, thread, button, patch_1, pin, pocket, wool, ribbon, rug, string, nest, prize, rainbow}
{kitchen, stove, refrigerator, pan}
{stove, pan, kitchen, refrigerator, pot, clay}
{stomach, kangaroo, pain, swallow, mouth}
{airplane, wing, airport, fly_2, helicopter, jet, kit, machine, pilot, plane}
{elephant, skin, trunk_1, ear, zoo, bark, leather, rhinoceros}
{soap, dirt, mix, bath, bubble, suds, wash, boil, steam}
{wash, soap, bath, bathroom, suds, bubble, boil, steam}</Paragraph>
<Paragraph position="6"> The reader might wonder why a word such as [rainbow] is associated with [needle_1], or why [kangaroo] is associated with [stomach]. The AHFD tells the child that "A rainbow looks like a ribbon of many colors across the sky." and that "Kangaroo mothers carry their babies in a pocket in front of their stomachs." The threshold defining the minimal size of the common subgraph necessary to include a new word in the cluster is established experimentally; changing it changes the size of the resulting cluster and therefore which words are included. The clustering technique, and an extended clustering technique derived from it, are explained in much more detail in (Barrière and Fass, 1998).</Paragraph>
<Paragraph position="7"> The clustering method described here relies on the information being acquired from a machine-readable dictionary (the AHFD), so that each word is associated with some knowledge pertaining to it. To extend this clustering technique to a knowledge base containing non-classified pieces of information, we would need an indexing scheme giving access to all the sentences that contain a particular word.</Paragraph>
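As promised above, here is a sketch of the iterative forward/backward search. It is a rough interpretation, assuming each definition graph is flattened into a set of (concept, relation, concept) triples; the set intersection and union below are crude stand-ins for the paper's subgraph matching and maximal join over full conceptual graphs, and min_shared plays the role of the experimentally set threshold.

```python
def grow_cluster(trigger, lkb, min_shared=3):
    """Iterative forward/backward cluster search (sketch).

    lkb: dict mapping each headword to its definition graph, here a
    set of (concept, relation, concept) triples (an assumption).
    """
    cluster = {trigger}
    cckg = set(lkb[trigger])            # CCKG starts as the trigger's CG
    frontier = {trigger}
    while frontier:
        candidates = set()
        for word in frontier:
            # Forward search: headwords appearing in this word's definition.
            candidates |= {c for (a, _, b) in lkb[word]
                           for c in (a, b) if c in lkb}
            # Backward search: headwords whose definition mentions this word.
            candidates |= {w for w, g in lkb.items()
                           if any(word in (a, b) for (a, _, b) in g)}
        frontier = set()
        for word in candidates - cluster:
            shared = lkb[word] & cckg   # stand-in for common-subgraph matching
            if len(shared) >= min_shared:
                cluster.add(word)
                cckg |= lkb[word]       # stand-in for the maximal join
                frontier.add(word)
    return cluster, cckg
```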
</Section> </Section>
<Section position="4" start_page="104" end_page="107" type="metho"> <SectionTitle> 3 Semantic disambiguation </SectionTitle>
<Paragraph position="0"> In this section we propose a way to tackle different types of semantic ambiguity by using the redundancy of information that results from the clustering technique described in the previous section. Working through an example, we will look at three types of semantic disambiguation: anaphora resolution, word sense disambiguation, and relation disambiguation.</Paragraph>
<Paragraph position="1"> In Figure 1, Definition 3.1 shows one sentence from the definition of mail_1 (taken from the AHFD, as are all other definitions in Figure 1) together with its CG representation. Definition 3.2 shows one sentence from the definition of stamp, also with its CG representation. Using the clustering technique described in the previous section, the two words are put together into a cluster triggered by the concept [mail_1].</Paragraph>
<Paragraph position="2"> Result 3.1 shows the maximal join between the two previous graphs around the shared concept [mail_1]. (A maximal join is an operation defined within the CG formalism that gathers the knowledge of two graphs around a concept they share.) Combining the information from stamp and mail_1 makes the redundant information apparent, and the reduction process that eliminates this redundancy will resolve some ambiguities. This process is based on the idea of finding "compatible" concepts within a graph. Two concepts are compatible if their semantic distance is small; that distance is often based on the relative positions of the concepts within the concept hierarchy (Delugach, 1993; Foo et al., 1992; Resnik, 1995). For the present discussion we assume that two concepts are compatible if they share a semantically significant common supertype, or if one concept is a supertype of the other.</Paragraph>
<Paragraph position="3"> In Result 3.1, the concept [send] is present twice, and the concept [letter] is present in two compatible forms: [letter] and [message]. The compatibility comes from the presence, in the type hierarchy (built automatically from information extracted from the AHFD), of one sense of [letter], namely [letter_2], as a subtype of [message]; the other sense, [letter_1], is a subtype of [symbol]. These compatible forms allow the disambiguation of the concept [letter] into [letter_2], which should update the definition of stamp shown in Definition 3.2. The pronoun they in Result 3.1 must refer to some word, either previously mentioned in the sentence or assumed known (as a default) in the LKB. Both (agent) relations attached to the concept [send] lead to compatible concepts: [they] and [person]. We can therefore go back to the graph definition of [stamp], in which the pronoun [they] could have referred to the concepts [letters], [packages], [people] or [stamps], and disambiguate it to [people].</Paragraph>
<Paragraph position="4"> Result 3.2 shows the internal join, which establishes coreference links (shown by *x, *y, *z) between compatible concepts that stand in an identical relation to another concept. The reduced join, after the redundancy is eliminated, is shown in Result 3.3.</Paragraph>
<Paragraph position="5"> Two types of disambiguation (anaphora resolution and word sense disambiguation) have been shown so far. The third type operates at the level of the semantic relations. For this type of ambiguity, we must briefly introduce the idea of a relation hierarchy, described and justified in more detail in (Barrière, 1998). A relation hierarchy, as presented in (Sowa, 1984), is simply a way to establish an order among the possible relations. The idea is to place relations that correspond to the English prepositions (it could be the prepositions of any language studied) at the top of the hierarchy, and to consider them generalizations of possible deeper semantic relations.</Paragraph>
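To make this concrete, here is a toy fragment of such a hierarchy. The particular entries are illustrative assumptions rather than the paper's hierarchy (part of which appears in Figure 2); the point is only that two prepositions can be related through the deeper semantic relations they both generalize.

```python
# Prepositions sit at the top of the hierarchy; each deeper semantic
# relation lists the prepositions that can generalize it. The entries
# below are invented for illustration.
GENERALIZED_BY = {
    "manner":     {"in", "through", "with", "by"},
    "location":   {"in", "at", "on", "through"},
    "agent":      {"by"},
    "instrument": {"with"},
}

def common_relations(prep1, prep2):
    """Deeper semantic relations that both prepositions can stand for."""
    return {rel for rel, preps in GENERALIZED_BY.items()
            if prep1 in preps and prep2 in preps}

# Comparing [send]->(in)->[mail_1] with [send]->(through)->[mail_1]:
print(common_relations("in", "through"))   # -> manner (and location)
```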
<Paragraph position="6"> This relation hierarchy is important for comparing graphs that express similar ideas through different sentence patterns, which are reflected in the graphs by different prepositions becoming relations. Let us look in Figure 1 at Definition 3.3, which gives a sentence from the definition of [card_2], and at Result 3.4, which gives its maximal join with the mail_1/stamp graph of Result 3.3 around the concept [mail_1].</Paragraph>
<Paragraph position="7"> The subgraphs [send]->(in)->[mail_1] and [send]->(through)->[mail_1] have compatible concepts on both sides of two different relations. These two prepositions are both supertypes of a restricted set of semantic relations. Figure 2, which shows a small part of the relation hierarchy, highlights the compatibility between through and in: the two prepositions meet at manner (and at location as well, though more indirectly). We can therefore establish the similarity of those two relations via the manner relation, and the ambiguity is resolved as shown in Result 3.5.</Paragraph>
<Paragraph position="8"> Note that the concept [person] is present many times among the different graphs in Figure 1. This gives the reader an insight into the complexity behind clustering: everything relies on the compatibility of concepts and relations. Compatibility of concepts alone might suffice when the concepts are highly semantically significant, but for general concepts like [person], [place] or [animal] we cannot assume so. In the graph presented in Result 3.5 there are buyers of stamps, and receivers and senders of letters; they are all people, but not necessarily the same ones.</Paragraph>
<Paragraph position="9"> We have seen the redundancy resulting from the clustering process and how to exploit it for semantic disambiguation. We have also seen how redundancy at the concept level, without the relations, can be misleading; the following section emphasizes the importance of semantic relations.</Paragraph> </Section>
<Section position="5" start_page="107" end_page="107" type="metho"> <SectionTitle> 4 The importance of semantic relations </SectionTitle>
<Paragraph position="0"> Clusters are and have been used in different applications for information retrieval and word sense disambiguation. Clustering can be done statistically by analyzing text corpora (Wilks et al., 1989; Brown et al., 1992; Pereira et al., 1995) and usually results in a set of words or word senses. In this paper, we use the clustering method of Barrière and Popowich (1996) to present our view of redundancy and disambiguation. The clustering brings together a set of words, but it also builds a CCKG which shows the actual links (semantic relations) between the members of the cluster.</Paragraph>
<Paragraph position="1"> We suggest that those links are essential in analyzing and disambiguating texts. When links are redundant in a graph (that is, when we find two identical links whose endpoint concepts are pairwise compatible), we are able to reduce semantic ambiguity relating to anaphora and word sense. The counterpart is that redundancy at the concept level allows us to disambiguate the semantic relations.</Paragraph>
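A minimal sketch of that link-redundancy test follows, under the same triple-set representation of graphs assumed earlier. The compatibility test mirrors the working definition of Section 3 (a shared semantically significant supertype, or one concept subsuming the other); all data structures and names are illustrative.

```python
def compatible(c1, c2, supertypes, significant):
    """Compatibility as assumed in Section 3.

    supertypes: dict mapping each concept to the set of all its
    supertypes; significant: set of semantically significant concepts.
    """
    s1 = supertypes.get(c1, set()) | {c1}
    s2 = supertypes.get(c2, set()) | {c2}
    return c1 in s2 or c2 in s1 or bool(s1 & s2 & significant)

def redundant_links(graph, supertypes, significant):
    """Pairs of edges with the same source and relation whose targets
    are compatible; each pair licenses a coreference link and a merge
    (e.g. [send]->(agent)->[they] with [send]->(agent)->[person])."""
    edges = sorted(graph)               # fix an order to avoid duplicate pairs
    return [(e1, e2)
            for i, e1 in enumerate(edges)
            for e2 in edges[i + 1:]
            if e1[0] == e2[0] and e1[1] == e2[1]
            and compatible(e1[2], e2[2], supertypes, significant)]
```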
<Paragraph position="2"> To show our argument for the importance of links, we present an example. Example 4.1 shows a situation where an ambiguous word chicken (sense 1 for the animal, sense 2 for the meat) is used in a graph and needs to be disambiguated. If two graphs stored in an LKB contain the word chicken in a disambiguated form, they can help resolve the ambiguity. In Example 4.1, Graph 4.1 and Graph 4.2 have two isolated concepts in common: eat and chicken. Graph 4.1 and Graph 4.3 have the same two concepts in common, but the addition of a compatible relation, creating the common subgraph [eat]->(object)->[chicken], makes them more similar. The different relations between words have a large impact on the meaning of a sentence; here, the word chicken in Graph 4.1 can be disambiguated to chicken_2.</Paragraph>
<Paragraph position="3"> Only by looking at the relations between words can we understand how different each statement is. It's all in the links... Of course, those links might not be necessary at all levels of text analysis. If we are clustering documents based on keywords, we do not need such a deep level of understanding. But when we are analyzing a single text and trying to understand the meaning it conveys, we are probably working within a narrow domain, and the relations between words take on all their importance. For example, if we are trying to disambiguate the word baseball (the sport or the ball), both senses occur in the same context, so clusters of words that merely identify a context will not allow us to distinguish between them. On the other hand, a CCKG showing the relations between baseball_1 (the ball), the bat, the player and baseball_2 (the sport) will express the desired information.</Paragraph> </Section> </Paper>