<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1162"> <Title>PageRank on Semantic Networks, with Application to Word Sense Disambiguation</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 PageRank on Semantic Networks </SectionTitle> <Paragraph position="0"> In this section, we briefly describe PageRank (Brin and Page, 1998), and present the view of WordNet as a graph, which facilitates the application of the graph-based ranking algorithm on this semantic network.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The PageRank Algorithm </SectionTitle> <Paragraph position="0"> Iterative graph-based ranking algorithms are essentially a way of deciding the importance of a vertex within a graph; in the context of search engines, they are a way of deciding how important a page is on the Web. In this model, when one vertex links to another, it casts a vote for that other vertex. The higher the number of votes cast for a vertex, the higher the importance of the vertex. Moreover, the importance of the vertex casting the vote determines how important the vote itself is, and this information is also taken into account by the ranking model. Hence, the score associated with a vertex is determined based on the votes that are cast for it, and the scores of the vertices casting these votes.</Paragraph> <Paragraph position="1"> Let G = (V, E) be a directed graph with the set of vertices V and set of edges E, where E is a subset of V × V. For a given vertex Vi, let In(Vi) be the set of vertices that point to it, and let Out(Vi) be the set of edges going out of vertex Vi.
The PageRank score of vertex Vi is defined as follows:</Paragraph> <Paragraph position="2"> PR(Vi) = (1 - d) + d * Σ_{Vj ∈ In(Vi)} PR(Vj) / |Out(Vj)| </Paragraph> <Paragraph position="3"> where d is a damping factor that can be set between 0 and 1. The role of the damping factor d is to incorporate into the PageRank model the probability of jumping from a given vertex to another random vertex in the graph. In the context of Web surfing, PageRank implements the &quot;random surfer model&quot;, where a user clicks on links at random with a probability d, and jumps to a completely new page with probability 1 - d. The factor d is usually set at 0.85 (Brin and Page, 1998), and this is the value we also use in our implementation.</Paragraph> <Paragraph position="4"> Starting from arbitrary values assigned to each node in the graph, the PageRank computation iterates until convergence below a given threshold is achieved. After running the algorithm, a fast in-place sorting algorithm is applied to the ranked graph vertices to sort them in decreasing order of score.</Paragraph> <Paragraph position="5"> PageRank can also be applied to undirected graphs, in which case the out-degree of a vertex is equal to its in-degree, and convergence is usually achieved after fewer iterations.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 WordNet as a Graph </SectionTitle> <Paragraph position="0"> WordNet is a lexical knowledge base for English that defines words, meanings, and relations between them. The basic unit in WordNet is a synset, which is a set of synonym words or word phrases, and represents a concept. WordNet defines several semantic relations between synsets, including ISA relations (hypernym/hyponym), PART-OF relations (meronym/holonym), entailment, and others.</Paragraph> <Paragraph position="1"> To represent WordNet as a graph, we use an instance-centric data representation, which defines synsets as vertices, and relations or sets of relations as edges.</Paragraph> <Paragraph position="2"> The graph can be constructed as an undirected graph, with no orientation defined for edges, or as a directed graph, in which case a direction is arbitrarily established for each relation (e.g. hyponym -> hypernym).</Paragraph> <Paragraph position="3"> Given a subset of the WordNet synsets, as identified in a given text or by other selectional criteria, and given a semantic relation, a graph is constructed by identifying all the synsets (vertices) in the given subset that can be linked by the given relation (edges). Relations can also be combined; for instance, a graph can be constructed so that it accounts for both the ISA and the PART-OF relations between the vertices in the graph.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 PageRank-based Word Sense Disambiguation </SectionTitle> <Paragraph position="0"> In this section, we describe a new unsupervised open-text word sense disambiguation algorithm that relies on PageRank-style algorithms applied on semantic networks.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Building the Text Synset Graph </SectionTitle> <Paragraph position="0"> To enable the application of PageRank-style algorithms to the disambiguation of all words in open text, we have to build a graph that represents the text and interconnects the words with meaningful relations. Since no a-priori semantic information is available for the words in the text, we start with the assumption that every possible sense of a word is a potentially correct sense, and therefore all senses for all words are to be included in the initial search set.
The synsets pertaining to all word senses therefore form the vertices of the graph. The edges between the nodes are drawn using synset relations available in WordNet, either explicitly encoded in the network, or derived by various means (see Sections 4.2, 4.3).</Paragraph> <Paragraph position="1"> Note that not all WordNet arcs are suitable for combination with PageRank, as they sometimes identify competing word senses which tend to share targets of incoming or outgoing links. As our objective is to differentiate between senses, we want to focus on specific rather than shared links. We call two synsets colexical if they represent two senses of the same word - that is, if they share one identical lexical unit. For a given word or word phrase, colexical synsets will be listed as competing senses, from which a given disambiguation algorithm should select one.</Paragraph> <Paragraph position="2"> To ensure that colexical synsets do not &quot;contaminate&quot; each other's PageRank values, we have to make sure that they are not linked together, and hence that they compete through disjoint sets of links.</Paragraph> <Paragraph position="3"> This means that relations between synsets pertaining to various senses of the same word or word phrase are not added to the graph. Consider for instance the verb travel: it has six senses defined in WordNet, with senses 2 and 3 linked by an ISA relation (travel#2 ISA travel#3). Since the synsets pertaining to these two senses are colexical (they share the lexical unit travel), this ISA link is not added to the text graph.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Basic Semantic Relations </SectionTitle> <Paragraph position="0"> WordNet explicitly encodes a set of basic semantic relations, including hypernymy, hyponymy, meronymy, holonymy, entailment, causality, attribute, and pertainymy.
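The colexical filtering of Section 4.1 can be sketched as follows. The `word_senses` mapping and the `related` predicate are hypothetical interfaces standing in for WordNet lookups, not the paper's actual data structures:

```python
def build_text_graph(word_senses, related):
    """Build the text synset graph: one vertex per candidate synset, with
    edges only between non-colexical synsets linked by the chosen relation SR.

    word_senses: dict mapping each open-class word to its candidate synset ids.
    related: predicate (synset_a, synset_b) -> bool for the relation SR.
    """
    # Record which words list each synset as a candidate sense.
    lexicalized_by = {}
    for word, synsets in word_senses.items():
        for s in synsets:
            lexicalized_by.setdefault(s, set()).add(word)
    vertices = list(lexicalized_by)
    edges = set()
    for i, a in enumerate(vertices):
        for b in vertices[i + 1:]:
            # Colexical synsets share a lexical unit: never link them, so
            # competing senses keep disjoint sets of links.
            if lexicalized_by[a] & lexicalized_by[b]:
                continue
            if related(a, b):
                edges.add((a, b))
    return vertices, edges
```

With this filter, the travel#2 ISA travel#3 link from the example above is skipped, while links between senses of different words are kept.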
WordNet 2.0 has also introduced nominalizations, which link verbs and nouns pertaining to the same semantic class, and domain links, a first step toward the classification of synsets based on the &quot;ontology&quot; to which a given synset is relevant. While the domain relations usually add only a small number of links, their use tends to focus the graph on a dominant field, which was observed to help the disambiguation process.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Derived Semantic Relations </SectionTitle> <Paragraph position="0"> Two or more basic WordNet relations can be combined to form a new relation. For instance, we can combine hypernymy and hyponymy to obtain the coordinate relation, which identifies synsets that share the same hypernym. For example, dog#1 and wolf#1 are coordinates, since they share the hypernym canine#1.</Paragraph> <Paragraph position="1"> Also worth mentioning is the composite relation xlink, a new global relation that we define, which integrates all the basic relations (nominalizations and domain links included) and the coordinate relation. In short, two synsets are connected by an xlink relation if any WordNet-defined relation or a coordinate relation can be identified between them.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 The PageRank Disambiguation Algorithm </SectionTitle> <Paragraph position="0"> The input to the disambiguation algorithm consists of raw text. The output is a text with word meaning annotations for all open-class words. Given a semantic relation SR, which can be a basic or composite relation, the algorithm consists of the following main steps. Step 1: Preprocessing. During preprocessing, the text is tokenized and annotated with parts of speech.
Collocations are identified using a sliding window approach, where a collocation is considered to be a sequence of words that forms a compound concept defined in WordNet.</Paragraph> <Paragraph position="1"> Named entities are also identified at this stage.</Paragraph> <Paragraph position="2"> Step 2: Graph construction.</Paragraph> <Paragraph position="3"> Build the text synset graph: for all open class words in the text, identify all synsets defined in WordNet, and add them as vertices in the graph. Words previously assigned a named entity tag, as well as modal/auxiliary verbs, are not considered. For the given semantic relation SR, add an edge between all vertices in the graph that can be linked by the relation SR.</Paragraph> <Paragraph position="4"> Step 3: PageRank.</Paragraph> <Paragraph position="5"> Assign an initial value to each vertex in the graph. Iterate the PageRank computation until it converges - usually for 25-30 iterations. In our implementation, vertices are initially assigned a value of 1. Notice that the final values obtained after PageRank runs to completion are not affected by the choice of initial value; only the number of iterations to convergence may differ.</Paragraph> <Paragraph position="6"> Step 4: Assign word meanings.</Paragraph> <Paragraph position="7"> For each ambiguous word in the text, find the synset with the highest PageRank score, which uniquely identifies the sense of the word. If none of the synsets corresponding to the meanings of a word could be connected with other synsets in the graph using the given relation SR, the word is assigned a random sense (when the WordNet sense order is not considered), or the first sense in WordNet (when a sense order is available).</Paragraph> <Paragraph position="8"> The algorithm can be run on the entire text at once, in which case the resulting graph is fairly large - usually more than two thousand vertices - and has high connectivity.
Alternatively, it can be run on smaller sections of the text, in which case the graphs have a smaller number of vertices and lower connectivity. In the experiments reported in this paper, we use the first option, since it results in richer synset graphs and ensures that most of the words are assigned a meaning using the PageRank sense disambiguation algorithm.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Related Algorithms </SectionTitle> <Paragraph position="0"> In this section, we overview two other word sense disambiguation algorithms that address all words in open text: the Lesk algorithm, and the most frequent sense algorithm3. We also propose two new hybrid algorithms that combine the PageRank word sense disambiguation method with the Lesk algorithm and the most frequent sense algorithm.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 The Lesk algorithm </SectionTitle> <Paragraph position="0"> The Lesk algorithm (Lesk, 1986) is one of the first algorithms used for the semantic disambiguation of all words in open text. The only resource required by the algorithm is a set of dictionary entries, one for each possible word sense, and knowledge about the immediate context where the sense disambiguation is performed.</Paragraph> <Paragraph position="1"> 3The reason for choosing these algorithms over the other methods mentioned in Section 2 is the fact that they address all open class words in a text.</Paragraph> <Paragraph position="2"> The main idea behind the original definition of the algorithm is to disambiguate words by finding the overlap among their sense definitions. Namely, given two words W1 and W2, each with NW1 and NW2 senses defined in a dictionary, for each possible sense pair W1i and W2j, i = 1..NW1, j = 1..NW2, first determine the overlap of their definitions, by counting the number of words they have in common.
Next, the sense pair with the highest overlap is selected, and consequently a sense is assigned to each of the two words involved in the initial pair.</Paragraph> <Paragraph position="3"> When applied to open text, the original definition of the algorithm faces an explosion of word sense combinations4, and alternative solutions are required. One solution is to use simulated annealing, as proposed in (Cowie et al., 1992). Another solution - which we adopt in our experiments - is to use a variation of the Lesk algorithm (Kilgarriff and Rosenzweig, 2000), where the meanings of words in the text are determined individually, by finding the highest overlap between the sense definitions of each word and the current context. Rather than seeking to simultaneously determine the meanings of all words in a given text, this approach determines word senses individually, and therefore avoids the combinatorial explosion of senses.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Most Frequent Sense </SectionTitle> <Paragraph position="0"> WordNet keeps track of the frequency of each word meaning within a sense-annotated corpus. This introduces an additional knowledge element that can significantly improve the disambiguation performance. A very simple algorithm that relies on this information consists of picking the most frequent sense for any given word as the correct one. Given that sense frequency distributions tend to decrease exponentially for less frequent senses, this guess usually outperforms methods that use exclusively the content of the document and associated dictionary information.
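The simplified Lesk variant adopted in Section 5.1 (Kilgarriff and Rosenzweig, 2000) can be sketched as follows; the gloss dictionary is a hypothetical stand-in for the actual dictionary entries, and tokenization is deliberately naive:

```python
def simplified_lesk(context, sense_definitions):
    """Pick the sense whose definition has the highest word overlap with the
    current context. Senses are scored individually, which avoids the
    combinatorial explosion of scoring all sense pairs simultaneously.

    sense_definitions: dict mapping the sense ids of one ambiguous word to
    gloss strings (a hypothetical stand-in for dictionary entries).
    """
    context_words = set(w.lower() for w in context)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_definitions.items():
        # Overlap = number of distinct context words appearing in the gloss.
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

Ties fall to whichever sense is seen first, so pre-ordering the senses (randomly, or by WordNet sense frequency) acts as the default ordering discussed in the hybrid algorithms below.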
</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.3 Combining PageRank and Lesk </SectionTitle> <Paragraph position="0"> When combining two different algorithms, we have to ensure that their effects accumulate without disturbing each algorithm's internal workings.</Paragraph> <Paragraph position="1"> The PageRank+Lesk algorithm consists in providing a default ordering by Lesk (possibly after shuffling WordNet senses to remove the sense frequency bias), and then applying PageRank, which will eventually reorder the senses. With this approach, senses that have similar PageRank values keep their Lesk ordering. Since PageRank overrides Lesk, one can notice that in this case we prioritize PageRank, which tends to outperform Lesk.</Paragraph> <Paragraph position="3"> 4Consider for instance the text &quot;I saw a man who is 108 years old and can still walk and tell jokes&quot;, with nine open class words, each with several possible senses: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3). Given the total of 43,929,600 possible sense combinations, finding the optimal combination using definition overlaps is not a tractable approach.</Paragraph> <Paragraph position="4"> The resulting algorithm provides a combination which improves over both algorithms individually, as shown in Section 6.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.4 Combining PageRank with the Sense Frequency </SectionTitle> <Paragraph position="0"> The combination of PageRank with the WordNet sense frequency information is done in two steps: (1) introduce the WordNet frequency ordering by removing the random permutation of senses; (2) use a formula which combines PageRank and the actual WordNet sense frequency information. While a simple product of the two ranks already provides an improvement over both algorithms, the following formula, which prioritizes the first sense, provides the best results:</Paragraph> <Paragraph position="2"/> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> Rank = 4 * FR * PR, if N = 1; Rank = FR * PR, if N > 1 </SectionTitle> <Paragraph position="0"> where FR represents the WordNet sense frequency, PR represents the rank computed by PageRank, N is the position in the frequency-ordered synset list, and Rank represents the combined rank.</Paragraph> </Section> </Paper>