<?xml version="1.0" standalone="yes"?> <Paper uid="E99-1028"> <Title>Word Sense Disambiguation in Untagged Text based on Term Weight Learning</Title> <Section position="4" start_page="209" end_page="210" type="metho"> <SectionTitle> 3 Extraction of Collocations </SectionTitle> <Paragraph position="0"> Given a set of verbs, v1, v2, ..., vm, the algorithm produces a set of semantic clusters, which are ordered in ascending order of their semantic deviation values. Semantic deviation is a measure of the deviation of the set in an n-dimensional Euclidean space, where n is the number of nouns which co-occur with the verbs. In our algorithm, if vi is non-polysemous, it belongs to at least one of the resultant semantic clusters. If it is polysemous, the algorithm splits it into several hypothetical verbs, each of which belongs to at least one of the clusters. Table 1 summarises the sample result for the set {close, open, end}.</Paragraph> <Paragraph position="1"> In Table 1, the subsets 'open' and 'end' correspond to the distinct senses of 'close'. Mu(vi, n) is the value of mutual information between a verb and a noun. If a polysemous verb is followed by a noun which belongs to one of these sets of nouns, the meaning of the verb within the sentence can be determined accordingly, because each set of nouns characterises one of the possible senses of the verb.</Paragraph> <Paragraph position="2"> The basic assumption of our approach is that a polysemous verb cannot be recognised correctly unless the collocations which represent each distinct sense of the verb are weighted correctly. In particular, for a low Mu value, we have to distinguish between those unobserved co-occurrences that are likely to occur in a new corpus and those that are not. We extracted the collocations which represent each distinct sense of a polysemous verb using similarity-based estimation. Let (wp, nq) and (w'pi, nq) be two different co-occurrence pairs. 
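As an aid to reproduction, Mu(v, n) above can be read as pointwise mutual information between a verb and a co-occurring noun. A minimal sketch, assuming the standard PMI definition and invented toy counts (the paper spells out neither):

```python
import math
from collections import Counter

def mutual_information(pair_counts, verb_counts, noun_counts, total_pairs):
    """Mu(v, n) as pointwise mutual information over co-occurrence counts."""
    mu = {}
    for (v, n), c in pair_counts.items():
        p_vn = c / total_pairs                 # joint probability of the pair
        p_v = verb_counts[v] / total_pairs     # marginal probability of the verb
        p_n = noun_counts[n] / total_pairs     # marginal probability of the noun
        mu[(v, n)] = math.log2(p_vn / (p_v * p_n))
    return mu

# Invented toy counts out of 100 observed verb-noun pairs.
pairs = Counter({("close", "door"): 8, ("close", "talk"): 1, ("open", "door"): 7})
verbs = Counter({"close": 9, "open": 7})
nouns = Counter({"door": 15, "talk": 10})
mu = mutual_information(pairs, verbs, nouns, 100)
```

Under these counts, Mu('close', 'door') comes out higher than Mu('close', 'talk'), so 'door' is the stronger collocate of 'close'.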
We say that wp and nq are semantically related if w'pi and nq are semantically related and (wp, nq) and (w'pi, nq) are semantically similar (Dagan et al., 1993). Using this estimation, collocations are extracted and term weight learning is performed. The parameters of term weighting are then estimated so as to maximise the collocations which characterise each sense and to minimise the other collocations.</Paragraph> <Paragraph position="3"> Let v have two senses, wp and wl, which are not judged correctly. Let N_Set1 be the set of nouns which co-occur with both v and wp, but do not co-occur with wl. Let N_Set2 be the set of nouns which co-occur with both v and wl, but do not co-occur with wp, and let N_Set3 be the set of nouns which co-occur with v, wp and wl. Extraction of collocations using similarity-based estimation is shown in Figure 2. In Figure 2, (a-1) is the procedure to extract collocations which were not weighted correctly, and (a-2) and (b) are the procedures to extract other words which were not weighted correctly.</Paragraph> <Paragraph position="4"> Sim(vi, vj) in Figure 2 is the similarity value of vi and vj, which is measured by the inner product of their normalised vectors and is shown in formula (1): Sim(vi, vj) = (Σ_l vil × vjl) / (sqrt(Σ_l vil^2) × sqrt(Σ_l vjl^2)), where the sums run over l = 1, ..., k.</Paragraph> <Paragraph position="6"> In formula (1), k is the number of nouns which co-occur with vi, and vil is the Mu value between vi and nl.</Paragraph> <Paragraph position="7"> We recall that wp and nq are semantically related if w'pi and nq are semantically related and (wp, nq) and (w'pi, nq) are semantically similar. In (a) of Figure 2, we regard w'pi and nq as semantically related when Mu(w'pi, nq) >= 3, and (wp, nq) and (w'pi, nq) as semantically similar if Sim(wp, w'pi) > 0. (For wl, we replace wp with wl, nq in N_Set1 - N_Set3 with nq in N_Set2 - N_Set3, and Sim(wp, w'pi) > 0 with Sim(wl, w'pi) > 0.)</Paragraph> <Paragraph position="8"> 
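The similarity of formula (1), an inner product of normalised Mu vectors, might be sketched as follows; the Mu values in the example are invented for illustration:

```python
import math

def sim(vec_i, vec_j):
    """Formula (1): inner product of the normalised Mu vectors of two verbs.
    Each vector maps co-occurring nouns to Mu values."""
    nouns = set(vec_i) | set(vec_j)
    dot = sum(vec_i.get(n, 0.0) * vec_j.get(n, 0.0) for n in nouns)
    norm_i = math.sqrt(sum(x * x for x in vec_i.values()))
    norm_j = math.sqrt(sum(x * x for x in vec_j.values()))
    return dot / (norm_i * norm_j)

close_vec = {"door": 2.6, "meeting": 1.8, "gap": 1.1}
shut_vec = {"door": 2.4, "gap": 0.9, "window": 1.5}
print(round(sim(close_vec, shut_vec), 3))  # prints 0.727
```

This is the usual cosine similarity, so Sim(vi, vi) = 1 and Sim is positive whenever the two verbs share positively weighted nouns.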
In (a) of Figure 2, for example, when (wp, nq) is judged to be a collocation which represents a distinct sense, we set the Mu values of (wp, nq) and (v, nq) to α × Mu(wp, nq) and α × Mu(v, nq), with 1 < α. On the other hand, when (wp, nq) is judged not to be a collocation which represents a distinct sense, we set the Mu values of these co-occurrence pairs to β × Mu(wp, nq) and β × Mu(v, nq), with 0 < β < 1.</Paragraph> <Paragraph position="9"> In the experiment, we set the increment value of α and the decrement value of β to 0.001; using the Wall Street Journal, we obtained β = 0.964.</Paragraph> </Section> <Section position="5" start_page="210" end_page="211" type="metho"> <SectionTitle> 4 Clustering a Set of Verbs </SectionTitle> <Paragraph position="0"> Given a set of verbs, VG = {v1, ..., vm}, the algorithm produces a set of semantic clusters, which are sorted in ascending order of their semantic deviation. The deviation value of VG, Dev(VG), is shown in formula (3), where the verbs in VG are represented as vectors and g is their centre of gravity. In formula (3), a set with a smaller value is considered semantically less deviant.</Paragraph> <Paragraph position="1"> Figure 3 shows the flow of the clustering algorithm. As shown in Figure 3, the function Make-Initial-Cluster-Set applies to VG and produces all possible pairs of verbs with their semantic deviation values. The result is a list of pairs called the ICS (Initial Cluster Set).</Paragraph> <Paragraph position="2"> The CCS (Created Cluster Set) holds the clusters which have been created so far. The function Make-Temporary-Cluster-Set retrieves from the CCS the clusters which contain one of the verbs of Seti. The results (Setj) are passed to the function Recognition-of-Polysemy, which determines whether or not a verb is polysemous.</Paragraph> <Paragraph position="3"> Let v be an element included in both Seti and Setj. 
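The α/β re-weighting of Mu values used in Extraction-of-Collocations (Section 3) might be sketched as follows; the function name and the single-step values α = 1.001 and β = 0.999 are illustrative only (the paper increments α and decrements β in steps of 0.001 until its constraints hold):

```python
def reweight(mu, good_pairs, alpha=1.001, beta=0.999):
    """Boost the Mu values of pairs judged to characterise a distinct
    sense by alpha (above 1); damp all other pairs by beta (below 1)."""
    return {pair: (alpha if pair in good_pairs else beta) * value
            for pair, value in mu.items()}

mu = {("close", "door"): 2.5, ("close", "concert"): 0.4}
mu = reweight(mu, good_pairs={("close", "door")})  # one learning step
```

Repeating such steps gradually separates sense-characterising collocations from the rest of the co-occurrence pairs.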
To determine whether v has two senses wp, where wp is an element of Seti, and wl, where wl is an element of Setj, we make two clusters, as shown in (4), and their merged cluster, as shown in (5).</Paragraph> <Paragraph position="4"> {v1, wp}, {v2, wl, ..., wn} (4) {v, w1, ..., wp, ..., wn} (5) Here, v and wp are verbs and w1, ..., wn are verbs or hypothetical verbs; w1, ..., wp, ..., wn in (5) satisfy Dev(v, wi) <= Dev(v, wj) (1 <= i <= j <= n).</Paragraph> <Paragraph position="5"> v1 and v2 in (4) are new hypothetical verbs which correspond to the two distinct senses of v.</Paragraph> <Paragraph position="6"> If v is polysemous, but is not recognised correctly, then Extraction-of-Collocations, shown in Figure 2, is applied. In Extraction-of-Collocations, α and β are estimated for (4) and (5) so as to satisfy (6) and (7).</Paragraph> <Paragraph position="8"> The whole process is repeated until the newly obtained cluster, Setx, contains all the verbs in the input or the ICS is exhausted.</Paragraph> </Section> <Section position="6" start_page="211" end_page="211" type="metho"> <SectionTitle> 5 Word Sense Disambiguation </SectionTitle> <Paragraph position="0"> We used the result of our clustering analysis, which consists of pairs, each associating a distinct sense of a polysemous verb with a collocating noun.</Paragraph> <Paragraph position="1"> Let v have senses v1, v2, ..., vm. 
The sense of a polysemous verb v is vi (1 <= i <= m) if vi maximises the sum of the Mu values between vi and the nouns which co-occur with v.</Paragraph> <Paragraph position="3"> The co-occurring nouns are those which appear with v within the five-word distance.</Paragraph> </Section> <Section position="7" start_page="211" end_page="212" type="metho"> <SectionTitle> 6 Experiment </SectionTitle> <Paragraph position="0"> This section describes an experiment conducted to evaluate the performance of our method.</Paragraph> <Section position="1" start_page="211" end_page="211" type="sub_section"> <SectionTitle> 6.1 Data </SectionTitle> <Paragraph position="0"> The data we used is the 1989 Wall Street Journal (WSJ) in the ACL/DCI CD-ROM, which consists of 2,878,688 occurrences of part-of-speech tagged words (Brill, 1992). The inflected forms of the same nouns and verbs are treated as single units.</Paragraph> <Paragraph position="1"> For example, 'book' and 'books' are treated as a single unit. We obtained 5,940,193 word pairs in a window size of 5 words, of which 2,743,974 were different word pairs. From these, we selected collocations of a verb and a noun.</Paragraph> <Paragraph position="2"> As test data, we used 40 sets of verbs. We selected at most four senses for each verb; the best sense was determined by a human judge from among the senses in the Collins dictionary and thesaurus (McLeod, 1987).</Paragraph> </Section> <Section position="2" start_page="211" end_page="212" type="sub_section"> <SectionTitle> 6.2 Results </SectionTitle> <Paragraph position="0"> The results of the experiment are shown in Table 2, Table 3 and Table 4.</Paragraph> <Paragraph position="1"> In Tables 2, 3 and 4, every polysemous verb has two, three and four senses, respectively. Column 1 in Tables 2, 3 and 4 shows the test data: the verb v is a polysemous verb and the remaining verbs show its senses. For example, 'cause' of (1) in Table 2 has two senses, 'effect' and 'produce'. 'Sentence' shows the number of sentences containing occurrences of a polysemous verb, and column 4 shows their distributions. 
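The data preparation of Section 6.1 (verb-noun pairs within a window of five words) can be sketched as follows; the token format and simplified tags are our assumptions, and the lemmatisation of inflected forms is omitted:

```python
from collections import Counter

def extract_pairs(tagged_tokens, window=5):
    """Collect verb-noun co-occurrence pairs within a five-word window.
    Tokens are (word, tag) with simplified tags 'V' (verb), 'N' (noun)."""
    pairs = Counter()
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag != "V":
            continue
        lo = max(0, i - window)
        hi = min(len(tagged_tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tagged_tokens[j][1] == "N":
                pairs[(word, tagged_tokens[j][0])] += 1
    return pairs

sent = [("they", "O"), ("close", "V"), ("the", "O"), ("door", "N"),
        ("after", "O"), ("the", "O"), ("meeting", "N")]
pairs = extract_pairs(sent)  # counts ('close', 'door') and ('close', 'meeting')
```

Running such a pass over the tagged corpus yields the pair counts from which the Mu values are computed.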
'v' shows the number of polysemous verbs in the data. W in Table 2 shows the number of nouns which co-occur with wp and wl. v ∩ W shows the number of nouns which co-occur with both v and W. In a similar way, W in Tables 3 and 4 shows the number of nouns which co-occur with wp to w2 and with wp to w3, respectively. 'Correct' shows the performance of our method. 'Total' at the bottom of Table 4 shows the performance over all 40 sets of verbs.</Paragraph> <Paragraph position="2"> Table 2 shows that when polysemous verbs have two senses, the percentage attained 80.0%. When polysemous verbs have three and four senses, the percentage was 77.7% and 76.4%, respectively.</Paragraph> <Paragraph position="3"> This shows that there is no striking difference among them. Columns 8 and 9 in Tables 2, 3 and 4 show the results for the collocations which were extracted by our method.</Paragraph> <Paragraph position="4"> Mu < 3 shows the number of nouns which satisfy Mu(wp, n) < 3 or Mu(wl, n) < 3. 'Correct' shows the total number of collocations which could be estimated correctly. Tables 2 to 4 show that the frequency of v is proportional to that of v ∩ W. As a result, the larger the number of v ∩ W is, the higher the percentage of correctly estimated collocations.</Paragraph> </Section> </Section> <Section position="8" start_page="212" end_page="274" type="metho"> <SectionTitle> 7 Related Work </SectionTitle> <Paragraph position="0"> Unsupervised learning approaches, i.e. approaches which determine the class membership of each object to be classified in a sample without using sense-tagged training examples of correct classifications, are considered to have an advantage over supervised learning algorithms, as they do not require costly hand-tagged training data.</Paragraph> <Paragraph position="1"> Schütze's and Zernik's methods avoid tagging each occurrence in the training corpus. Their methods associate each sense of a polysemous word with a set of its co-occurring words (Schütze, 1992; Zernik, 1991). 
If a word has several senses, then the word is associated with several different sets of co-occurring words, each of which corresponds to one of the senses of the word. The weakness of Schütze's and Zernik's methods, however, is that they rely solely on human intuition for identifying the different senses of a word, i.e. the human editor has to determine, by her/his intuition, how many senses a word has, and then identify the sets of co-occurring words that correspond to the different senses.</Paragraph> <Paragraph position="2"> [Table residue: test verb sets with sentence counts, e.g. {fall, decline, win} 278, {feel, think, sense} 280, {hit, attack, strike} 250, {leave, remain, go} 183, {order, request, arrange} 240.] Yarowsky proposed an unsupervised learning procedure to perform noun WSD (Yarowsky, 1995). This algorithm requires a small number of training examples to serve as a seed. The result shows that the average percentage attained was 96.1% for 12 nouns when the training data was a 460 million word corpus, although Yarowsky uses only nouns and does not discuss distinguishing more than two senses of a word.</Paragraph> <Paragraph position="3"> A more recent unsupervised approach is described in (Pedersen and Bruce, 1997). They presented three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text: McQuitty's similarity analysis, Ward's minimum-variance method and the EM algorithm. These algorithms assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. Their methods are perhaps the most similar to our present work. They reported that disambiguating nouns is more successful than disambiguating adjectives or verbs, and that the best result for verbs was obtained with McQuitty's method (71.8%), although they tested only 13 ambiguous words (of which only 4 are verbs). 
Furthermore, each of these has at most three senses. In future work, we will compare our method with their methods using the data we used in our experiment.</Paragraph> </Section> </Paper>