<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1131"> <Title>Word sense disambiguation criteria: a systematic study</Title> <Section position="4" start_page="1" end_page="1" type="evalu"> <SectionTitle> 3 Results </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.1 Best criteria precision </SectionTitle> <Paragraph position="0"> Table 2 displays, for each of the target words studied, the optimal context size and the disambiguation precision obtained by the best unigram-, bigram- and trigram-based criteria.</Paragraph> <Paragraph position="1"> This table shows that the best criteria take all the words of the context into account. Section 3.2 concentrates on feature reliability according to part-of-speech. Section 3.3 then examines the impact of feature selections restricted to the most reliable pieces of contextual evidence used for disambiguation with the DL classifier.</Paragraph> <Paragraph position="2"> According to Table 2, the optimal context size ranges from ±1 to ±5 words. Context-size optimality is examined further in section 3.4.</Paragraph> <Paragraph position="3"> Surprisingly, Table 2 also shows that the criterion obtaining the best precision is based on bigrams, not on unigrams. This point is taken up in section 3.5.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.2 The most reliable parts-of-speech </SectionTitle> <Paragraph position="0"> In this section, we aim to determine the part-of-speech and the positional distribution of the pieces of contextual evidence used for disambiguation.</Paragraph> <Paragraph position="1"> To this end, we use the DL classifier, because it bases its classifications solely on the single most reliable piece of evidence identified by the criteria. 
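The single-best-evidence behaviour just described can be sketched as a minimal decision list in Python. This is an illustrative sketch only: the log-likelihood smoothing, the function names and the data format are assumptions, not the implementation used in this study.

```python
import math
from collections import defaultdict

def train_decision_list(samples, smoothing=0.1):
    """Rank (feature, sense) rules by smoothed log-likelihood ratio.

    samples: list of (feature_set, sense) pairs -- a hypothetical format,
    where feature_set holds the contextual evidence for one occurrence.
    """
    counts = defaultdict(lambda: defaultdict(float))
    senses = set()
    for features, sense in samples:
        senses.add(sense)
        for f in features:
            counts[f][sense] += 1.0
    rules = []
    for f, per_sense in counts.items():
        for sense in senses:
            p = per_sense[sense] + smoothing
            q = sum(per_sense[s] for s in senses if s != sense) + smoothing
            rules.append((math.log(p / q), f, sense))
    rules.sort(reverse=True)  # strongest single piece of evidence first
    return rules

def classify(rules, features, default_sense):
    # A decision list answers with the FIRST rule whose feature occurs
    # in the context -- it uses only the most reliable matching evidence.
    for _, f, sense in rules:
        if f in features:
            return sense
    return default_sense
```

Unlike a classifier that sums evidence, the list answers with the first matching rule, which is what makes it useful here for identifying which single indicator drives each decision.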
Thus, the DL classifier lets us observe the part-of-speech and the position of this indicator when using the criterion [1gr|mform|ordered|all]. Table 3 and the graphics in Figure 1 synthesize the results of this study.</Paragraph> <Paragraph position="2"> Table 3 brings out the following results (we give in brackets, in this order, the main figures obtained for nouns, adjectives and verbs): * common nouns (NCOM) obtain one of the best precisions (93.0%; 93.7%; 87.8%) and are among the most frequently used indicators (12.7%; 25.3%; 26%) for all three term categories; * adjectives (ADJ) are good indicators for nouns (p=94.9%) and adjectives (p=80.3%), but they are especially useful for nouns, being used in 13.7% of the cases against only 2.2% for adjectives; * adverbs (ADV) are mainly useful for adjectives; their precision reaches 79.2% and they are used in 9.6% of the cases; * verbs in the infinitive (VINF) are very reliable indicators for all three parts-of-speech (90.2%; 80.6%; 87.2%), but they are rarely used because they are seldom encountered in the context (0.9%; 0.7%; 2.9%); * conjugated verbs (VCON) obtain poor precision (67.9%; 53.9%; 54.4%).</Paragraph> <Paragraph position="3"> The graphics in Figure 1 show the positional distribution of the main parts-of-speech of the indicators used to disambiguate each of the three term categories. The asymmetric shape for verbs, and more precisely the strong prevalence of indicators located in positions +1, +2 and +3, suggests that verbs are disambiguated more by their object than by their subject, since the dominant construction is subject-verb-object.</Paragraph> <Paragraph position="4"> Table 4 summarizes these graphs. 
Our results agree in many respects with those of (Yarowsky, 1993), although his study applies only to pseudo-words with exactly two &quot;senses&quot;: * &quot;Adjectives derive almost all of their disambiguating information from the nouns they modify&quot;; * &quot;Nouns are best disambiguated by directly adjacent adjectives or nouns&quot;; * &quot;Verbs derive more disambiguating information from their objects than from their subjects&quot;.</Paragraph> </Section> <Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.3 The importance of stop-words </SectionTitle> <Paragraph position="0"> content words using DL classifier.</Paragraph> <Paragraph position="1"> Many studies do not consider all the words of the context (El-Beze, Loupy, Marteau, 1998; Mooney, 1996; Ng, Lee, 2002). The choice to use only content-word-based criteria rests on the assumption that content words are the most reliable indicators. This seems obvious, but Table 2 does not confirm it. Table 5 shows the average precision decrease of the content-word-based criteria ([par1|par2|par3|content]) compared to the same criteria based on all words ([par1|par2|par3|all]). The decrease is small when the criteria are based on unigrams and used to disambiguate nouns, but it can become very large in the other cases, in particular for verb disambiguation.</Paragraph> <Paragraph position="2"> Table 3 informs us about the disambiguation precision according to the coarse-grained part-of-speech tag. It shows that using content words only is probably not the most appropriate feature selection (for example, conjugated verbs are not reliable indicators). 
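A content-word restriction of the kind compared here can be sketched as a simple filter over tagged context features. The helper and its data format are illustrative assumptions; the tag names follow the paper's coarse-grained labels (NCOM, ADJ, VINF, ...).

```python
# Hypothetical POS filtering of contextual features; tag names follow the
# paper's coarse-grained tagset (NCOM, NPRO, ADJ, ADV, VINF, VCON, VPAR).
CONTENT_TAGS = {"NCOM", "NPRO", "ADJ", "ADV", "VINF", "VCON", "VPAR"}

def select_features(tagged_context, mode="all"):
    """tagged_context: list of (lemma, pos) pairs around the target word.

    mode "all"      keeps every word (what the paper finds works best),
    mode "content"  keeps only content words (the restriction under test).
    """
    if mode == "all":
        return [lemma for lemma, _ in tagged_context]
    return [lemma for lemma, pos in tagged_context if pos in CONTENT_TAGS]
```

The paper's finding is that the "content" mode loses precision, i.e. function words (determiners, prepositions) carry disambiguating information too.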
We have therefore tried a subtler selection (referred to as par4=selected), restricting indicators to the most reliable parts-of-speech according to Table 3: * For nouns, we use indicators having the following coarse-grained part-of-speech tags: * For adjectives, we use indicators having the following coarse-grained part-of-speech tags: NCOM, DET, ADJ, ADV, VINF or NPRO; * For verbs, we use indicators having the following coarse-grained part-of-speech tags: NCOM, ADJ, PROPE, PCTFORTE, SUB, VINF, NPRO, VPAR or PRODE.</Paragraph> <Paragraph position="3"> Table 6 compares the precision reached by the following 3 criteria:</Paragraph> <Paragraph position="5"> We observe that this subtler selection also lowers the disambiguation precision. We therefore conclude that all words, whatever their part-of-speech, contribute to the disambiguation.</Paragraph> </Section> <Section position="4" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.4 Optimal context </SectionTitle> <Paragraph position="0"> We tested contexts of up to ±10,000 words.</Paragraph> <Paragraph position="1"> However, the best precision is always obtained for short contexts, ranging from ±1 to ±5 words. These results are similar to those obtained by many other studies (El-Beze, et al., 1998; Yarowsky, 1993; 2000).</Paragraph> <Paragraph position="2"> The optimal context size depends on the criterion, the target part-of-speech and the n-gram size. In particular, it increases with the n-gram size.</Paragraph> <Paragraph position="3"> Table 7 shows, for all the criteria we examined, the average optimal context size by n-gram size and by part-of-speech.</Paragraph> <Paragraph position="4"> The main indicators used to disambiguate nouns and adjectives are distributed roughly symmetrically around the word to disambiguate. By contrast, indicators for verbs tend to be situated mainly after the verb. 
Therefore, a non-symmetrical context, shifted forward by one word, proves more appropriate. Our experiments show that using this shifted context improves verb disambiguation precision by 0.75% on average.</Paragraph> <Paragraph position="5"> Do n-grams have to contain, or be adjacent to, the target word? Since the lemma of a given word is unique, if only lemmas are considered, an n-gram adjacent to the target word contains precisely the same information as the (n+1)-gram obtained by adding the target word to it. An n-gram situated next to the word to disambiguate can thus be assimilated to the (n+1)-gram which contains it. Consequently, the question becomes: do n-grams have to contain the word to disambiguate, or at least be adjacent to it? Several studies impose this constraint, probably because n-grams are used to capture fixed constructions containing the word to disambiguate. Table 2 shows that the optimal context size of the best bigram- or trigram-based criteria does not fit this constraint. The relevant n-grams do not necessarily contain the target word and are not necessarily adjacent to it. For example, for nouns and verbs, the bigram-based criterion that obtains the best disambiguation precision has an optimal context size of ±4 words. This criterion produces some bigrams separated from the target word by one or two words. However, this observation alone does not allow us to abandon the constraint of containing, or being adjacent to, the target word. Indeed, the increasing distance of the bigrams may help locate information that could also be captured by the joint use of one or several larger n-grams. 
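The lemma argument above can be checked with a small worked example (the French lemmas are chosen for illustration only). The equivalence holds because the target lemma is constant across all occurrences of the target word, so the adjacent n-gram and the (n+1)-gram containing the target fire on exactly the same contexts.

```python
# Illustrative lemma sequence; index 2 ("barrage") is the word to
# disambiguate. Lemmas and the example itself are hypothetical.
lemmas = ["le", "vieux", "barrage", "retenir", "le", "eau"]
target = 2

# Bigram immediately to the left of the target ...
left_bigram = tuple(lemmas[0:2])   # ("le", "vieux")
# ... and the trigram built by appending the target lemma to it.
trigram = tuple(lemmas[0:3])       # ("le", "vieux", "barrage")

# Since the target lemma never varies, the mapping bigram -> trigram is
# one-to-one: the adjacent bigram carries the same information as the
# (n+1)-gram that contains the target word.
assert trigram == left_bigram + (lemmas[target],)
```

This is why an n-gram adjacent to the target can be "assimilated" to the larger n-gram containing it, and why the question reduces to adjacency rather than containment.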
We have thus evaluated a combination of criteria in which all n-grams contain the target word, in a context of up to ±4 words:</Paragraph> <Paragraph position="7"> words.</Paragraph> <Paragraph position="8"> This combination yields a disambiguation precision of 74.3%, lower than that obtained by the criterion [2gr|lemma|leftright|all] alone with a ±4-word context. This experiment confirms that constraining the n-grams to contain the word to disambiguate, or at least to be adjacent to it, decreases disambiguation accuracy. Consequently, nothing justifies imposing this constraint on criteria.</Paragraph> </Section> <Section position="5" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 3.5 Surprising bigram accuracy </SectionTitle> <Paragraph position="0"> Contrary to all expectations, Table 2 shows that the best unigram-based criterion ([1gr|lemma|ordered|all]) is clearly less accurate than the best bigram-based criterion ([2gr|lemma|leftright|all]). In practice, however, bigrams and trigrams are seldom used alone. When used, they are combined with unigrams, which are supposed to convey the most reliable pieces of information.</Paragraph> <Paragraph position="1"> Why does the criterion [2gr|lemma|leftright|all] work so well? First, since this criterion is a juxtaposition of lemmas, the features it generates include the left and right unigrams, even if these unigrams are disguised as bigrams (cf. section 3.4.2). As these pieces of contextual evidence are certainly the most important ones (cf. section 3.4), it makes sense that this bigram-based criterion obtains good results.</Paragraph> <Paragraph position="2"> Second, in a larger context, the juxtaposition of two words appears more relevant than one isolated word. 
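One possible reading of the feature generation behind a leftright bigram criterion can be sketched as follows. This is a hypothetical reconstruction of the notation, not the authors' code; note that the bigrams adjacent to the target (offsets -2 and +1 here) carry the immediate left and right unigrams, which is how unigram evidence survives "disguised as bigrams".

```python
def bigram_features(lemmas, target_index, window=4):
    """Hypothetical reading of a leftright bigram criterion: every lemma
    bigram lying fully inside the window around the target, tagged with
    its side and start offset. Bigrams spanning the target are skipped."""
    feats = []
    lo = max(0, target_index - window)
    hi = min(len(lemmas), target_index + window + 1)
    for i in range(lo, hi - 1):
        if target_index > i + 1:        # bigram entirely left of target
            feats.append(("left", i - target_index, (lemmas[i], lemmas[i + 1])))
        elif i > target_index:          # bigram entirely right of target
            feats.append(("right", i - target_index, (lemmas[i], lemmas[i + 1])))
    return feats
```

For a target at position 2 in ["le", "grand", "barrage", "retenir", "le", "eau"], the left-adjacent bigram ("le", "grand") contains the immediate left unigram "grand", and the right-adjacent bigram ("retenir", "le") contains the immediate right unigram "retenir".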
For example, to disambiguate the word utile, the bigram pour_le is relevant, whereas the isolated unigrams pour and le are of little help.</Paragraph> <Paragraph position="3"> Lastly, the presence of a unigram non-contiguous to the target word can sometimes be sufficient to resolve the ambiguity. But using bigram-based criteria does not necessarily lose the information conveyed by unigram-based criteria.</Paragraph> <Paragraph position="4"> For example, a common noun is often preceded by a determiner, a preposition or an apostrophe. For a given common noun located at a given distance from a given target word, this determiner, preposition or apostrophe varies little. Therefore, the information brought by the juxtaposition of the noun and the preceding word is often very similar to the information provided by the noun alone.</Paragraph> </Section> </Section> </Paper>