<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1051"> <Title>CORPUS-BASED STATISTICAL SENSE RESOLUTION</Title> <Section position="7" start_page="262" end_page="264" type="evalu"> <SectionTitle> 5. RESULTS AND DISCUSSION </SectionTitle> <Paragraph position="0"> All of the classifiers performed best with the largest number (200) of training contexts. The percent-correct results reported below are averaged over the three trials with 200 training contexts. The Bayesian classifier averaged 71% correct answers, the content vector classifier averaged 72%, and the neural network classifier averaged 76%. None of these differences are statistically significant, given the limited sample size of three trials.</Paragraph> <Paragraph position="1"> The results reported below are taken from trial A with 200 training contexts. Confusion matrices for this trial are given in Tables 2-4. The diagonals show the number of correct classifications for each sense, and the off-diagonal elements show classification errors. For example, the entry containing 5 in the bottom row of Table 2 means that 5 contexts whose correct sense is the product sense were classified as the phone sense.</Paragraph> <Paragraph position="2"> The ten most heavily weighted tokens for each sense for each classifier appear in Table 1. The words on the list seem, for the most part, indicative of the target sense. However, there are some consistent differences among the methods. For example, whereas the Bayesian method is sensitive to proper nouns, the neural network appears to have no such preference.</Paragraph> <Paragraph position="3"> To test the hypothesis that the methods have different response patterns, we performed the χ² test for correlated proportions. This test measures how consistently the methods treat individual test contexts by determining whether the classifiers are making the same classification errors in each of the senses.
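The confusion matrices of Tables 2-4 described above can be assembled directly from gold and assigned senses. The sketch below is illustrative only: the sense labels and predictions are invented, not taken from the paper's data.

```python
def confusion_matrix(gold, predicted, senses):
    """K x K counts: row = correct sense, column = assigned sense,
    so diagonal cells are correct classifications and off-diagonal
    cells are errors (e.g., product contexts classified as phone)."""
    idx = {s: i for i, s in enumerate(senses)}
    m = [[0] * len(senses) for _ in senses]
    for g, p in zip(gold, predicted):
        m[idx[g]][idx[p]] += 1
    return m

# Invented example labels (not the paper's test set).
senses = ["product", "phone", "formation"]
gold = ["product", "product", "product", "phone", "phone"]
pred = ["product", "phone", "product", "phone", "phone"]
m = confusion_matrix(gold, pred, senses)
accuracy = sum(m[i][i] for i in range(len(senses))) / len(gold)
```

Overall percent correct, as reported for each classifier, is the diagonal sum divided by the number of test contexts.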
For each sense, the test compares the off-diagonal elements of a matrix whose columns contain the responses of one classifier and whose rows contain a second classifier's responses on the same test set. This process constructs a square matrix whose diagonal elements contain the number of test contexts on which the two methods agree.</Paragraph> <Paragraph position="4"> The results of the χ² test for the three-sense resolution task (product, formation, and text) indicate that the response pattern of the content vector classifier is very significantly different from the patterns of both the Bayesian and neural network classifiers, but the Bayesian response pattern is significantly different from the neural network pattern for only the product sense. In the six-sense disambiguation task, the χ² results indicate that the Bayesian and neural network classifiers' response patterns are not significantly different for any sense. The neural network and Bayesian classifiers' response patterns are significantly different from the content vector classifier's only in the formation and text senses. Therefore, with the addition of three senses, the classifiers' response patterns appear to be converging.</Paragraph> <Paragraph position="5"> The pilot two-sense distinction task (between product and formation) yielded over 90% correct answers. In the three-sense distinction task, the three classifiers had a mean of 76% correct, a sharp degradation with the addition of a third sense. We therefore hypothesized degree of polysemy to be a major factor in performance.</Paragraph> <Paragraph position="6"> We were surprised to find that in the six-sense task, all three classifiers degraded only slightly from the three-sense task, with a mean of 73% correct. Although the addition of three new senses caused consistent degradation, the degradation is relatively slight.
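The paper does not give the computational form of its χ² test for correlated proportions. One standard realization of the cross-classifier matrix just described is Bowker's symmetry test (McNemar's test generalized to K senses), which compares paired off-diagonal cells; the sketch below assumes that form and uses invented labels.

```python
from itertools import combinations

def cross_tab(preds_a, preds_b, senses):
    """K x K matrix: rows index classifier A's responses, columns
    classifier B's, on the same test contexts; the diagonal counts
    contexts on which the two classifiers agree."""
    idx = {s: i for i, s in enumerate(senses)}
    m = [[0] * len(senses) for _ in senses]
    for a, b in zip(preds_a, preds_b):
        m[idx[a]][idx[b]] += 1
    return m

def bowker_statistic(m):
    """Chi-squared statistic over paired off-diagonal cells:
    sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji),
    with df = number of pairs having n_ij + n_ji > 0."""
    stat = 0.0
    for i, j in combinations(range(len(m)), 2):
        d = m[i][j] + m[j][i]
        if d:
            stat += (m[i][j] - m[j][i]) ** 2 / d
    return stat

# Invented responses for two classifiers on four contexts.
senses = ["product", "formation", "text"]
m = cross_tab(["product", "product", "text", "text"],
              ["product", "text", "product", "text"], senses)
```

Symmetric disagreements yield a statistic near zero (similar response patterns); a large statistic indicates the two classifiers err in systematically different directions.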
Hence, we conclude that some senses are harder to resolve than others; overall accuracy appears to be a function of the difficulty of the senses involved rather than strictly a function of the number of senses. The hardest sense to learn, for all three classifiers, was text, followed by formation. To test the validity of this conclusion, further tests need to be run.</Paragraph> <Paragraph position="7"> (Per-classifier results on the three-sense task: the Bayesian classifier averaged 76% correct answers, the content vector classifier 73%, and the neural network 79%.) If statistical classifiers are to be part of higher-level NLP tasks, characteristics other than overall accuracy are important. Collecting training contexts is by far the most time-consuming part of the entire process. Until training-context acquisition is fully automated, classifiers requiring smaller training sets are preferred. Figure 1 shows that the content vector classifier has a flatter learning curve between 50 and 200 training contexts than the neural network and Bayesian classifiers, suggesting that the latter two require more (or larger) training contexts. Ease and efficiency of use is also a factor. The three classifiers are roughly comparable in this regard, although the neural network classifier is the most expensive to train.</Paragraph> </Section> </Paper>