<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2018"> <Title>Exploring the Sense Distributions of Homographs</Title> <Section position="4" start_page="156" end_page="157" type="relat"> <SectionTitle> 3 Results and discussion </SectionTitle> <Paragraph position="0"> Table 2 gives quantitative results obtained with the procedure described in the previous section. It shows the overall results for the homograph-based concordance and for the w1-based concordance at different concordance widths. For each width it lists not only the number of cases where the results correspond to expectations (s1 > s2 and s3 > s4), but also the number of cases where the outcome is undecided (s1 = s2 and s3 = s4). Although this adds some redundancy, the number of cases with an unexpected outcome is also listed for convenience. The three numbers sum to 288, the total number of homographs considered.</Paragraph> <Paragraph position="1"> If we look at the left half of Table 2, which shows the results for the concordances based on the homographs, we can see that the number of correct cases increases steadily with increasing concordance width until a width of ±300 is reached. At the same time, the number of undecided cases drops rapidly. At a concordance width of ±300, the correct cases (201) outnumber the incorrect cases (63) by a factor of 3.2. Note that the increase in incorrect cases is probably mostly an artefact of the sparse-data problem, as the number of undecided cases decreases faster than the number of correct cases increases.</Paragraph> <Paragraph position="2"> The right half of Table 2 gives the results for the concordances based on w1. Here the number of correct cases starts at a far higher level for small concordance widths, increases up to a concordance width of ±10, where it reaches its maximum, and then decreases slowly.
At a concordance width of ±10, the ratio between correct and incorrect cases is 2.6.</Paragraph> <Paragraph position="3"> How can we interpret these results? Judging from the number of undecided cases, what we can say for sure is that the problem of data sparseness is much more severe for the concordances of the homographs than for the concordances of w1. This outcome is to be expected, as in the first case we take only a (usually small) fraction of the full corpus into account, whereas the second case is equivalent to considering the full corpus. What we can also say is that the optimal concordance width depends on data sparseness: the sparser the data, the wider the concordance needed to obtain the best results.</Paragraph> <Paragraph position="4"> In the case of the full corpus, the optimal width is around ±10, which is similar to the average sentence length. Larger windows seem to reduce saliency and therefore affect the results adversely. In comparison, for the concordances of the homographs, the negative effect of increasing concordance width on saliency seems to be more than outweighed by the decrease in sparseness, as the results at a very large width of ±300 are better than the best results for the full corpus. However, if we used a much larger corpus than the BNC, we would expect the best results to be achieved at a smaller width, and these would likely be better than the ones achieved using the BNC.</Paragraph> </Section> </Paper>