<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1046">
  <Title>USING A SEMANTIC CONCORDANCE FOR SENSE IDENTIFICATION</Title>
  <Section position="7" start_page="242" end_page="242" type="concl">
    <SectionTitle>
5. SUMMARY AND CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> The considerable improvement that results from having knowledge of sense frequencies is apparent from the results summarized in  The similarity of the results obtained with the most-frequent and the co-occurrence heuristics is attributable to the fact that when co-occurrence data were indeterminate or lacking, the most-frequent heuristic was the default. With a large semantic concordance, we would expect the co-occurrence heuristic to do better--it should be able to capture the topical context which, in other work [8], we have found to give scores as high as 70-75% for polysemous words.</Paragraph>
    <Paragraph position="1"> How representative are the percentages in Table 2? Obviously, they are specific to the Brown Corpus; in a restricted domain of discourse, polysemous words would not be used in such a wide variety of ways and a most-frequent heuristic would be correct far more frequently. The percentages in Table 2 are &quot;broadly representative of current edited American English&quot; [3]. They are also, of course, specific to WordNet. If WordNet did not draw so many sense distinctions, all of these statistical heuristics would be correct more often. But WordNet does not draw impossibly fine sense distinctions. Dictionaries differ widely in the number of sense distinctions they draw; pocket dictionaries offer few and unabridged dictionaries offer many alternative senses. WordNet is somewhere in the middle; it provides about the same semantic granularity as a good desk dictionary. Anything coarser could not have been used to tag passages from the Brown Corpus.</Paragraph>
    <Paragraph position="2"> Finally, can these heuristics provide anything more than benchmarks? Can they play a role in a system that does an acceptable job of sense identification? It should be noted that none of these heuristics takes into account the local context. Even the co-occurrence heuristic is indifferent to word order; imposing word-order constraints would have made sparse data sparser still. Local context--say, ±2 or 3 words--should contain sufficient information to identify the intended sense of most polysemous words. Given a system capable of exploiting local context, statistical heuristics might still provide a default, as Yarowsky [9] suggests; something to fall back on when local identification fails. Under those conditions, these statistical heuristics could indeed provide a floor on which more intelligent systems could build.</Paragraph>
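    <Paragraph position="3"> The back-off arrangement described in this section might be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used in the experiments: the toy counts, the sense labels, the tie rule for "indeterminate" co-occurrence data, and the names SENSE_COUNTS, COOC_COUNTS, cooccurrence_sense, most_frequent_sense, identify_sense and local_resolver are all hypothetical. The sketch tries an optional local-context component first, then a word-order-indifferent co-occurrence score, and finally defaults to the most frequent sense.

```python
from collections import Counter

# Hypothetical toy data; sense labels and counts are invented for illustration
# and are not taken from the semantic concordance.
SENSE_COUNTS = {
    "bank": Counter({"bank/institution": 8, "bank/river": 3}),
}

# Hypothetical counts of words that co-occurred with each tagged sense.
COOC_COUNTS = {
    "bank": {
        "bank/institution": Counter({"money": 5, "loan": 2}),
        "bank/river": Counter({"water": 3, "fishing": 2}),
    },
}


def most_frequent_sense(word):
    """Return the most frequently tagged sense of word, or None if unseen."""
    counts = SENSE_COUNTS.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def cooccurrence_sense(word, context):
    """Score each sense by summed co-occurrence counts over the context.

    The context is treated as a bag of words, so word order is ignored.
    Returns None when the data are lacking (no counts, or all scores zero)
    or indeterminate (assumed here to mean a tie for the top score).
    """
    per_sense = COOC_COUNTS.get(word, {})
    scores = {s: sum(c[w] for w in context) for s, c in per_sense.items()}
    if not scores:
        return None
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0:
        return None
    if len(ranked) > 1 and ranked[1] == ranked[0]:
        return None
    return max(scores, key=scores.get)


def identify_sense(word, context, local_resolver=None):
    """Try local context, then co-occurrence, then the most frequent sense."""
    if local_resolver is not None:
        sense = local_resolver(word, context)
        if sense is not None:
            return sense
    sense = cooccurrence_sense(word, context)
    if sense is not None:
        return sense
    return most_frequent_sense(word)


if __name__ == "__main__":
    print(identify_sense("bank", ["fishing", "water"]))
    print(identify_sense("bank", ["the", "old"]))
```

    In this toy setting, a context with informative co-occurring words selects a sense directly, while an uninformative or tied context falls through to the most-frequent default, which is why the two heuristics give similar results when co-occurrence data are sparse.</Paragraph>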
  </Section>
</Paper>