<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1046">
  <Title>USING A SEMANTIC CONCORDANCE FOR SENSE IDENTIFICATION</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> It is generally recognized that systems for automatic sense identification should be evaluated against a null hypothesis. Gale, Church, and Yarowsky [1] suggest that the appropriate basis for comparison would be a system that assumes that each word is being used in its most frequently occurring sense. They review the literature on how well word-disambiguation programs perform; as a lower bound, they estimate that the most frequent sense of polysemous words would be correct 75% of the time, and they propose that any sense-identification system that does not give the correct sense of polysemous words more than 75% of the time would not be worth serious consideration.</Paragraph>
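    <Paragraph> As an illustration only, the most-frequent-sense baseline described above can be sketched in a few lines. The sense-tagged tokens, the sense labels, and the function names below are invented for this sketch and do not come from Gale, Church, and Yarowsky [1]; for brevity the toy data are also used both to choose the most frequent sense and to score it, which a real evaluation would avoid.

```python
from collections import Counter, defaultdict

# Toy sense-tagged tokens as (word, sense) pairs. All data here is invented.
tagged = [
    ("bank", "bank.river"), ("bank", "bank.money"), ("bank", "bank.money"),
    ("line", "line.queue"), ("line", "line.text"), ("line", "line.text"),
    ("line", "line.text"), ("oxygen", "oxygen.gas"),
]

def most_frequent_sense(pairs):
    """Map each word to the sense it carries most often in `pairs`."""
    counts = defaultdict(Counter)
    for word, sense in pairs:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def baseline_accuracy(train, test):
    """Fraction of polysemous test tokens whose sense matches the guess."""
    guess = most_frequent_sense(train)
    senses_per_word = defaultdict(set)
    for word, sense in train:
        senses_per_word[word].add(sense)
    # Only words seen with more than one sense count as polysemous here.
    polysemous = [(w, s) for w, s in test if len(senses_per_word[w]) > 1]
    correct = sum(1 for w, s in polysemous if guess[w] == s)
    return correct / len(polysemous)

# Scored on the same toy data it was estimated from (an assumption of this sketch).
accuracy = baseline_accuracy(tagged, tagged)
print(f"most-frequent-sense baseline: {accuracy:.3f}")
```

    In this toy sample five of the seven polysemous tokens carry their word's most frequent sense, so the baseline scores 5/7. A proposed system would be compared against the corresponding figure for its own corpus.</Paragraph>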
    <Paragraph position="1"> The value of setting such a lower bound is obvious. However, Gale, Church, and Yarowsky [1] do not make clear how they determined what the most frequently occurring senses are. In the absence of such information, a case can be made that the lower bound should be given by the proportion of monosemous words in the textual corpus.</Paragraph>
    <Paragraph position="2"> Although most words in a dictionary have only a single sense, it is the polysemous words that occur most frequently in speech and writing. This is true even when we ignore the small set of highly polysemous closed-class words (pronouns, prepositions, auxiliary verbs, etc.) that play such an important structural role. For example, 82.3% of the open-class words in WordNet [2] are monosemous, but only 27.2% of the open-class words in a sample of 103 passages from the Brown Corpus [3] were monosemous.</Paragraph>
    <Paragraph position="3"> * Hunter College and Graduate School of the City University of New York. That is to say, 27% of the time no decision would be needed, but for the remaining 73% of the open-class words, the response would have to be &quot;don't know.&quot; This is probably the lowest lower bound anyone would propose, although if the highly polysemous, very frequently used closed-class words were included, it would be even lower.</Paragraph>
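    <Paragraph> The contrast between counting monosemous words in a dictionary and counting them in running text can be made concrete with a small sketch. The lexicon, sense counts, and token sequence below are hypothetical and chosen only to show the computation; they are not drawn from WordNet [2] or the Brown Corpus [3].

```python
# Hypothetical lexicon: word -> number of senses (invented counts).
sense_counts = {"oxygen": 1, "thermometer": 1, "violin": 1, "run": 5, "line": 4}

# Hypothetical running text of open-class tokens (invented).
tokens = ["run", "line", "run", "oxygen", "line", "run", "violin", "run"]

def monosemous_share(items, lexicon):
    """Fraction of `items` that have exactly one sense in `lexicon`."""
    return sum(1 for w in items if lexicon[w] == 1) / len(items)

# Share over dictionary entries (types): each word counted once.
type_share = monosemous_share(list(sense_counts), sense_counts)

# Share over running text (tokens): frequent words counted every time.
token_share = monosemous_share(tokens, sense_counts)

# Tokens for which a system that never guesses must answer "don't know".
dont_know_share = 1 - token_share

print(f"types: {type_share:.2f}  tokens: {token_share:.2f}  "
      f"don't know: {dont_know_share:.2f}")
```

    Because the polysemous words in the toy text are also the frequent ones, the monosemous share drops from 0.60 of the dictionary entries to 0.25 of the tokens, which mirrors the direction of the 82.3% versus 27.2% contrast reported above.</Paragraph>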
    <Paragraph position="4"> A better performance figure would result, of course, if, instead of responding &quot;don't know,&quot; the system were to guess. What is the percentage correct that you could expect to obtain by guessing?</Paragraph>
  </Section>
</Paper>