<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1051">
  <Title>CORPUS-BASED STATISTICAL SENSE RESOLUTION</Title>
  <Section position="8" start_page="264" end_page="264" type="concl">
    <SectionTitle>
6. CONCLUSION
</SectionTitle>
    <Paragraph position="0"> The convergence of the response patterns for the three methods suggests that each of the classifiers is extracting as much data as is available in word counts from training contexts. If this is the case, any technique that uses only word counts will not be significantly more accurate than the techniques tested here.</Paragraph>
    <Paragraph position="1"> Although the degree of polysemy does affect the difficulty of the sense resolution task, a greater factor in performance is the difficulty of resolving individual senses. In hindsight, it is obvious that the text sense is hard for these statistical methods to learn because one can talk or write about anything. In effect, all words between a pair of quotation marks are noise (unless line is within the quotes). In the three-sense task, the Bayesian classifier did best on the text sense, perhaps because it had open and closed quotes as important tokens. This advantage was lost in the six-sense task because quotation marks also appear in the contexts of the phone sense. It is not immediately obvious why the formation sense should be hard. From inspection of the contexts, it appears that the crucial information is close to the word, and context that is more than a few words away is noise.</Paragraph>
    <Paragraph position="2"> These corpus-based statistical techniques use an impoverished representation of the training contexts: simple counts of tokens appearing within two sentences. We believe significant increases in resolution accuracy will not be possible unless other information, such as word order or syntactic information, is incorporated into the techniques.</Paragraph>
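    The count-only representation described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes a multinomial naive Bayes model with add-one smoothing as a stand-in for the Bayesian classifier, whitespace tokenization, and a whole string as the context window. The names bag_of_words and NaiveBayesSenseClassifier, and the toy training contexts, are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(context):
    """Reduce a context to unordered token counts (word order is discarded)."""
    return Counter(context.lower().split())


class NaiveBayesSenseClassifier:
    """Hypothetical multinomial naive Bayes over token counts (add-one smoothing)."""

    def __init__(self):
        self.sense_counts = Counter()             # number of training contexts per sense
        self.token_counts = defaultdict(Counter)  # token counts per sense
        self.vocabulary = set()                   # all token types seen in training

    def train(self, labelled_contexts):
        for context, sense in labelled_contexts:
            bag = bag_of_words(context)
            self.sense_counts[sense] += 1
            self.token_counts[sense].update(bag)
            self.vocabulary.update(bag)

    def log_score(self, bag, sense):
        # log prior of the sense plus smoothed log likelihood of each token count
        total_contexts = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total_contexts)
        denom = sum(self.token_counts[sense].values()) + len(self.vocabulary)
        for token, count in bag.items():
            score += count * math.log((self.token_counts[sense][token] + 1) / denom)
        return score

    def classify(self, context):
        bag = bag_of_words(context)
        return max(self.sense_counts, key=lambda s: self.log_score(bag, s))


# Invented toy contexts, for illustration only (not drawn from the paper's corpus).
TRAINING = [
    ("he read the first line of the poem aloud", "text"),
    ("the actor forgot a line in the second act", "text"),
    ("she was on the line with the operator", "phone"),
    ("the caller stayed on the line for an hour", "phone"),
]

classifier = NaiveBayesSenseClassifier()
classifier.train(TRAINING)
predicted = classifier.classify("the operator put the caller on the line")
```

    Because bag_of_words keeps only counts, any reordering of the tokens in a context yields the same bag and therefore the same prediction. Under this sketch, that is the sense in which word order and syntactic information are unavailable to the classifier.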
    <Paragraph position="3"> ment, and Slava Katz of IBM's T.J. Watson Research Center for generously supplying line contexts from the APHB corpus. We are indebted to George A. Miller for suggesting this line of research.</Paragraph>
  </Section>
</Paper>