<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0104">
  <Title>A Bayesian hybrid method for context-sensitive spelling correction</Title>
  <Section position="2" start_page="0" end_page="39" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Two classes of methods have been shown useful for resolving lexical ambiguity. The first tests for the presence of particular context words within a certain distance of the ambiguous target word. The second tests for collocations -- patterns of words and part-of-speech tags around the target word. The context-word and collocation methods have complementary coverage: the former captures the lexical &quot;atmosphere&quot; (discourse topic, tense, etc.), while the latter captures local syntax. Yarowsky [1994] has exploited this complementarity by combining the two methods using decision lists. The idea is to pool the evidence provided by the component methods, and to then solve a target problem by applying the single strongest piece of evidence, whatever type it happens to be. Yarowsky applied his method to the task of restoring missing accents in Spanish and French, and found that it outperformed both the method based on context words, and one based on local syntax. This paper takes Yarowsky's method as a starting point, and hypothesizes that further improvements can be obtained by taking into account not only the single strongest piece of evidence, but all the available evidence. A method is presented for doing this, based on Bayesian classifiers.</Paragraph>
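The contrast between the two ways of using pooled evidence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the likelihood values (hand-picked, not drawn from the paper or any corpus), the use of an absolute log-likelihood ratio as the strength measure, and the naive independence assumption in the Bayesian combination. It is not the paper's implementation.

```python
import math

# Hypothetical, hand-picked likelihoods P(feature | word), for illustration only.
LIKELIHOODS = {
    "context:sand": {"dessert": 0.01, "desert": 0.20},
    "context:chocolate": {"dessert": 0.10, "desert": 0.01},
    "context:cake": {"dessert": 0.10, "desert": 0.01},
}
# Hypothetical equal priors for the two candidate words.
PRIORS = {"dessert": 0.5, "desert": 0.5}


def strength(feature):
    """Absolute log-likelihood ratio: how sharply one feature discriminates."""
    p = LIKELIHOODS[feature]
    return abs(math.log(p["dessert"] / p["desert"]))


def decision_list(features):
    """Decide using only the single strongest matching feature."""
    known = [f for f in features if f in LIKELIHOODS]
    if not known:
        return max(PRIORS, key=PRIORS.get)
    best = max(known, key=strength)
    return max(LIKELIHOODS[best], key=LIKELIHOODS[best].get)


def bayes(features):
    """Decide by summing log-evidence from every matching feature
    (a naive Bayes combination that treats features as independent)."""
    scores = {}
    for word, prior in PRIORS.items():
        score = math.log(prior)
        for f in features:
            if f in LIKELIHOODS:
                score += math.log(LIKELIHOODS[f][word])
        scores[word] = score
    return max(scores, key=scores.get)


features = ["context:sand", "context:chocolate", "context:cake"]
print(decision_list(features), bayes(features))
```

With these made-up numbers, the decision list follows the one strongest feature, while the Bayesian combination lets two moderately strong features outweigh it, so the two functions can disagree on the same input.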
    <Paragraph position="1"> The work reported here was applied not to accent restoration, but to a related lexical disambiguation task: context-sensitive spelling correction. The task is to fix spelling errors that happen to result in valid words in the lexicon; for example: I'd like the chocolate cake for desert.</Paragraph>
    <Paragraph position="2"> where dessert was misspelled as desert. This goes beyond the capabilities of conventional spell checkers, which can only detect errors that result in non-words.</Paragraph>
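Why a lexicon lookup misses this kind of error can be shown with a small sketch. The toy lexicon, the set of confusable words, and the function names are hypothetical and chosen only for illustration; a real checker would use a full dictionary and a more careful tokenizer.

```python
# Hypothetical toy lexicon and set of confusable words, for illustration only.
LEXICON = {"i'd", "like", "the", "chocolate", "cake", "for", "desert", "dessert"}
CONFUSION_SETS = [{"desert", "dessert"}]


def tokenize(sentence):
    """Split on whitespace, strip edge punctuation, and lowercase."""
    return [w.strip(".,!?").lower() for w in sentence.split()]


def nonword_errors(sentence):
    """What a conventional checker can flag: tokens absent from the lexicon."""
    return [w for w in tokenize(sentence) if w not in LEXICON]


def ambiguous_targets(sentence):
    """Valid words that have a confusable alternative and need context to verify."""
    return [w for w in tokenize(sentence) if any(w in s for s in CONFUSION_SETS)]


sentence = "I'd like the chocolate cake for desert."
print(nonword_errors(sentence))     # no non-words, so nothing is flagged
print(ambiguous_targets(sentence))  # the word that context must disambiguate
```

The lookup reports nothing for the example sentence because every token is a valid word; only the second function singles out the target whose correctness depends on context.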
    <Paragraph position="3"> We start by applying a very simple method to the task, to serve as a baseline for comparison with the other methods. We then apply each of the two component methods mentioned above -- context words and collocations. We try two ways of combining these components: decision lists, and Bayesian classifiers. We evaluate the above methods by comparing them with an alternative approach to spelling correction based on part-of-speech trigrams.</Paragraph>
    <Paragraph position="4"> The sections below discuss the task of context-sensitive spelling correction, the five methods we tried for the task (baseline, two component methods, and two hybrid methods), and the evaluation. The final section draws some conclusions.</Paragraph>
  </Section>
</Paper>