File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/95/w95-0104_evalu.xml
Size: 2,487 bytes
Last Modified: 2025-10-06 14:00:22
<?xml version="1.0" standalone="yes"?> <Paper uid="W95-0104"> <Title>A Bayesian hybrid method for context-sensitive spelling correction</Title> <Section position="5" start_page="51" end_page="52" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> While the previous section demonstrated that the Bayesian hybrid method does better than its components, we would still like to know how it compares with alternative methods. We looked at a method based on part-of-speech trigrams, developed and implemented by Schabes [1995].</Paragraph> <Paragraph position="1"> Schabes's method can be viewed as performing an abductive inference: given a sentence containing an ambiguous word, it asks which choice w_i for that word would best explain the observed sequence of words in the sentence. It answers this question by substituting each w_i in turn into the sentence. The w_i that produces the highest-probability sentence is selected. Sentence probabilities are calculated using a part-of-speech trigram model.</Paragraph> <Paragraph position="2"> We tried Schabes's method on the usual confusion sets; the results are in the last column of Table 7. It can be seen that trigrams and the Bayesian hybrid method each have their better moments. Trigrams are at their worst when the words in the confusion set have the same part of speech. In this case, trigrams can distinguish between the words only by their prior probabilities; this follows from the way the method calculates sentence probabilities. Thus, for {between, among}, for example, where both words are prepositions, trigrams score the same as the baseline method.</Paragraph> <Paragraph position="3"> In such cases, the Bayesian hybrid method is clearly better. On the other hand, when the words in the confusion set have different parts of speech, as in, for example, {there, their, they're}, trigrams are often better than the Bayesian method. 
We believe this is because trigrams look not just at a few words on either side of the target word, but at the part-of-speech sequence of the whole sentence. This analysis indicates a complementarity between trigrams and Bayes, and suggests a combination in which trigrams would be applied first; if trigrams determine that the words in the confusion set have the same part of speech for the sentence at issue, the sentence would then be passed to the Bayesian method. This is a research direction we plan to pursue.</Paragraph> </Section></Paper>
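
The substitution procedure and the proposed trigram-then-Bayes combination described in this section can be illustrated with a minimal sketch. It is not the authors' or Schabes's implementation: the function names and the three callables (`sentence_prob`, `tag_in_context`, `bayes_choice`) are hypothetical stand-ins for a part-of-speech trigram scorer, the tag the trigram model assigns to the target position, and the Bayesian hybrid classifier, none of which are specified here.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
#   SentenceProb: probability of a word sequence under some POS trigram model
#   TagInContext: POS tag assigned to the word at a given index
#   BayesChoice:  the Bayesian hybrid method's pick for the target word
SentenceProb = Callable[[List[str]], float]
TagInContext = Callable[[List[str], int], str]
BayesChoice = Callable[[Sequence[str], int, Sequence[str]], str]


def substitute(sentence: Sequence[str], index: int, candidate: str) -> List[str]:
    """Return a copy of the sentence with the target word replaced."""
    trial = list(sentence)
    trial[index] = candidate
    return trial


def choose_by_substitution(
    sentence: Sequence[str],
    index: int,
    confusion_set: Sequence[str],
    sentence_prob: SentenceProb,
) -> str:
    """Substitute each candidate in turn; keep the one whose sentence scores highest."""
    return max(
        confusion_set,
        key=lambda w: sentence_prob(substitute(sentence, index, w)),
    )


def combined_choice(
    sentence: Sequence[str],
    index: int,
    confusion_set: Sequence[str],
    sentence_prob: SentenceProb,
    tag_in_context: TagInContext,
    bayes_choice: BayesChoice,
) -> str:
    """Use trigrams first; defer to the Bayesian method when all candidates share a tag."""
    tags = {
        tag_in_context(substitute(sentence, index, w), index)
        for w in confusion_set
    }
    if len(tags) == 1:
        # Same part of speech for every candidate: trigrams can only fall back
        # on prior probabilities, so pass the sentence to the Bayesian method.
        return bayes_choice(sentence, index, confusion_set)
    return choose_by_substitution(sentence, index, confusion_set, sentence_prob)
```

The sketch detects the same-part-of-speech case by tagging each substituted sentence and comparing the tags at the target position; this is one possible reading of "trigrams determine that the words in the confusion set have the same part of speech", chosen for simplicity.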