File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/90/c90-2036_intro.xml
Size: 5,574 bytes
Last Modified: 2025-10-06 14:04:48
<?xml version="1.0" standalone="yes"?> <Paper uid="C90-2036"> <Title>A Spelling Correction Program Based on a Noisy Channel Model</Title> <Section position="3" start_page="205" end_page="206" type="intro"> <SectionTitle> 4. Evaluation </SectionTitle> <Paragraph position="0"> Many typos such as absorbant have just one candidate correction, but others such as adusted have multiple corrections. The table below shows examples of typos with fewer than ten candidate corrections, with the corrections ordered by likelihood.</Paragraph> <Paragraph position="1"> after fate aft ate ante; daily diary dials dial dimly dilly; police price voice poise pice ponce poire; pilots pivots riots plots pits pots pints pious; splash smash slash spasm stash swash sash pash spas. Most typos have relatively few candidate corrections. The table below shows the number of typos (note 5) broken out by the number of corrections in seven month-long samples of the AP newswire. In March, for example, there were 720 typos with 0 corrections, 1120 typos with 1 correction, 269 with 2 corrections, etc. The final column shows that there is a general trend for fewer choices, though the 0-choice case is special. (The system was trained on the AP wire from 2/88 to 2/89; the results below were computed from AP wire during 3/89 - 9/89.) We decided to look at the 2-candidate case in more detail in order to test how often the top-scoring candidate agreed with a panel of three judges. The judges were given 564 triples and a few concordance lines: absurb absorb absurd financial community . *E* *S* &quot; It is absurb and probably obscene for any person so engaged to und The first word of the triple was a spell reject; the other two were the candidates (in alphabetical order). The judges were given a 5-way forced choice. They could circle any one of the three words if they thought that was what the author had intended. 
Alternatively, if they thought that the author had intended something else, they could write down &quot;other&quot;. Finally, if they weren't sure, they could write &quot;?&quot;. The distribution of responses is shown in the following table.</Paragraph> <Paragraph position="2"> The results show that spell is rejecting too many words, since choice 0 (spell error) is selected about 20% of the time. In these cases, correct was given a non-problem to correct: acquirees acquirers acquires be acquirers, as they have been, than acquirees . *E* *S* If the industrials had attracted bids tit Since we were mostly concerned with evaluating the scoring function, we didn't want to be distracted by errors in spell and other problems that are beyond the scope of this paper. Therefore, we decided to consider only those cases where at least two judges circled one of the two candidates, and they agreed with each other. This left 329 triples.</Paragraph> <Paragraph position="3"> The following table shows that correct agrees with the majority of the judges in 87% of the 329 cases of interest. In order to help calibrate this result, three inferior methods are also evaluated. The no-prior method ignores the prior probability. The no-channel method ignores the channel probability. Finally, the neither method ignores both probabilities and selects the first candidate in all cases. As the following table shows, correct is significantly better than the three inferior alternatives. Both the channel and the prior probabilities provide a significant contribution, and the combination is significantly better than either in isolation. The second half of the table evaluates the judges against one another and shows that they significantly outperform correct, indicating that there is plenty of room for further improvement (note 6). All three judges found the task more difficult and time-consuming than they had expected.</Paragraph> <Paragraph position="4"> 5. 
For the purposes of this experiment, a typo is a lowercase word rejected by the Unix spell program.</Paragraph> <Paragraph position="5"> 6. Judges were only scored on triples for which they selected &quot;1&quot; or &quot;2,&quot; and for which the other two judges agreed on &quot;1&quot; or &quot;2.&quot; A triple was scored &quot;correct&quot; for one judge if that judge agreed with the other two and &quot;incorrect&quot; if that judge disagreed with the other two.</Paragraph> <Paragraph position="7"> We were also interested in testing whether the score predicted accuracy. The figure at the end of this paper shows that this is indeed so. The horizontal axis shows the score from one of the three predictors (as the lines are labeled) averaged over a group of 20 typos. The vertical axis shows the fraction of this group that were right. The diagonal line indicates perfection.</Paragraph> <Paragraph position="8"> For example, consider a group of typos whose average score was .8. Perfect accuracy would be achieved if exactly 80 percent of this group agreed with the majority opinion of the judges.</Paragraph> <Paragraph position="9"> The curved lines above and below the perfection line show one standard deviation limits for estimating probabilities from samples of 20. The observations on correct are outside of the one standard deviation limits about as much as would be called for by chance, while each of the other two methods has more points outside than would result just by chance. We conclude that the scores from correct predict accuracy fairly well; scores from the other two methods are more problematic.</Paragraph> </Section></Paper>