<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1039"> <Title>Fukui-shi, Japan</Title> <Section position="3" start_page="198" end_page="198" type="metho"> <SectionTitle> 4 Experimental Results Using Erroneous Japanese Phrase Input Through OCR </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="198" end_page="198" type="sub_section"> <SectionTitle> 4.1 Experimental Results </SectionTitle> <Paragraph position="0"> The critical value T of the 2nd-order Markov probability was chosen so as to maximize P × R for erroneous phrases. The experimental results are as follows: [1] Error detection and error correction for correct phrases: all of the correct phrases were judged to be correct.</Paragraph> <Paragraph position="1"> [2] Relation of P and R for erroneous phrases: the maximum values of P and R for locating erroneous 'kanji-kana' character strings with the error detection procedure (D), and for correcting errors with the error correction procedure (C), are as follows: (1) P(D) = 79.0%, R(D) = 74.5%; (2) P(C) = 66.2%, R(C) = 84.6%. The values of R(D) and P(D) mean that this method finds 74.5% of the erroneous phrases (substitution type), and that 21.0% (100% - 79.0%) of the errors it detects are detected wrongly.</Paragraph> <Paragraph position="2"> These results show that the Selective Error Correction Method using 2nd-order Markov models is useful for detecting and correcting wrongly substituted characters in text input through OCR.</Paragraph> </Section> <Section position="2" start_page="198" end_page="198" type="sub_section"> <SectionTitle> 4.2 Discussion </SectionTitle> <Paragraph position="0"> [1] Characteristics of erroneous strings input through OCR.</Paragraph> <Paragraph position="1"> Compared with randomly generated errors (Araki et al., 1994), the errors caused by OCR occurred frequently in the following four types: (1) mixed type (a combination of the three error types), (2) errors located at the head and at the end of phrases, (3) errors in which the erroneous string within a phrase has a length greater than 3, and (4) errors distributed within a phrase.</Paragraph> <Paragraph position="2"> [2] Comparison of the values of P and R for error detection and error correction.</Paragraph> <Paragraph position="3"> The maximum values of P and R for detecting and correcting errors caused by OCR are 20-40% lower than those for randomly generated errors.</Paragraph> <Paragraph position="4"> This reduction in the maximum values of P and R can mainly be explained by characteristics (2) and (4) mentioned above.</Paragraph> <Paragraph position="5"> However, for (1) substitution errors, (2) errors located inside phrases, (3) errors of length 1, and (4) errors connected within phrases, the maximum values of P and R for detecting and correcting OCR errors are nearly equal to those for randomly generated errors.</Paragraph> </Section> </Section> </Paper>