<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1030">
  <Title>AN EVALUATION TO DETECT AND CORRECT ERRONEOUS CHARACTERS WRONGLY SUBSTITUTED, DELETED AND INSERTED IN JAPANESE AND ENGLISH SENTENCES USING MARKOV MODELS</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> In optical character recognition and continuous speech recognition of a natural language, it has been difficult to detect erroneous characters that are wrongly deleted or inserted. In order to judge three types of errors, namely characters wrongly substituted, deleted or inserted in a Japanese "bunsetsu" or an English word, and to correct these errors, this paper proposes new methods using an m-th order Markov chain model for Japanese "kanji-kana" characters and English alphabets, assuming that the Markov probability of a correct chain of syllables or "kanji-kana" characters is greater than that of erroneous chains.</Paragraph>
    <Paragraph position="1"> From the results of the experiments, it is concluded that the methods are useful for detecting as well as correcting these errors in Japanese "bunsetsu" and English words.</Paragraph>
    <Paragraph position="2"> Key words: Markov model, error detection, error correction, bunsetsu, substitution, deletion, insertion</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="187" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In order to improve the man-machine interface with computers, the development of input devices such as optical character readers (OCR) or speech recognition devices is expected.</Paragraph>
    <Paragraph position="1"> However, it is not easy to input Japanese sentences with these devices, because they are written with many kinds of characters, especially thousands of "kanji" characters.</Paragraph>
    <Paragraph position="2"> The sentences input through an OCR or a speech recognition device usually contain erroneous character strings.</Paragraph>
    <Paragraph position="3"> The techniques of natural language processing are expected to find and correct these errors. However, since current technologies of natural language analysis have been developed for correct sentences, they cannot be applied directly to these problems. Up to now, statistical approaches have been taken to this problem. Markov models are considered to be one of the machine learning models, similar to neural networks and fuzzy models. They have been applied to character chains of natural languages (e.g., English) [1], [2], and to phoneme chains in continuous speech recognition [3], [4], [5]. A 2nd-order Markov model is known to be useful for correcting errors in "kanji-kana" "bunsetsu" [6], for choosing a correct syllable chain from Japanese syllable "bunsetsu" candidates [7], and for reducing the ambiguities in translation processing of non-segmented "kana" sentences into "kanji-kana" sentences [8].</Paragraph>
    <Paragraph position="4"> The erroneous characters can be classified into three types. The first is wrongly recognized characters that appear instead of the correct characters. The second and the third are wrongly inserted and wrongly deleted (skipped) characters, respectively.</Paragraph>
    <Paragraph position="5"> The Markov chain models mentioned above were restricted to finding and correcting the first type of errors [5], [6]. No method has been proposed for correcting errors of the second and third types. The reason may be the difficulty of finding the error location and of distinguishing between deletion and insertion errors.</Paragraph>
    <Paragraph position="6"> On the other hand, a contextual algorithm utilizing n-gram letter statistics (e.g., [9]) and a dictionary look-up algorithm [10] have been discussed for detecting and correcting erroneous characters in English sentences, which are segmented into words.</Paragraph>
    <Paragraph position="7"> This paper proposes new methods, applicable to non-segmented chains of characters, to judge three types of errors, namely characters wrongly substituted, deleted and inserted in a Japanese "bunsetsu", and to correct these errors in Japanese "kanji-kana" chains using an m-th order Markov chain model. The methods are based on the relation between the type of error and the length of the chain over which the values of the Markov joint probability remain small. Furthermore, this method is applied to detect and correct errors in segmented English words. Experiments were conducted for the cases of 2nd-order and 3rd-order Markov models, which were applied to Japanese and English newspaper articles. The "Relevance Factor" P and the "Recall Factor" R for erroneous characters detected and corrected by this method were experimentally evaluated using statistical data for 70 issues of a daily Japanese newspaper and 5 issues of a daily English newspaper.</Paragraph>
  </Section>
  <Section position="4" start_page="187" end_page="189" type="metho">
    <SectionTitle>
2 Basic Definitions and the
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="187" end_page="187" type="sub_section">
      <SectionTitle>
Method of Error Detection
and Error Correction using
2nd-Order Markov Model
2,1 Basic Definitions
</SectionTitle>
      <Paragraph position="0"> In this paper, two types of natural-language sentences are discussed. One is a Japanese sentence, which is non-segmented, and the other is an English sentence, which is segmented into words.</Paragraph>
      <Paragraph position="1"> A Japanese sentence can be separated into syntactic units called "bunsetsu", where a "bunsetsu" is composed of one "independent word" and a sequence of n (greater than or equal to 0) "dependent words".</Paragraph>
      <Paragraph position="2"> A "bunsetsu" is a chain of Japanese "kanji-kana" characters, and an English word is a chain of alphabets; both are represented by $\gamma = s_1 s_2 \cdots s_n$, where $s_i$ is a "kanji-kana" character or an alphabet. In particular, a chain $\gamma$ is called a "J-bunsetsu" when all of its elements are "kanji-kana" characters, and an "E-word" when all of its elements are English alphabets. The set of correct Japanese "bunsetsu" or English words is represented by $\Gamma_C$. Three types of erroneous "J-bunsetsu" or "E-word" are defined as follows: First, a chain $\alpha = s_1 s_2 \cdots s_{i-1} s_i \cdots s_n$ is called an "$(i,k)$-Erroneous J-bunsetsu or E-word Wrongly Substituted" ($(i,k)$-EWS) if a subchain $\beta = t_1 t_2 \cdots t_k$ is wrongly substituted at the location $i$ of $\alpha$, that is, $\exists \gamma \in \Gamma_C,\ \gamma = \alpha^{(i)} / \beta$. Here $\alpha^{(i)} / \beta$ denotes substitution of a subchain $\beta$ at the location $i$ in a chain $\alpha$, that is, $\alpha^{(i)} / \beta = s_1 s_2 \cdots s_{i-1} t_1 t_2 \cdots t_k s_{i+k} \cdots s_n$, and $t_1 \neq s_i, \ldots, t_k \neq s_{i+k-1}$.</Paragraph>
      <Paragraph position="3"> Next, a chain $\alpha = s_1 s_2 \cdots s_{i-1} s_i \cdots s_n$ is called an "$(i,k)$-Erroneous J-bunsetsu or E-word Wrongly Deleted" ($(i,k)$-EWD) if a subchain $\beta = t_1 t_2 \cdots t_k$ is wrongly deleted at the location $i$ of $\alpha$, that is, $\exists \gamma \in \Gamma_C,\ \gamma = \alpha^{(i)} \ll \beta$. Here $\alpha^{(i)} \ll \beta$ denotes insertion of a subchain $\beta$ at the location $i$ in a chain $\alpha$, that is, $\alpha^{(i)} \ll \beta = s_1 s_2 \cdots s_{i-1} t_1 t_2 \cdots t_k s_i \cdots s_n$. Finally, a chain $\alpha = s_1 \cdots s_{i-1} s_i \cdots s_{i+k-1} s_{i+k} \cdots s_n$ is called an "$(i,k)$-Erroneous J-bunsetsu or E-word Wrongly Inserted" ($(i,k)$-EWI) if a subchain $\beta = t_1 t_2 \cdots t_k$ is wrongly inserted at the location $i$ of $\alpha$, that is, $\exists \gamma \in \Gamma_C,\ \gamma = \alpha^{(i)} \gg \beta$. Here $\alpha^{(i)} \gg \beta$ denotes deletion of a subchain $\beta$ at the location $i$ in a chain $\alpha$, that is, $\alpha^{(i)} \gg \beta = s_1 s_2 \cdots s_{i-1} s_{i+k} \cdots s_n$, and $t_1 = s_i, \ldots, t_k = s_{i+k-1}$.</Paragraph>
      <Paragraph position="4"> The sets of $(i,k)$-EWS, $(i,k)$-EWD and $(i,k)$-EWI are represented by $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ and $\Gamma_I^{(k)}$, respectively. In this paper, every "bunsetsu" or word input to a computer is assumed to belong to one of $\Gamma_C$, $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ and $\Gamma_I^{(k)}$.</Paragraph>
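      <Paragraph> The three operations defined above can be illustrated with a minimal sketch on plain strings. The function names, the example word and the replacement characters are assumptions introduced only for illustration; locations are 1-based, as in the definitions.

```python
def substitute(chain, i, beta):
    """Replace len(beta) characters of chain, starting at 1-based location i."""
    start = i - 1
    return chain[:start] + beta + chain[start + len(beta):]


def insert(chain, i, beta):
    """Insert beta so that its first character lands at 1-based location i."""
    start = i - 1
    return chain[:start] + beta + chain[start:]


def delete(chain, i, k):
    """Delete k characters of chain, starting at 1-based location i."""
    start = i - 1
    return chain[:start] + chain[start + k:]


# Hypothetical correct chain and three erroneous chains derived from it.
correct = "markov"
ews = substitute(correct, 3, "xx")  # "maxxov":   two characters wrongly substituted
ewd = delete(correct, 3, 2)         # "maov":     two characters wrongly deleted
ewi = insert(correct, 3, "xx")      # "maxxrkov": two characters wrongly inserted

# Each erroneous chain is mapped back to the correct chain by the inverse operation.
restored = [substitute(ews, 3, "rk"), insert(ewd, 3, "rk"), delete(ewi, 3, 2)]
```

      Note that a wrongly deleted chain is repaired by insertion and a wrongly inserted chain by deletion, which matches the direction of the two operators in the definitions.</Paragraph>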
      <Paragraph position="5"> Next, the meanings of detecting and correcting errors are defined as follows.</Paragraph>
      <Paragraph position="6"> The "error detection problem" is the problem of how to detect the location $i$ of the error in $\alpha$, and the "error correction problem" is the problem of how to replace an erroneous "J-bunsetsu" or "E-word" $\alpha$ by a correct "bunsetsu" or English word $\gamma$, where $\alpha \in \Gamma_S^{(k)}$, $\alpha \in \Gamma_D^{(k)}$ or $\alpha \in \Gamma_I^{(k)}$, and $\gamma \in \Gamma_C$.</Paragraph>
      <Paragraph position="7"> "Relevance Factor" $P^{(D)}$ and "Recall Factor" $R^{(D)}$ for the "error detection problem" are defined as follows: (1): $P^{(D)} \triangleq$ (the number of "J-bunsetsu" or "E-word" for which the location $i$ and length $k$ of the error in $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ or $\Gamma_I^{(k)}$ are correctly detected) / (the total number of "J-bunsetsu" or "E-word" detected as erroneous "J-bunsetsu" or "E-word").</Paragraph>
      <Paragraph position="8"> (2): $R^{(D)} \triangleq$ (the number of "J-bunsetsu" or "E-word" for which the location $i$ and length $k$ of the error in $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ or $\Gamma_I^{(k)}$ are correctly detected) / (the number of all "J-bunsetsu" or "E-word" in the set $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ or $\Gamma_I^{(k)}$ prepared in advance).</Paragraph>
      <Paragraph position="9"> "Relevance Factor" $P^{(C)}$ and "Recall Factor" $R^{(C)}$ for the "error correction problem" are defined similarly. Here $P_S^{(D)}$ denotes the "Relevance Factor" for the "error detection problem" of $\Gamma_S^{(k)}$, and $R_D^{(C)}$ denotes the "Recall Factor" for the "error correction problem" of $\Gamma_D^{(k)}$.</Paragraph>
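      <Paragraph> As a small worked example of the two ratios, the sketch below computes a relevance factor and a recall factor from three counts. The function name and the counts are hypothetical and are not taken from the experiments in this paper.

```python
def relevance_and_recall(num_correctly_detected, num_flagged, num_prepared):
    """Return (relevance factor, recall factor) as fractions between 0 and 1.

    num_correctly_detected: chains whose error location and length were found correctly
    num_flagged:            chains that the method reported as erroneous
    num_prepared:           erroneous chains prepared in advance for the test
    """
    relevance = num_correctly_detected / num_flagged if num_flagged else 0.0
    recall = num_correctly_detected / num_prepared if num_prepared else 0.0
    return relevance, recall


# Hypothetical counts, for illustration only.
example_p, example_r = relevance_and_recall(600, 750, 1000)
```

      With these made-up counts the relevance factor is 600/750 and the recall factor is 600/1000; the two differ only in the denominator.</Paragraph>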
    </Section>
    <Section position="2" start_page="187" end_page="189" type="sub_section">
      <SectionTitle>
2.2 The Method of Error Detection using
2nd-Order Markov Model
</SectionTitle>
      <Paragraph position="0"> We introduce the following assumption, based on experience.</Paragraph>
      <Paragraph position="1"> Assumption: Each Markov probability for erroneous chains of "kanji-kana" characters or English alphabets is small compared to that of correct chains.</Paragraph>
      <Paragraph position="2"> According to this assumption, the procedures for detecting the location $i$ and the length $k$ of erroneous chains are defined as follows: Procedure 1 (Method of detecting the location and the length of a chain wrongly substituted in $\Gamma_S^{(k)}$ or wrongly inserted in $\Gamma_I^{(k)}$) Find the subchain of length $k$ which satisfies the following conditions. This chain is judged to be wrongly substituted or inserted at the location $i$. (1) $P(X_h \mid X_{h-m} \cdots X_{h-1}) \gt T$, for $h = i-1$ or $h = i+k+m$, and (2) $P(X_j \mid X_{j-m} \cdots X_{j-1}) \lt T$, for $\forall j$ such that $i \le j \le i+k+m-1$, where $P(X_j \mid X_{j-m} \cdots X_{j-1})$ is the m-th order Markov chain probability, which denotes the probability of occurrence of the successive character $X_j$ when the string $X_{j-m} \cdots X_{j-1}$ has occurred, and $X_u$ denotes a space symbol if $u \le 0$.</Paragraph>
      <Paragraph position="3"> Here $T$ denotes a critical value of the m-th order Markov probability used for detecting errors.</Paragraph>
      <Paragraph position="4"> This procedure detects that $k$ characters are wrongly substituted or inserted at the location $i$ if the m-th order Markov probability for the chain remains smaller than the critical value $T$ exactly $(k+m)$ times, from the location $i$ to $i+k+m-1$. For example, the change of the value of the 2nd-order Markov probability for each character of an erroneous chain in $\Gamma_S^{(2)}$ or $\Gamma_I^{(2)}$ is shown in Fig. 1. In this example, two characters are wrongly substituted or inserted. According to the previous assumption, the 2nd-order Markov probability for the erroneous chain remains smaller than the critical value $T$ exactly four times.</Paragraph>
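      <Paragraph> The run-length reading of Procedure 1 can be sketched as follows, assuming the per-character Markov probabilities have already been computed. The function names and the probability values are hypothetical. A run that reaches the end of the chain is also accepted here, which is a simplification of condition (1).

```python
def low_probability_runs(probs, threshold):
    """Return (start, length) for each maximal run of probabilities below threshold.

    probs[0] belongs to location 1, so every start is a 1-based location.
    """
    runs = []
    run_start = None
    for location, p in enumerate(probs, start=1):
        if threshold > p:
            if run_start is None:
                run_start = location
        elif run_start is not None:
            runs.append((run_start, location - run_start))
            run_start = None
    if run_start is not None:  # run continues to the end of the chain
        runs.append((run_start, len(probs) - run_start + 1))
    return runs


def detect_substituted_or_inserted(probs, threshold, m):
    """A run of length k + m starting at i is read as k erroneous characters at i."""
    return [(start, length - m)
            for start, length in low_probability_runs(probs, threshold)
            if length > m]


# Hypothetical 2nd-order probabilities for an eight-character chain:
# locations 4 to 7 fall below the critical value, as in the example of Fig. 1.
example_probs = [0.3, 0.4, 0.5, 0.01, 0.02, 0.01, 0.03, 0.4]
detected = detect_substituted_or_inserted(example_probs, 0.1, 2)  # [(4, 2)]
```

      With m = 2 the run of four low values starting at location 4 is interpreted as two erroneous characters at location 4.</Paragraph>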
      <Paragraph position="6"> Fig. 1. Change of the value of 2nd-order Markov probabilities for a chain $S_1 S_2 \cdots S_8$ (O: erroneous character; X: location of a character whose Markov probability is smaller than $T$; $T$: critical value of Markov probability). Procedure 2 (Method of detecting the location of a chain wrongly deleted in $\Gamma_D^{(k)}$) Find the subchain of length $k$ which satisfies the following conditions. This chain is judged to be wrongly deleted at the location $i$.</Paragraph>
      <Paragraph position="7"> (1) $P(X_h \mid X_{h-m} \cdots X_{h-1}) \gt T$, for $h = i-1$ or $h = i+k+m$, and (2) $P(X_j \mid X_{j-m} \cdots X_{j-1}) \lt T$, for $\forall j$ such that $i \le j \le i+m-1$, where $T$ denotes a critical value of the m-th order Markov probability used for detecting errors.</Paragraph>
      <Paragraph position="9"> If the m-th order Markov probability for the chain remains smaller than the critical value $T$ exactly $m$ times, from the location $i$ to $i+m-1$, it is judged that some characters are wrongly deleted at the location $i$. Note, however, that the length $k$ of the characters wrongly deleted at the location $i$ cannot be determined by this procedure; the length $k$ is determined by Procedure 4 shown in Sec. 2.3.</Paragraph>
      <Paragraph position="10"> Table 1 shows the number of times that the Markov probabilities remain smaller than $T$ in the cases of 1st- and 2nd-order Markov models. Erroneous chains can be classified into the following two cases: one is the case of characters wrongly substituted or inserted, and the other is the case of characters wrongly deleted.</Paragraph>
      <Paragraph position="11"> Table 1. The number of times that the Markov probability of the erroneous chains remains small for each character of the erroneous chain, including wrongly substituted or inserted characters. In the case of detecting errors in $\Gamma_I^{(2)}$ using the 2nd-order Markov model, it can be presumed that a subchain $\beta$ of length 2 is wrongly inserted at the location $i$ of the erroneous chain $\alpha$ if the 2nd-order Markov probability for the erroneous chain $\alpha$ remains smaller than $T$ exactly four times from the location $i$.</Paragraph>
      <Paragraph position="12"> However, this method cannot distinguish characters wrongly substituted from characters wrongly inserted in the former case, and cannot determine the length $k$ for the type $\Gamma_D^{(k)}$, because the Markov probability of any erroneous chain in $\Gamma_D^{(k)}$ remains small the same number of times regardless of the length $k$. These problems can be solved by Procedures 3 and 4 shown in Sec. 2.3.</Paragraph>
      <Paragraph position="13"> In this paper, the effectiveness of error detection is evaluated for the cases of length $k = 1, 2$.</Paragraph>
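      <Paragraph> Taken together, Procedures 1 and 2 separate the two cases by the length of the low-probability run: exactly m low values suggest a deletion of unknown length, and m + k low values suggest k substituted or inserted characters. A minimal sketch of that decision rule follows; the function name, the labels and the probability values are assumptions made for illustration, and runs shorter than m are simply ignored.

```python
from itertools import groupby


def classify_errors(probs, threshold, m):
    """Classify each low-probability run by its length.

    Returns a list of (location, label, k) with a 1-based location.
    k is None for deletions, because the run length does not reveal it.
    """
    findings = []
    location = 1
    for is_low, group in groupby(probs, key=lambda p: threshold > p):
        length = len(list(group))
        if is_low:
            if length == m:
                findings.append((location, "deleted", None))
            elif length > m:
                findings.append((location, "substituted_or_inserted", length - m))
        location += length
    return findings


# Hypothetical probability sequences for a 2nd-order model (m = 2).
deletion_case = classify_errors([0.5, 0.4, 0.01, 0.02, 0.6, 0.5], 0.1, 2)
two_char_case = classify_errors([0.3, 0.4, 0.5, 0.01, 0.02, 0.01, 0.03, 0.4], 0.1, 2)
```

      The label for the longer runs stays ambiguous between substitution and insertion, which is the limitation noted above and the reason the correction procedures of Sec. 2.3 are needed.</Paragraph>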
    </Section>
    <Section position="3" start_page="189" end_page="189" type="sub_section">
      <SectionTitle>
2.3 The Method of Error Correction using 2nd-Order Markov Model
</SectionTitle>
      <Paragraph position="0"> The procedures for replacing erroneous chains by correct chains using the Markov model are presented as follows: [...] substituted or inserted at the location $i$ of $\alpha$, respectively. Then the erroneous chain $\alpha$ can be replaced by the following correct chain $\gamma$ in [...] assumed to be wrongly deleted at the location $i$ of $\alpha$. Then the erroneous chain $\alpha$ can be replaced by the following correct chain $\gamma$ in [...]</Paragraph>
      <Paragraph position="2"> An example of correcting an erroneous chain in which two characters are wrongly substituted ($\Gamma_S^{(2)}$) is shown in Fig. 2. If the Markov probabilities no longer remain smaller than the critical value $T$, it is judged that the erroneous chain has been corrected.</Paragraph>
      <Paragraph position="4"> Fig. 2. Procedure for correcting an erroneous string using error detection (final step: choose the "bunsetsu" candidate that has the greater Markov probability of the two cases).</Paragraph>
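      <Paragraph> One possible reading of the correction step for a detected substitution or insertion is sketched below. It is not the authors' Procedure 3, whose exact formulation is not reproduced in this text: candidates for both cases are generated at the detected location, candidates that still contain a probability below the critical value are discarded, and the remaining candidate with the greatest joint probability is chosen. All names, the toy corpus and the use of an unsmoothed relative-frequency model are assumptions.

```python
import math
from collections import Counter
from itertools import product


def train(words, m):
    """Count (m+1)-grams and m-character contexts, padding with leading spaces."""
    ngrams, contexts = Counter(), Counter()
    for word in words:
        padded = " " * m + word
        for j in range(m, len(padded)):
            contexts[padded[j - m:j]] += 1
            ngrams[padded[j - m:j + 1]] += 1
    return ngrams, contexts


def char_probs(model, chain, m):
    """m-th order Markov probability of each character of chain."""
    ngrams, contexts = model
    padded = " " * m + chain
    probs = []
    for j in range(m, len(padded)):
        total = contexts[padded[j - m:j]]
        probs.append(ngrams[padded[j - m:j + 1]] / total if total else 0.0)
    return probs


def correct_chain(model, chain, i, k, m, threshold, alphabet):
    """Try the insertion case and the substitution case at 1-based location i."""
    start = i - 1
    # Insertion case: the k characters at location i are removed.
    candidates = [chain[:start] + chain[start + k:]]
    # Substitution case: the k characters at location i are replaced.
    for beta in product(alphabet, repeat=k):
        candidates.append(chain[:start] + "".join(beta) + chain[start + k:])
    best, best_score = None, 0.0
    for candidate in candidates:
        probs = char_probs(model, candidate, m)
        if not probs or threshold > min(probs):
            continue  # still looks erroneous somewhere
        score = math.prod(probs)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Toy corpus and alphabet, for illustration only.
toy = train(["markov", "model", "method"], 2)
letters = "abcdefghijklmnopqrstuvwxyz"
fixed_substitution = correct_chain(toy, "maxkov", 3, 1, 2, 0.1, letters)
fixed_insertion = correct_chain(toy, "marxkov", 4, 1, 2, 0.1, letters)
```

      Enumerating every replacement of length k is only practical for small alphabets and small k; for "kanji-kana" characters the candidate set would have to be restricted, for example to characters that the Markov model allows after the preceding context.</Paragraph>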
    </Section>
  </Section>
  <Section position="5" start_page="189" end_page="197" type="metho">
    <SectionTitle>
3 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="189" end_page="190" type="sub_section">
      <SectionTitle>
3.1 Experimental Conditions
</SectionTitle>
      <Paragraph position="0"> 1. The number of "bunsetsu" for 70 issues of a daily Japanese newspaper: 283,96:~ "bunsetsu"
 2. The number of words for 5 issues of a daily English newspaper: 155,,159 words
 3. Type of errors and the number of "bunsetsu": 8(10 "bunsetsu" are prepared for each of $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ and $\Gamma_I^{(k)}$ ($k = 1, 2$)
 (a) The average length of "bunsetsu" composed of "kanji-kana" character chains: 6 characters
 (b) The average length of correct English words composed of alphabet chains: 7 characters
 4. Markov model of Japanese "kanji-kana" characters: 2nd-order Markov model
 5. Markov models of English alphabets: 2nd- and 3rd-order Markov models</Paragraph>
    </Section>
    <Section position="2" start_page="190" end_page="197" type="sub_section">
      <SectionTitle>
3.2 Experimental Results and Discussion
</SectionTitle>
      <Paragraph position="0"> The accuracy of error detection and error correction depends on the critical value $T$ of the Markov probabilities. The "Relevance Factor" $P$ and the "Recall Factor" $R$ for each method were obtained by changing the value of $T$.</Paragraph>
      <Paragraph position="1"> [1] The Relation between $P$ and $R$ of Detecting Erroneous Chains Using the Detection Procedures. The relations between $P$ and $R$ for the location of erroneous "kanji-kana" chains detected in $\Gamma_S^{(1)}$, $\Gamma_S^{(2)}$, $\Gamma_D^{(1)}$, $\Gamma_D^{(2)}$, $\Gamma_I^{(1)}$ and $\Gamma_I^{(2)}$ using Procedures 1 and 2 are shown in Fig. 3, and those for erroneous alphabet chains are shown in Fig. 4.</Paragraph>
      <Paragraph position="2"> From these figures, the following results are obtained: 1. The maximum value of $P$ and $R$ for detecting erroneous characters wrongly inserted or substituted is greater than that for erroneous characters wrongly deleted.</Paragraph>
      <Paragraph position="3"> (a) In the case of "J-bunsetsu":</Paragraph>
      <Paragraph position="5"> (b) In the case of "E-word":</Paragraph>
      <Paragraph position="7"> 2. Comparing these maximal values shows that the maximum value of the product of $P$ and $R$ for "kanji-kana" "bunsetsu" is 35% to 60% greater than that for English words.</Paragraph>
      <Paragraph position="8"> [2] The Relation between $P$ and $R$ of Chains Corrected Using the Correction Procedures. The relations between $P$ and $R$ of "J-bunsetsu" corrected using Procedures 3 and 4 for $\Gamma_S^{(1)}$, $\Gamma_S^{(2)}$, $\Gamma_D^{(1)}$, $\Gamma_D^{(2)}$, $\Gamma_I^{(1)}$ and $\Gamma_I^{(2)}$ are shown in Fig. 5. From this figure, the following result is obtained: The maximum value of $P$ and $R$ for correcting erroneous characters wrongly inserted or substituted, using the 2nd-order Markov model, is greater than that for erroneous characters wrongly deleted.</Paragraph>
      <Paragraph position="10"> Fig. 3. Experimental results for detecting the location of an erroneous "kanji-kana" string using the error detection procedure</Paragraph>
      <Paragraph position="12"> Fig. 4. Experimental results for detecting the location of an erroneous English word using the error detection procedure (horizontal axis: Recall factor [%])</Paragraph>
      <Paragraph position="14"> Fig. 5. Experimental results for correcting an erroneous "kanji-kana" string using the error correction procedure. The experimental results of detecting errors in English words using Ispell (Interactive Spell checker) are shown in Table 2. From the results, it is seen that Ispell can almost perfectly detect erroneous words in $\Gamma_S^{(k)}$, $\Gamma_D^{(k)}$ and $\Gamma_I^{(k)}$ using a dictionary, but it cannot perfectly correct erroneous words, because it can output the correct candidates for erroneous words in $\Gamma_S^{(1)}$, $\Gamma_D^{(1)}$ and $\Gamma_I^{(1)}$, but cannot output the correct candidates for erroneous words in $\Gamma_S^{(2)}$, $\Gamma_D^{(2)}$ and $\Gamma_I^{(2)}$. It is necessary to detect the location of erroneous alphabets in words in order to detect all these errors. However, it should be noted that Ispell cannot detect the location of erroneous alphabets in words.</Paragraph>
      <Paragraph position="15"> In order to detect and correct erroneous "E-words" more effectively, a method that combines Ispell with the procedures (in Sec. 2.3) using the Markov model is promising. The combined method works in the following way: (1) First, erroneous "E-words" are detected by Ispell, although it cannot detect the locations of erroneous alphabets in words. (2) Next, the correct candidate words are decided by Procedures 3 and 4. (3) Finally, Ispell again checks whether these candidates are correct words.</Paragraph>
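      <Paragraph> The three steps of the combined method can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: a plain set of words stands in for the Ispell dictionary check (Ispell itself is not invoked), and the hypothetical callback propose_candidates stands in for Procedures 3 and 4.

```python
def combined_correction(word, dictionary, propose_candidates):
    """Detect with a dictionary, propose corrections, then re-check them.

    dictionary:         set of correct words (stand-in for the Ispell check)
    propose_candidates: function mapping an erroneous word to candidate words
                        (stand-in for the Markov-model correction procedures)
    """
    # Step 1: a word found in the dictionary is accepted as correct.
    if word in dictionary:
        return [word]
    # Step 2: candidate corrections are proposed for the erroneous word.
    candidates = propose_candidates(word)
    # Step 3: only candidates that pass the dictionary check are kept.
    return [candidate for candidate in candidates if candidate in dictionary]


def delete_one_char(word):
    """Toy proposer: every chain obtained by deleting one character."""
    return [word[:p] + word[p + 1:] for p in range(len(word))]


# Toy dictionary and erroneous word, for illustration only.
toy_dictionary = {"markov", "model", "method"}
suggestions = combined_correction("marxkov", toy_dictionary, delete_one_char)
```

      The final dictionary check filters out candidates that have a high Markov probability but are not actual words, which is the role step (3) plays in the combined method.</Paragraph>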
      <Paragraph position="16"> The experimental results using this method are shown in Fig. 6 (2nd-order) and in Fig. 7 (3rd-order). From the results, it is seen that this combined method of Ispell and the procedure using the 3rd-order Markov model is very useful for detecting and correcting all errors in English words.</Paragraph>
      <Paragraph position="17"> It takes about 10 milliseconds and 6 seconds on average to detect and to correct an erroneous "bunsetsu", respectively. Examples of "bunsetsu" and the output results of error detection and error correction using the Markov model are shown in Fig. 8.</Paragraph>
      <Paragraph position="18"> Fig. 8. Examples of erroneous "bunsetsu" and the results of error detection and error correction. Panel (b): case of an erroneous "kanji-kana" "bunsetsu" for $\Gamma_D$; output result (error position) of error detection: first character.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="197" end_page="197" type="metho">
    <SectionTitle>
4 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper proposed methods to judge three types of errors, namely characters wrongly substituted, inserted and deleted in Japanese "kanji-kana" chains and English words, and to correct these errors using an m-th order Markov model.</Paragraph>
    <Paragraph position="1"> The effects of the methods were experimentally evaluated for the cases of 2nd- and 3rd-order Markov chains. From the experimental results, the following conclusions have been obtained: 1. The maximum value of $P$ and $R$ for detecting erroneous characters wrongly inserted or substituted is greater than that for erroneous characters wrongly deleted.</Paragraph>
    <Paragraph position="2"> 2. This method is especially useful for detecting and correcting erroneous characters wrongly inserted and substituted in "kanji-kana" "bunsetsu", but is not so useful for detecting and correcting errors in English words.</Paragraph>
  </Section>
  <Section position="7" start_page="197" end_page="197" type="metho">
    <SectionTitle>
3. The combin,~toriM method of lspell a.nd
</SectionTitle>
    <Paragraph position="0"> the procedure by ard-order M arkov model is usefull to detect and correct all errors in Fmglish words.</Paragraph>
    <Paragraph position="1"> llowever they are not so usefltl for detecting and correcting of eharactells, wrongly deleted in &amp;quot;k,~I\ji-kana&amp;quot; &amp;quot;bunsetsu&amp;quot;. 1: hen, m(&gt;re e.flicient rrmthods are expected for this type of errors.</Paragraph>
  </Section>
</Paper>