<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1050"> <Title>Kana-Kanji Conversion System with Input Support Based on Prediction</Title> <Section position="4" start_page="341" end_page="342" type="metho"> <SectionTitle> 2 Example of Text Input </SectionTitle> <Paragraph position="0"> Figure 1 shows an example of text input using tile proposed system. Suppose that tile user intends to input a sentencc , ,,,v,~,~ 2- co -e ~&quot; e$ & N ~ ~ ~ ~- (we rcqucst yore&quot; attendance at the following meeting)&quot;, typing ha.ha /u b ~ 9 ~ ~ ~. ~- (katcino kaigiwo kaisai shimaswnodc gosanshuu ncgaimasu)&quot;. When the user types &quot;\]~' (ha)&quot;, &quot;~ (hi)&quot;, and &quot;CO (no)&quot; keys, the system antomatically opens a prediction menu window just below the typed charactcrs, and shows two candidates in the menu window (Fig.l(a)): (at the following address, modest .... ) (al. the following address, modest .... ) The first candidate is high\]ighted. If the menu window contains an appropriate candidate, the user can choose it by cursor; otherwise the user can continue entering the next characters. Subsequently, when &quot;\]0, (lea)&quot; key is typed, the predic.tion menu window disN)pears (Fig.l(b)).</Paragraph> <Paragraph position="1"> When &quot;~ (i)&quot; and &quot;~&quot; (gi)&quot; keys are typed, the system automatically opens a prediction menu window again, and shows %ur new candidates (Fig.l (e)): (we request your a.ttenda.nce a.t the meeting) (we hold the meeting) (we hold the meeting) (we hold the meeting) Here the first one is what the user just wants; the user enters select key, thcn the prcdiction menu window disappears, and dmsen candidate is insertcd in the cdit area. If remaining tcana charactcr string wtfich was not included in the chosen cundidate exists, l~a,na-l~anji conversion starts automatically; the first three Icana c.haracters of this sentence &quot;?a~ ~ ~ (l~ahino)&quot; is converted to tcanji notation 'q'~,co (the following)&quot; (Fig.l(d)). This is the first result of t~ana-kanji conversion, so that the user can d~ange it to others. An overline of thc conversion rcsult in Fig.l(d) shows that this result is not fixed yet.</Paragraph> <Paragraph position="2"> In above example, while 27 ~ana d~aracters are needed to input in ordinary l~ana-t~anji con- null word processor with input support. (a)&quot;;~J ~'', &quot;-~&quot;, and &quot;03&quot; keys are typed. (b)&quot;~ ~'' key is typed, subsequently. (e)&quot;~ ~'' and &quot;{s'&quot; keys are typed, subsequently. (d)The first candidate in (c) is chosen. version, our system can reduce the input of 21 ~:ana characters, &quot;*L ~b' ~ ~ t~ ~ ~ b ~ J-c0 ~ &quot;~&quot; ~ \]u b e~ 9 talo~ ~ ~ J- (wo kaisai shimasunode gosanshuu negaimasu)&quot;; only 6 kana characters are needed to input.</Paragraph> </Section> <Section position="5" start_page="342" end_page="346" type="metho"> <SectionTitle> 3 Input Support Method Based on </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="342" end_page="342" type="sub_section"> <SectionTitle> Prediction </SectionTitle> <Paragraph position="0"> In this section, an overview of the system is shown. Then dictionaries used in the system, factors for estimation of candidates, and user learning are described.</Paragraph> </Section> <Section position="2" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 3.1 Overview of the system </SectionTitle> <Paragraph position="0"> Figure 2 shows a diagram of the proposed system. 
</Section> <Section position="5" start_page="342" end_page="346" type="metho"> <SectionTitle> 3 Input Support Method Based on Prediction </SectionTitle> <Paragraph position="0"> In this section, an overview of the system is given first. Then the dictionaries used in the system, the factors for estimating prediction candidates, and user learning are described.</Paragraph> <Section position="2" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 3.1 Overview of the system </SectionTitle> <Paragraph position="0"> Figure 2 shows a diagram of the proposed system. It is composed of a kana-kanji conversion unit and an input prediction unit, and the latter has four sub-units; one of them extracts words and phrases from the candidates adopted by the user and automatically registers them into the user dictionary (see Section 3.6).</Paragraph> </Section> <Section position="3" start_page="342" end_page="343" type="sub_section"> <SectionTitle> 3.2 Prediction Dictionary </SectionTitle> <Paragraph position="0"> Two kinds of dictionaries are used as a prediction source: (i) the System Dictionary consists of highly frequent phrases; (ii) the User Dictionary consists of phrases learned from texts which the user typed before. Each dictionary includes words and phrases without distinction. This is because Japanese is not written with words separated by spaces, and highly frequent phrases take various grammatical forms, such as a single word or a two-word sequence. Each entry has a kana notation (phonetic script) and a kanji notation.</Paragraph> </Section> <Section position="4" start_page="343" end_page="343" type="sub_section"> <SectionTitle> 3.3 Estimation of Prediction Candidates </SectionTitle> <Paragraph position="0"> Two kinds of factors are used to estimate candidates: (i) the Certainty Factor indicates how certain a candidate is; (ii) the Usefulness Factor indicates how useful a candidate is. These two factors vary as the user inputs each character. Retrieval results are sorted in order of these factors, and only those with factors greater than the thresholds are shown as candidates.</Paragraph> </Section> <Section position="5" start_page="343" end_page="344" type="sub_section"> <SectionTitle> 3.4 Calculation of Certainty Factor </SectionTitle> <Paragraph position="0"> Certainty factors for entries in the system dictionary and the user dictionary are calculated in different manners.</Paragraph> <Paragraph position="1"> First we make some notational conventions. A typed kana character string is denoted by S, which has right sub-strings S_i (1 ≤ i ≤ L(S)). L(x) is the length of a string x. An entry in the dictionary is denoted by W, which has kanji notation W_H and kana notation W_Y.</Paragraph> <Paragraph position="2"> For an entry W in the system dictionary, Certainty_factor(W|S) = F_K(W_H) / F_K(S_i), where S_i is the right sub-string of S that matches the head of W_Y, F_K(W_H) is the frequency of W_H in the kanji-notation corpus, and F_K(S_i) is the frequency of the sub-string S_i; an example value is Certainty_factor = 70/114 = 0.614. The values of the certainty factor corresponding to every character sub-string are stored in the system dictionary and are read out at retrieval time.</Paragraph> <Paragraph position="3"> Since the system cannot infer which phrases will be registered into the user dictionary, the certainty factor for an entry in the user dictionary cannot be calculated from corpora in this way.</Paragraph>
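<Paragraph> As a concrete illustration of the system-dictionary case above, the following minimal sketch precomputes certainty factors from corpus counts and looks them up for every right sub-string of the typed string. The function names, the data layout, and the toy frequencies (chosen only to reproduce the 70/114 example value) are assumptions made for this illustration.

# Sketch: precomputing and retrieving system-dictionary certainty factors.
def build_certainty_table(entries, kanji_freq, substring_freq):
    # entries: (kana notation W_Y, kanji notation W_H) pairs.
    # kanji_freq[W_H]: frequency of W_H in the kanji-notation corpus.
    # substring_freq[S_i]: frequency of the kana sub-string S_i.
    # Returns table[(S_i, W_H)] = F_K(W_H) / F_K(S_i).
    table = {}
    for kana, kanji in entries:
        for i in range(1, len(kana) + 1):
            head = kana[:i]            # a sub-string matching the head of W_Y
            if head in substring_freq:
                table[(head, kanji)] = kanji_freq[kanji] / substring_freq[head]
    return table

def retrieve(table, typed):
    # Look up every right sub-string S_i of the typed string S.
    found = {}
    for i in range(len(typed)):
        s_i = typed[i:]
        for (head, kanji), cf in table.items():
            if head == s_i:
                found[kanji] = max(found.get(kanji, 0.0), cf)
    return found

# Toy data reproducing the example value 70/114 = 0.614 (entry and counts invented).
table = build_certainty_table([('かきのかいぎ', '下記の会議')],
                              {'下記の会議': 70}, {'かき': 114})
print(retrieve(table, 'かき'))         # {'下記の会議': 0.614...}
</Paragraph>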
<Paragraph position="4"> Hence, when S is typed, the certainty factor for an entry W in the user dictionary is instead calculated as Certainty_factor(W|S) = α / N(S_i) when S has a right sub-string S_i which matches the head of W_Y, and 0 otherwise, where N(S_i) is the number of entries in the user dictionary whose kana notations start with S_i, and α is a constant that gives entries in the user dictionary a greater factor than entries in the system dictionary; i.e., the user dictionary has priority.</Paragraph> </Section> <Section position="6" start_page="344" end_page="344" type="sub_section"> <SectionTitle> 3.5 Calculation of Usefulness Factor </SectionTitle> <Paragraph position="0"> An increase in the length of the typed kana character string raises the certainty of a prediction but lessens its usefulness. Hence, the usefulness factor is introduced in addition to the certainty factor. When S is typed, the usefulness factor for W is the number of kana characters that would be complemented: Usefulness_factor(W|S) = L(W_Y) - L(S_i) when S has a right sub-string S_i which matches the head of W_Y, and 0 otherwise.</Paragraph> </Section> <Section position="7" start_page="344" end_page="344" type="sub_section"> <SectionTitle> 3.6 User Learning </SectionTitle> <Paragraph position="0"> After the user adopts prediction or kana-kanji conversion candidates, words longer than a threshold and phrases which satisfy given grammatical conditions are extracted; these are automatically registered into the user dictionary.</Paragraph> <Paragraph position="1"> For example, suppose that the user intends to input the phrase &quot;会議に出席する (attend the meeting)&quot;, typing the kana characters &quot;かいぎにしゅっせきする (kaigini shusseki suru)&quot;. When the &quot;か (ka)&quot;, &quot;い (i)&quot;, and &quot;ぎ (gi)&quot; keys are typed, four candidates are shown in the prediction menu window (Fig. 1(c)). Here the prediction menu window does not contain a candidate which the user wants, so the user continues entering the next kana characters &quot;にしゅっせきする (ni shusseki suru)&quot; and the kana-kanji conversion key. As a result, &quot;かいぎにしゅっせきする (kaigini shusseki suru)&quot; is converted to &quot;会議に出席する (attend the meeting)&quot;. When this conversion candidate is adopted, two words and a phrase are registered into the user dictionary: &quot;会議 (meeting)&quot;, &quot;出席する (attend)&quot;, and &quot;会議に出席する (attend the meeting)&quot;. If the &quot;か (ka)&quot;, &quot;い (i)&quot;, and &quot;ぎ (gi)&quot; keys are typed after this learning, &quot;会議に出席する (attend the meeting)&quot; is contained in the prediction menu window.</Paragraph>
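<Paragraph> To tie Sections 3.3-3.5 together, here is a minimal Python sketch of how user-dictionary candidates might be scored with the two factors, filtered by the thresholds, and sorted for the menu; system-dictionary entries would be handled analogously with their precomputed certainty factors. The value of α, the inclusive comparisons, the sort order, and all names are assumptions made for this illustration, and the two formulas follow the reconstructions given above.

# Sketch: scoring user-dictionary candidates with the certainty and
# usefulness factors of Sections 3.3-3.5 (reconstructed forms).
ALPHA = 2.0          # assumed constant giving the user dictionary priority
CF_THRESHOLD = 0.1   # certainty-factor threshold (value reported in Section 4.2)
UF_THRESHOLD = 2     # usefulness-factor threshold (value reported in Section 4.2)

def right_substrings(s):
    return [s[i:] for i in range(len(s))]

def matching_substring(entry_kana, typed):
    # Return the right sub-string S_i of the typed string that matches the
    # head of the entry's kana notation W_Y, or None if there is no match.
    for s_i in right_substrings(typed):
        if entry_kana.startswith(s_i):
            return s_i
    return None

def score(entry_kana, typed, user_dict):
    s_i = matching_substring(entry_kana, typed)
    if s_i is None:
        return 0.0, 0
    n = sum(1 for kana, _ in user_dict if kana.startswith(s_i))
    certainty = ALPHA / n                      # alpha / N(S_i)
    usefulness = len(entry_kana) - len(s_i)    # L(W_Y) - L(S_i)
    return certainty, usefulness

def menu_candidates(typed, user_dict):
    scored = []
    for kana, kanji in user_dict:
        cf, uf = score(kana, typed, user_dict)
        if cf >= CF_THRESHOLD and uf >= UF_THRESHOLD:
            scored.append((cf, uf, kanji))
    scored.sort(reverse=True)                      # better factors first
    return [kanji for _, _, kanji in scored][:5]   # at most five shown (Section 4.2)

# Usage with the Section 3.6 example after learning:
user_dict = [('かいぎ', '会議'), ('しゅっせきする', '出席する'),
             ('かいぎにしゅっせきする', '会議に出席する')]
print(menu_candidates('かいぎ', user_dict))        # ['会議に出席する']
</Paragraph>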
<Paragraph position="2"> 4 Experiments. The efficiency of the proposed system is shown by means of experiments.</Paragraph> </Section> <Section position="8" start_page="344" end_page="344" type="sub_section"> <SectionTitle> 4.1 Evaluation Measures </SectionTitle> <Paragraph position="0"> Neither a start key for prediction nor a cancel key for prediction candidates is needed. A select key to adopt candidates is needed both in prediction and in ordinary kana-kanji conversion, so we need not take the input of the select key into account. Hence, the length of the complemented kana characters is exactly the decrease in key input operations.</Paragraph> <Paragraph position="1"> Two evaluation measures, an operation ratio and a precision, are defined as follows: operation ratio = (P - Q) / P and precision = R / S, where P is the length of the original kana text, Q is the length of the kana characters complemented by prediction, R is the number of shown prediction menu windows containing appropriate choices, and S is the number of all shown prediction menu windows.</Paragraph> </Section> <Section position="9" start_page="344" end_page="344" type="sub_section"> <SectionTitle> 4.2 Data and Conditions </SectionTitle> <Paragraph position="0"> Two kinds of texts, a paper on natural language processing and a letter, were used in our experiments; these texts were not included in the corpora used to calculate the certainty factor. A system dictionary with 37,926 entries was used. The thresholds of the certainty factor and the usefulness factor were 0.1 and 2, respectively. The number of candidates presented in a prediction menu window was five or less. If a prediction menu window contained an appropriate choice, it was always adopted.</Paragraph> <Paragraph position="1"> With a view to examining the respective contributions of the system dictionary and user learning, experiments were carried out in three cases: (i) only the system dictionary was used; (ii) only user learning was used; (iii) both the system dictionary and user learning were used.</Paragraph> <Paragraph position="2"> We calculated the length of the complemented kana characters automatically. The operation ratio and the precision were recorded at every input of 4,500 kana characters.</Paragraph>
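<Paragraph> As a toy check of the measures defined in Section 4.1, the numbers from the Section 2 walkthrough (27 kana characters in ordinary conversion, 21 of them complemented, and two prediction menus shown of which one contained the wanted candidate) give an operation ratio of 6/27 and a precision of 1/2; a short Python sketch, with names chosen for this illustration:

# Sketch: the two evaluation measures of Section 4.1.
def operation_ratio(p, q):
    # P: length of the original kana text; Q: kana characters complemented by
    # prediction.  (P - Q) / P is the fraction of key operations still needed.
    return (p - q) / p

def precision(r, s):
    # R: menus containing an appropriate choice; S: all menus shown.
    return r / s

# Numbers from the single sentence of Section 2 (much better than the
# corpus-level averages reported in Section 4.3).
print(operation_ratio(27, 21))   # 0.2222... -> a 78% decrease for that sentence
print(precision(1, 2))           # 0.5
</Paragraph>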
</Section> <Section position="10" start_page="344" end_page="346" type="sub_section"> <SectionTitle> 4.3 Results </SectionTitle> <Paragraph position="0"> Figure 3 shows the experimental results. (Figure 3: (a) Operation ratio for the paper. (b) Operation ratio for the letter. (c) Precision for the paper. (d) Precision for the letter.)</Paragraph> <Paragraph position="1"> Decrease in key input operations: Using both the system dictionary and user learning, for the paper, the operation ratio was 97.3-78.6% (line (r3) in Fig. 3(a)) and the precision was 20.0-26.7% (line (p3) in Fig. 3(c)); for the letter, the operation ratio was 80.7-78.1% (line (r3) in Fig. 3(b)) and the precision was 26.1-29.6% (line (p3) in Fig. 3(d)). When 45,000 kana characters had been typed, the average operation ratio was 78%, that is, a 22% decrease in the original operations was obtained; the average precision was 25%, that is, a quarter of the shown prediction menu windows contained appropriate choices. This precision was enough to realize comfortable operation.</Paragraph> <Paragraph position="2"> Contribution of the system dictionary: Using only the system dictionary, for the paper, the operation ratio was 97.6-96.6% (line (r1) in Fig. 3(a)), that is, a 2.4-3.4% decrease in the original operations was obtained; for the letter, the operation ratio was 90.6-78.8% (line (r1) in Fig. 3(b)), that is, a 9.4-21.2% decrease in the original operations was obtained. Consequently, the system dictionary is effective for a text such as a letter that contains idioms or common phrases, because the system dictionary includes many such phrases. Furthermore, comparing the precision obtained using both the system dictionary and user learning with that obtained using only user learning, the former was worse for the paper (lines (p2) and (p3) in Fig. 3(c) and (d)). As a result, for some kinds of texts, the system dictionary is not only ineffective but can even reduce the precision.</Paragraph> <Paragraph position="3"> Contribution of user learning: User learning had an effect on the operation ratio after more than 9,000 kana characters had been typed (lines (r2) in Fig. 3(a) and (b)). In fact, if the user types about ten pages of text, a 15-20% decrease in the original operations can be obtained.</Paragraph> </Section> </Section> </Paper>