<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2157">
  <Title>Collocational Analysis in Japanese Text Input</Title>
  <Section position="3" start_page="770" end_page="772" type="metho">
    <SectionTitle>
4. Translation Algorithm
</SectionTitle>
    <Paragraph position="0"> Fig. 3 outlines the translation process. First, table searching is performed for all segmentation possibilities to obtain the part of speech of each segment. This is carried out by referring to the independent word dictionary (nouns, verbs, adjectives, etc.</Paragraph>
    <Paragraph position="1"> [65,000 words]), the prefix and suffix dictionary [1085 words], and the dependent word dictionary (postpositions, auxiliary verbs, etc.</Paragraph>
    <Paragraph position="2"> [422 words]). Then, among the morpheme sequences constructed from the segments, the grammatically possible sequences are selected.</Paragraph>
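The table-searching step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy dictionary, the function name `enumerate_segmentations`, and the use of romanized input instead of kana are all assumptions made for the example, and the grammatical filtering of morpheme sequences is left out.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, str]  # (surface form, part of speech)

# Hypothetical toy dictionary standing in for the independent word,
# prefix/suffix, and dependent word dictionaries described in the text.
TOY_DICTIONARY: Dict[str, List[str]] = {
    "kankei": ["noun"],
    "nonaka": ["noun"],
    "no": ["postposition"],
    "naka": ["noun"],
}


def enumerate_segmentations(
    text: str, dictionary: Dict[str, List[str]]
) -> List[List[Segment]]:
    """Return every split of `text` into dictionary entries, tagged by part of speech."""
    if not text:
        return [[]]  # one way to segment the empty string: no segments
    results: List[List[Segment]] = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        # Table lookup: a prefix may carry several parts of speech.
        for pos in dictionary.get(prefix, []):
            for rest in enumerate_segmentations(text[end:], dictionary):
                results.append([(prefix, pos)] + rest)
    return results


SEGMENTATIONS = enumerate_segmentations("kankeinonaka", TOY_DICTIONARY)
```

With this toy dictionary the input yields two segmentations, kankei + no + naka and kankei + nonaka. Plain recursion is used for readability; a real system with dictionaries of this size would more likely build a lattice or memoize the suffix results.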
    <Paragraph position="3"> Next, the candidate sentences with the least number of Bunsetsu are selected [Yoshimura83]. Furthermore, among these selected sentences, those with the least number of words are selected. In this process, a heuristic rule is used to prevent mis-selection of morpheme sequences. This rule rejects a combination of nouns forming a compound word if the usage frequency of either noun is very low.</Paragraph>
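The two selection steps and the compound-noun rule can be sketched as below. This is an illustration under stated assumptions: the `Word` record, the frequency counts, the `VERY_LOW_FREQ` threshold, and the choice to apply the heuristic before both minimum filters are hypothetical, since the text does not give a threshold or the exact point at which the rule fires.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    surface: str
    pos: str
    freq: int  # hypothetical usage-frequency count


Bunsetsu = List[Word]
Candidate = List[Bunsetsu]

VERY_LOW_FREQ = 5  # assumed threshold; the source gives no number


def has_rare_compound(candidate: Candidate, threshold: int = VERY_LOW_FREQ) -> bool:
    """True if two adjacent nouns in one Bunsetsu include a very rare noun."""
    for bunsetsu in candidate:
        for left, right in zip(bunsetsu, bunsetsu[1:]):
            if left.pos == "noun" and right.pos == "noun":
                if threshold >= min(left.freq, right.freq):
                    return True
    return False


def word_count(candidate: Candidate) -> int:
    return sum(len(bunsetsu) for bunsetsu in candidate)


def select_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Apply the heuristic, then keep fewest Bunsetsu, then fewest words."""
    kept = [c for c in candidates if not has_rare_compound(c)]
    if not kept:
        return []
    fewest_bunsetsu = min(len(c) for c in kept)
    kept = [c for c in kept if len(c) == fewest_bunsetsu]
    fewest_words = min(word_count(c) for c in kept)
    return [c for c in kept if word_count(c) == fewest_words]


# Invented frequencies, for illustration only.
COMPOUND = [[Word("kankei", "noun", 900), Word("nonaka", "noun", 1)]]
PHRASE = [
    [Word("kankei", "noun", 900), Word("no", "postposition", 5000)],
    [Word("naka", "noun", 800)],
]
SELECTED = select_candidates([COMPOUND, PHRASE])
```

Without the heuristic, the compound reading would win on both counts (one Bunsetsu, two words); rejecting it first leaves only the postposition reading.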
    <Paragraph position="4"> Ex. Input kana sequence: [kankeinonaka]. Rejected (x): kankei (noun) + nonaka (noun, freq.: very low), i.e. (a relation) (in a field). Accepted (O): kankei (noun) + no (postposition) + naka (noun, freq.: high), i.e. (a relation) (among). Second, the co-occurrence pattern matrix is used to determine the number of WCP within each candidate sentence. The counting operation is carried out only on adjacent Bunsetsu because, in most cases, relationships hold between adjacent Bunsetsu, and determining extended relationships would be too time-consuming.</Paragraph>
    <Paragraph position="5"> Finally, the candidate sentence with the maximum WCP number is chosen as the prime candidate. To prevent mistaken deletion of prime candidates caused by word pairs that rarely co-occur, the following rule is used. If the usage frequency of either word in a WCP is low, the candidate sentences whose WCP number is one less than the maximum are also identified as prime candidates. In the following example, both are identified as prime candidates.</Paragraph>
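The WCP counting and prime-candidate rule can be sketched as follows. Several points are assumptions made for the example rather than details from the text: each Bunsetsu is represented only by its head word, the co-occurrence pattern matrix is modeled as a set of ordered word pairs, the `low_freq` threshold is invented, and the relaxation is read as applying when a WCP matched in a top-scoring candidate contains a low-frequency word. The English words are stand-ins for dictionary entries.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Candidate = List[str]  # head word of each Bunsetsu, in sentence order


def matched_wcps(candidate: Candidate, matrix: Set[Pair]) -> List[Pair]:
    """Co-occurring pairs found between adjacent Bunsetsu only."""
    return [
        (a, b) for a, b in zip(candidate, candidate[1:]) if (a, b) in matrix
    ]


def prime_candidates(
    candidates: List[Candidate],
    matrix: Set[Pair],
    freq: Dict[str, int],
    low_freq: int = 5,  # assumed threshold for "low" usage frequency
) -> List[Candidate]:
    scored = [(c, matched_wcps(c, matrix)) for c in candidates]
    if not scored:
        return []
    best = max(len(pairs) for _, pairs in scored)
    # Relaxation: does any WCP in a top-scoring candidate use a rare word?
    top_uses_rare_word = any(
        low_freq >= freq.get(word, 0)
        for _, pairs in scored
        if len(pairs) == best
        for pair in pairs
        for word in pair
    )
    cutoff = best - 1 if top_uses_rare_word else best
    return [c for c, pairs in scored if len(pairs) >= cutoff]


# Hypothetical data, for illustration only.
MATRIX = {("rain", "fall"), ("candy", "fall")}
FREQ = {"rain": 500, "fall": 400, "candy": 2, "buy": 300}
CANDIDATES = [["rain", "fall"], ["candy", "fall"], ["candy", "buy"]]

RELAXED = prime_candidates(CANDIDATES, MATRIX, FREQ)
STRICT = prime_candidates(CANDIDATES, MATRIX, FREQ, low_freq=1)
```

With the default threshold, the rare word "candy" triggers the relaxation and all three candidates survive; with `low_freq=1` no word counts as rare, so only the two candidates holding the maximum WCP number remain.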
  </Section>
  <Section position="4" start_page="772" end_page="772" type="metho">
    <SectionTitle>
5. Translation Experimental Results
</SectionTitle>
    <Paragraph position="0"> About four hundred test sentences were used to evaluate the accuracy of collocational analysis. The mean number of candidate sentences was 62.6, selected by considering the least</Paragraph>
    <Paragraph position="1"> number of Bunsetsu. The error ratio for this step was 1.7%. Error ratio means the proportion of correct Bunsetsu missed in the selecting operation in each process to the total number of Bunsetsu. The mean number of candidate sentences selected by least number of words was 16.1, with an error ratio of 0.8%. Finally, the number of candidate sentences selected by the collocational analysis method was further reduced to 6.4, with an error ratio of 1.6%. Furthermore, the translation accuracy of the practical translation algorithm based on the above description was examined using 10 leading articles in newspapers (about 14,000 characters). This practical algorithm was modified to process proper nouns, numerals and symbols, and to save memory. It was confirmed that the translation accuracy of this method, evaluated by character unit, was higher than 95%.</Paragraph>
  </Section>
</Paper>