<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1067"> <Title>A Kana-Kanji Translation System for Non-Segmented Input Sentences Based on Syntactic and Semantic Analysis</Title> <Section position="2" start_page="0" end_page="280" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Ordinary Japanese sentences are written using a combination of Kana, which are Japanese phonogramic characters, and Kanji, which are ideographic Chinese characters. Nouns, verbs and other independent words are generally written in Kanji, whereas dependent words, such as postpositions and auxiliary verbs, are written in Kana. While there are only about fifty Kana, there are several thousand Kanji, which makes it difficult to input Japanese sentences into a computer system.</Paragraph> <Paragraph position="1"> Extensive research has been carried out on methods of inputting Kanji, with the aim of making input rapid and easy. Among the methods proposed, Kana-Kanji translation appears to be the most promising. In this method, input sentences are entered in Kana using a conventional typewriter keyboard, and those parts of the sentences which should be written in Kanji are translated into Kanji automatically. 
In this process, a non-segmented input form is desirable for users, because Japanese sentences are customarily written without spaces between words.</Paragraph> <Paragraph position="2"> Therefore, the ultimate goal of a Kana-Kanji translation scheme should be to achieve error-free translation from non-segmented Kana input sentences.</Paragraph> <Paragraph position="3"> This paper describes a system for achieving high accuracy in the Kana-Kanji translation of non-segmented input Kana sentences.</Paragraph> <Section position="1" start_page="0" end_page="280" type="sub_section"> <SectionTitle> 1.1 Disambiguation Approaches in Kana-Kanji Translation </SectionTitle> <Paragraph position="0"> If non-segmented input Kana sentences were unambiguous, a perfect Kana-Kanji translation could easily be achieved with simple transliteration techniques. In practice, however, input Kana sentences are highly ambiguous. This ambiguity falls into two types: ambiguity in segmenting the sentence into words, and ambiguity in choosing among homonyms, i.e. different words that share the same Kana reading.</Paragraph> <Paragraph position="2"> Makino and Kizawa [1] proposed an automatic Kana-Kanji translation system in which these two types of ambiguity are treated separately, in different ways. The segmentation of input sentences is carried out heuristically by a longest string-matching method over two consecutive &quot;Bunsetsu&quot;. A Bunsetsu is a Japanese syntactic unit which usually consists of an independent word followed by a sequence of dependent words. After the segmentation of a sentence has been determined, suitable words are selected from each homonym set on the basis of syntactic and semantic analysis. In their approach, therefore, segmentation ambiguity is resolved without syntactic or semantic analysis.</Paragraph> <Paragraph position="3"> The new Kana-Kanji translation method presented in this paper treats both types of ambiguity in the same way, on the basis of syntactic and semantic analysis. In the new method, translation is performed in two steps. 
In the first step, both kinds of ambiguity are detected by morphological analysis and stored in the form of a network. In the second step, the best path, i.e. a string of morphemes, is chosen from the network by syntactic and semantic analysis based on case grammar.</Paragraph> </Section> </Section> </Paper>
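<!--
Makino and Kizawa [1] resolve segmentation with a longest-match heuristic over two consecutive Bunsetsu and only afterwards choose among homonyms. The sketch below is one plausible reading of that heuristic, not their implementation: at each position it tries every candidate first Bunsetsu, adds the length of the longest candidate that can follow it, and commits only the first Bunsetsu of the best pair. The function name two_bunsetsu_longest_match, the fixed candidate set, and the tie-breaking rule (prefer the longer first Bunsetsu) are assumptions made for illustration; a real system would recognize Bunsetsu from independent and dependent words rather than look them up in a list.

```python
from typing import List, Optional, Set, Tuple


# Hypothetical helper for illustration; not Makino and Kizawa's code.
def two_bunsetsu_longest_match(text: str, bunsetsu: Set[str]) -> List[str]:
    """Segment `text` greedily using a two-Bunsetsu longest-match rule.

    At each position, every candidate first Bunsetsu is scored by its own
    length plus the length of the longest candidate that can follow it.
    Only the first Bunsetsu of the best-scoring pair is committed; ties go
    to the longer first Bunsetsu (an assumption made for this sketch).
    """
    max_len = max(len(b) for b in bunsetsu)
    result: List[str] = []
    pos = 0
    while pos < len(text):
        best_key: Optional[Tuple[int, int]] = None  # (pair length, first length)
        best_first: Optional[str] = None
        for i in range(pos + 1, min(pos + max_len, len(text)) + 1):
            first = text[pos:i]
            if first not in bunsetsu:
                continue
            # Longest candidate starting right after `first`; 0 if none fits
            # or `first` already reaches the end of the input.
            second_len = 0
            for j in range(i + 1, min(i + max_len, len(text)) + 1):
                if text[i:j] in bunsetsu:
                    second_len = j - i
            key = (len(first) + second_len, len(first))
            if best_key is None or key > best_key:
                best_key, best_first = key, first
        if best_first is None:
            raise ValueError(f"no Bunsetsu candidate at position {pos}")
        result.append(best_first)
        pos += len(best_first)
    return result


# Hypothetical toy candidate set (hiragana only, no dictionary lookup).
CANDIDATES = {"ここで", "ここでは", "はきものを", "きものを", "ぬぐ"}
segments = two_bunsetsu_longest_match("ここではきものをぬぐ", CANDIDATES)
# segments == ['ここでは', 'きものを', 'ぬぐ']
```

On this input, the readings ここで + はきものを ("take off footwear here") and ここでは + きものを ("take off a kimono here") both score 8 Kana, so length alone cannot separate them and the assumed tie-breaking rule decides. This is the kind of case in which segmentation without syntactic and semantic analysis can go wrong.
-->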
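<!--
The two-step method in the last paragraph stores both kinds of ambiguity in a network and then picks the best path through it. The sketch below illustrates only that data structure and search, under stated assumptions: morphemes come from a hypothetical toy dictionary keyed by Kana reading, each candidate carries a made-up numeric cost, and the "best" path is simply the one with the lowest total cost, found by dynamic programming over the network. The paper instead chooses the path by syntactic and semantic analysis based on case grammar, which this sketch does not model; Edge, build_lattice, best_path and the cost values are illustrative, not the paper's.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


# Hypothetical names (Edge, build_lattice, best_path); not from the paper.
class Edge(NamedTuple):
    start: int    # index of the first Kana covered by this morpheme
    end: int      # index one past the last Kana covered
    surface: str  # candidate written form (Kanji and/or Kana)
    cost: float   # hypothetical stand-in score; lower is better


def build_lattice(kana: str, dictionary: Dict[str, List[Tuple[str, float]]]) -> List[Edge]:
    """Step 1: record every dictionary match at every position.

    Overlapping spans encode segmentation ambiguity; several surfaces
    for the same span encode homonym ambiguity.
    """
    edges: List[Edge] = []
    for i in range(len(kana)):
        for j in range(i + 1, len(kana) + 1):
            for surface, cost in dictionary.get(kana[i:j], []):
                edges.append(Edge(i, j, surface, cost))
    return edges


def best_path(kana: str, edges: List[Edge]) -> Optional[List[str]]:
    """Step 2 (stand-in): lowest-total-cost path from node 0 to node len(kana)."""
    n = len(kana)
    best = [float("inf")] * (n + 1)
    back: List[Optional[Edge]] = [None] * (n + 1)
    best[0] = 0.0
    # Every edge points forward, so relaxing edges in order of their start
    # node settles best[e.start] before any edge leaving it is used.
    for e in sorted(edges, key=lambda edge: edge.start):
        if best[e.start] + e.cost < best[e.end]:
            best[e.end] = best[e.start] + e.cost
            back[e.end] = e
    if n > 0 and back[n] is None:
        return None  # no complete path covers the whole input
    path: List[str] = []
    node = n
    while node > 0:
        edge = back[node]
        assert edge is not None
        path.append(edge.surface)
        node = edge.start
    return path[::-1]


# Hypothetical toy dictionary: "はし" has three homonyms, and "は" + "し"
# offers a competing segmentation of the same two Kana.
DICTIONARY = {
    "はし": [("橋", 1.0), ("箸", 1.5), ("端", 2.0)],
    "は": [("は", 0.5)],
    "し": [("し", 1.0)],
    "を": [("を", 0.5)],
    "わたる": [("渡る", 1.0)],
}
lattice = build_lattice("はしをわたる", DICTIONARY)
chosen = best_path("はしをわたる", lattice)  # ['橋', 'を', '渡る']
```

Here 箸 (chopsticks) and 端 (edge) lose to 橋 (bridge) only because of the invented costs. In the paper's method it is syntactic and semantic analysis, such as checking which nouns fit the case frame of 渡る ("cross"), that would make this choice.
-->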