<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1114"> <Title>Large Scale Collocation Data and Their Application</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Word processors and computers used in Japan ordinarily employ a Japanese input method in which keyboard strokes are combined with Kana (phonetic) to Kanji (ideographic, Chinese) character conversion technology. Kana-to-Kanji conversion is performed by morphological analysis of the input Kana string, which contains no spaces between words. The analysis carries out word or phrase segmentation to identify the substrings of the input that have to be converted from Kana to Kanji. A mixed Kana-Kanji string, the ordinary form of written Japanese text, is obtained as the final result. The major issues of this technology are raising the accuracy of the segmentation and of the homophone processing that selects the correct Kanji from among many homophonic candidates.</Paragraph> <Paragraph position="1"> The conventional method for processing homophones gives priority to the word that was used most recently or to the high-frequency word. In practice, however, this method sometimes causes inappropriate conversions because it does not consider the semantic consistency of word co-occurrence. It is difficult to employ full-scale syntactic or semantic processing in a word processor from the cost vs. performance viewpoint.</Paragraph> <Paragraph position="2"> Nevertheless, attempts to improve the conversion accuracy have been reported, for example: employing case frames to check the semantic consistency of word combinations [Oshima, Y. et al., 1986]; employing a neural network to describe the consistency of word co-occurrence [Kobayashi, T. et al., 1992]; and building a co-occurrence dictionary for a specific topic or field and giving priority to the words in that dictionary once the topic is identified [Yamamoto, K. et al., 1992]. In all of these studies, however, many problems remain unsolved before a practical system can be realized.</Paragraph> <Paragraph position="3"> Apart from these semantic or quasi-semantic devices, we think it much more practical and effective to use surface-level resources, namely, to make extensive use of collocations. However, it is not yet known how much collocations contribute to the accuracy of Kana-to-Kanji conversion.</Paragraph> <Paragraph position="4"> In this paper, we present some results of our Kana-to-Kanji conversion experiments, focusing on the use of large-scale collocation data. Chapter 2 describes the collocations used in our system and their classification. Chapter 3 outlines the technological framework of our Kana-to-Kanji conversion systems. Chapter 4 presents the method and the results of the experiments, along with some discussion. Chapter 5 gives concluding remarks.</Paragraph> </Section> </Paper>