<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2004">
  <Title>Development of Basic Practical Techniques for Japanese Letter String Processing - Automatic Keyword Extraction and Automatic Reading</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DEVELOPMENT OF BASIC PRACTICAL TECHNIQUES FOR JAPANESE LETTER
STRING PROCESSING - AUTOMATIC KEYWORD EXTRACTION AND AUTOMATIC
READING
</SectionTitle>
    <Paragraph position="0"> K. Araki, K. Hinatou, K. Itayama, T. Sahara, Y. Sakagami and F. Takano, The Japan Information Center of Science and Technology (JICST), 2-5-2 Nagatacho, Chiyoda-ku, Tokyo 100, Japan. Japanese is a peculiar language among the thousands of languages in the world. Only two languages belong to the same class: Japanese and Korean. Japanese is written in a mixture of Chinese characters (ideographs) and Kana (the Katakana and Hiragana phonetic symbols), without any spaces. Moreover, Chinese characters in Japanese have, in most cases, several readings and play several roles depending on the context and the characteristics of the letter string. For written Japanese it was therefore very difficult to segment letter strings, extract adequate terms from sentences, and give them correct readings automatically. These operations are indispensable for terminology, automatic reading, automatic indexing, and keyboarding from on-line terminals, which otherwise requires a keyboard of more than 2,000 characters.</Paragraph>
    <Paragraph position="1"> The authors invented an efficient algorithm and developed computer programmes and dictionaries that successfully solve the above problems for the first time in Japan.</Paragraph>
    <Paragraph position="2"> The system consists of two subsystems called K-KACS (Kanji-Kana Automatic Conversion System) and JAKAS (Japanese Keyword Automatic Selection).</Paragraph>
    <Paragraph position="3"> Some Chinese characters act both as suffixes, prefixes or prepositions and as parts of meaningful words. We comprehensively collected such characters (about 500) and the terms in which these characters appear not as affixes or prepositions but as an essential part (about 8,000 words). A letter string that matches a dictionary term is passed through, but where a remaining letter coincides with one of the special characters, the string is cut at that character. In the case of a long letter string without any such special letter, the sentence is cut by a reasonably sized set of dictionary terms that are considered definite. That is, in a string such as "dog liver nucleus DNase", some words are of the indefinite type and others of the definite type.</Paragraph>
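    <Paragraph> The cutting rule described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' programme: the function name `segment`, the tiny dictionaries `SPECIAL_CHARS` and `EXCEPTION_TERMS`, the longest-match lookup, and the choice to drop the special character at a cut are all hypothetical, since the text does not give these details.

```python
# Hypothetical toy dictionaries (assumptions for illustration only).
# The text mentions about 500 special characters and about 8,000 terms.
SPECIAL_CHARS = {"的", "性", "化"}          # characters that can act as affixes
EXCEPTION_TERMS = {"化学", "性質", "目的"}  # terms where they are an essential part


def segment(text, special_chars=SPECIAL_CHARS, exception_terms=EXCEPTION_TERMS):
    """Cut a letter string at special characters, passing exception terms intact."""
    max_len = max((len(term) for term in exception_terms), default=0)
    segments = []
    current = ""
    i = 0
    while i != len(text):
        # Look for the longest exception term starting at position i
        # (longest match is an assumption, not stated in the text).
        matched = ""
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in exception_terms:
                matched = text[i:i + n]
                break
        if matched:
            # A string matching a dictionary term is passed through uncut.
            current += matched
            i += len(matched)
        elif text[i] in special_chars:
            # A remaining special character marks a cut; it is dropped here
            # (assumption: the original may keep it as a separate unit).
            if current:
                segments.append(current)
            current = ""
            i += 1
        else:
            current += text[i]
            i += 1
    if current:
        segments.append(current)
    return segments
```

    With these toy dictionaries, `segment("化学的性質")` passes 化学 and 性質 as dictionary terms and cuts at the remaining 的, while `segment("安全性")` cuts at the suffix 性.</Paragraph>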
    <Paragraph position="4"> Equally, among the variety of readings (in some cases more than 8), some are special and definite, while others are indefinite but obey rules. We collected these special readings (about 25,000) for about 2,000 Chinese characters and developed an algorithm and programme that select the correct reading for each Chinese character with a precision higher than 99.94 %.</Paragraph>
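    <Paragraph> One way to picture the split between special readings and rule-governed readings is a lookup with a fallback. The sketch below is an assumption-laden illustration, not the authors' algorithm: the names `read_term`, `SPECIAL_READINGS` and `DEFAULT_READINGS` are hypothetical, special readings are assumed to be keyed by whole term, and a single default reading per character stands in for the rules, which the text does not spell out.

```python
# Hypothetical toy tables (assumptions for illustration only).
# The text mentions about 25,000 special readings for about 2,000 characters.
SPECIAL_READINGS = {"大人": "おとな"}  # term-level special, definite readings
DEFAULT_READINGS = {                   # stand-in for the rule-governed readings
    "大": "だい",
    "人": "じん",
    "学": "がく",
    "生": "せい",
}


def read_term(term, special=SPECIAL_READINGS, default=DEFAULT_READINGS):
    """Return a Kana reading: special reading if listed, otherwise per-character defaults."""
    if term in special:
        # A listed special reading takes priority over the general rule.
        return special[term]
    # Characters without an entry (for example Kana) are passed through unchanged.
    return "".join(default.get(ch, ch) for ch in term)
```

    With these toy tables, `read_term("大人")` takes the special reading, while `read_term("大学生")` is assembled from the per-character defaults.</Paragraph>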
    <Paragraph position="5"> As the dictionary is small enough and the logic is simple, implementation and maintenance are relatively easy and the speed is very high. JICST adopted this system for its information file production and services, which cover more than 400,000 citations per year, and saves costs. With the development of these techniques, the processing of Japanese has become able to keep pace with that of Western languages. For this work we were awarded the Prize of Learning of the Japan Association of Information and Documentation in 1980, and have applied for a patent (Japan Patent Kokai Showa 55 (1980) - 102074).</Paragraph>
    <Paragraph position="7"/>
  </Section>
</Paper>