<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2104">
  <Title>A Portable &amp; Quick Japanese Parser : QJP</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A natural language parser/analyser is essential for enabling advanced functions in document processing systems, such as keyword extraction to characterize a text, key-sentence extraction to abstract a document, grammatical style checking, information or knowledge retrieval, natural language understanding, natural language interfaces and so on. But a general purpose parser requires 1) a large dictionary database with more than several tens of thousands of words, 2) advanced techniques for disambiguation and processing semantics, and 3) substantial machine resources, such as a lot of memory and a high-speed CPU.</Paragraph>
    <Paragraph position="1"> In addition, users must maintain additional terms in dictionaries for specialized fields. As a result, most parsers cannot be easily used in applications and it is difficult to develop a practical parser which can be easily integrated into many applications.</Paragraph>
    <Paragraph position="2"> We changed our viewpoint in order to design and develop an applicable and usable Japanese parser.</Paragraph>
    <Paragraph position="3"> First, we focused on the unique sets of character types in written Japanese and constructed a very small dictionary consisting mainly of function words written in hiragana characters. Similar approaches [1][2] were used for segmentation or preliminary morphological analysis about 20 years ago, using the transition point between types of character sets to cue word segmentation. Second, we noticed that dealing with syntactic ambiguities creates a large processing burden and that even using semantic information does little to assist syntactic analysis at the current level. So we either simplified the handling of structural ambiguities or ignored semantics to lighten the syntactic processing.</Paragraph>
    <Paragraph position="4"> We first created a prototype of our parser [3] using the AWK language, and then rewrote it [4] in C so it could be included in applications. The resulting parser, named QJP, is portable, fast and robust. It is an effective parser for many general purpose applications, despite a dictionary size of only 5 thousand words. It can analyze a 100-word sentence on a PC in less than one second, while using less than half a megabyte of memory. In addition, it requires no further dictionary maintenance for new terms.</Paragraph>
    <Paragraph position="5"> In this paper we describe QJP's analysis methods and report on its current performance.</Paragraph>
  </Section>
</Paper>