<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1027">
  <Title>Compound Noun Segmentation Based on Lexical Data Extracted from Corpus*</Title>
  <Section position="6" start_page="202" end_page="202" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we presented a new method for Korean compound noun segmentation. First, we proposed lexical acquisition for compound noun analysis, which yields two resources: a hand-built segmentation dictionary (HBSD) and a simple noun dictionary (SND) to which the segmentation algorithm is applied. The HBSD was constructed manually for compound nouns extracted from a corpus. The SND is based on very frequently occurring nouns, which we call distinct nouns because they serve as clues for identifying the constituents of compound nouns. Second, compound nouns were segmented with a modification of CYK tabular parsing combined with min-max composition, which experiments showed to be very effective. The bottom-up approach using the min-max operation guarantees the most likely segmentation and is applied in the same way as dynamic programming. With our new method, segmentation accuracy reaches 97.29%. In particular, the algorithm by itself produced sufficiently good results, and the built-in dictionary supplemented it. Consequently, the methodology is promising, and the segmentation system should be helpful for applications such as machine translation and information retrieval.</Paragraph>
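<!--
Editorial sketch, not part of the paper's text. The paragraph above names a
bottom-up CYK table combined with min-max composition but does not spell out
the scoring. The block below is a minimal illustration under these assumptions:
each known noun carries a plausibility score, a segmentation is scored by its
weakest constituent (min), and each table cell keeps the best-scoring analysis
of its span (max). The function name segment, the noun_scores mapping, and the
toy English entries are all hypothetical and are not taken from the paper.

```python
def segment(word, noun_scores):
    """Return (score, pieces) for the best min-max analysis of word, or None."""
    n = len(word)
    # table[(i, j)] holds (score, pieces) for the best analysis of word[i:j].
    table = {}
    for length in range(1, n + 1):            # bottom-up: shortest spans first
        for i in range(0, n - length + 1):
            j = i + length
            candidates = []
            span = word[i:j]
            if span in noun_scores:           # the span itself is a known noun
                candidates.append((noun_scores[span], [span]))
            for k in range(i + 1, j):         # try every split point
                left = table.get((i, k))
                right = table.get((k, j))
                if left is not None and right is not None:
                    # min: an analysis is only as good as its weakest part
                    score = min(left[0], right[0])
                    candidates.append((score, left[1] + right[1]))
            if candidates:
                # max: keep the most plausible analysis of this span
                table[(i, j)] = max(candidates, key=lambda c: c[0])
    return table.get((0, n))


# Toy English stand-in for a simple noun dictionary (assumed scores).
toy_scores = {"data": 0.9, "base": 0.8, "database": 0.5, "system": 0.7}
result = segment("databasesystem", toy_scores)
print(result)
```

Because every cell is filled from shorter spans that are already final, each
span is analysed once, which is the dynamic-programming property the paragraph
refers to. A word with no analysis under the given dictionary yields None.
-->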
  </Section>
</Paper>