<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1106">
  <Title>Corpus-Based Learning of Compound Noun Indexing *</Title>
  <Section position="3" start_page="0" end_page="57" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Compound nouns are more specific and expressive than simple nouns, so they are more valuable as index terms and can increase precision in search experiments. There are many definitions of the compound noun, which causes ambiguity as to whether a given continuous noun sequence is a compound noun or not. We therefore need a clean definition of compound nouns in terms of information retrieval, so we define a compound noun as &quot;any continuous noun sequence that appears frequently in documents.&quot; In Korean documents, compound nouns are represented in various forms (shown in Table 1), so it is difficult to index all types of compound nouns. Until now, there has been much work on compound noun indexing, but existing methods are still limited in covering all types of compound nouns and require much linguistic knowledge to accomplish this goal. In this paper, we propose a corpus-based learning method for compound noun indexing which can extract the rules automatically with little linguistic knowledge. As the number of documents grows, retrieval efficiency also becomes as important as effectiveness. To increase efficiency, we focus on reducing the number of indexed spurious compound nouns. We perform experiments on several filtering methods to find the algorithm that can reduce spurious compound nouns most efficiently.
    Table 1: Forms of the compound noun with regard to &quot;jeong-bo geom-saeg (information retrieval)&quot;: jeong-bo-geom-saeg (information-retrieval); jeong-bo-eui geom-saeg (retrieval of information); jeong-bo geom-saeg (information retrieval); jeong-bo-leul geom-saeg-ha-neun (retrieving information); jeong-bo-geom-saeg si-seu-tem (information-retrieval system).
    * This research was supported by KOSEF special purpose basic research (1997.9 - 2000.8 #970-1020301-3). † Corresponding author.</Paragraph>
    <Paragraph position="1"> The remainder of this paper is organized as follows. Section 2 describes previous compound noun indexing methods for Korean and compound noun filtering methods. We show the overall compound noun indexing system architecture in Section 3 and explain each module of the system in detail in Sections 4 and 5. We evaluate our method with standard Korean test collections in Section 6. Finally, concluding remarks are given in Section 7.</Paragraph>
  </Section>
</Paper>