<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2007">
  <Title>A Novel Approach to Semantic Indexing Based on Concept</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Related Works
</SectionTitle>
    <Paragraph position="0"> Since index terms are not equally important to the content of a text, each is assigned a term weight as an indicator of its importance. Many weighting functions have been proposed and tested. However, most of them depend on statistical methods or on the distribution of terms within the document. Representative weighting functions include such factors as term frequency, inverse document frequency, the product of term frequency and inverse document frequency, and length normalization (Moens, 2000).</Paragraph>
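The weighting factors named above can be illustrated with a minimal sketch. It assumes the common textbook forms, idf(t) = log(N / df(t)) and cosine length normalization, which are not necessarily the exact variants used in the cited work; the function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Raw count of each term in one document."""
    return Counter(tokens)


def inverse_document_frequency(collection):
    """idf(t) = log(N / df(t)); df(t) is the number of documents containing t."""
    n_docs = len(collection)
    doc_freq = Counter(term for doc in collection for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Product of tf and idf, divided by the vector length (cosine normalization)."""
    tf = term_frequency(tokens)
    weights = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


# Toy collection of three tokenized documents (assumed data for illustration).
docs = [
    ["index", "term", "weight", "term"],
    ["lexical", "chain", "term"],
    ["document", "index", "collection"],
]
idf = inverse_document_frequency(docs)
weights = tf_idf(docs[0], idf)
```

In this sketch the idf table is computed from the whole collection, so adding or removing a document changes N and df(t) and every stored weight has to be refreshed.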
    <Paragraph position="1"> Term frequency is useful in a long document, but not in a short one. In addition, a raw term count cannot represent how often a term is actually referred to, because it does not include anaphora, synonyms, and so on.</Paragraph>
    <Paragraph position="2"> Inverse document frequency is inappropriate for a reference collection that changes frequently, because the weight of an index term needs to be recomputed whenever the collection changes.</Paragraph>
    <Paragraph position="3"> Length normalization has been proposed because term frequency factors are large for long documents and negligible for short ones, which obscures the real importance of terms. As this approach also uses the term frequency function, it has the same disadvantages as term frequency.</Paragraph>
    <Paragraph position="4"> Hence, we made an effort to use methods based on linguistic phenomena to enhance indexing performance. Our approach proposes a concept vector space for extracting and weighting indexes, and we intend to compensate for the limitations of term-frequency-based methods by employing lexical chains. Lexical chains link related lexical items in a document and represent the lexical cohesion structure of the document (Morris, 1991).</Paragraph>
  </Section>
</Paper>