<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-2001">
  <Title>A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Differential Latent Semantic Indexing
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Method
</SectionTitle>
      <Paragraph position="0"> A term is defined as a word or a phrase that appears in at least two documents. We exclude the so-called stop words, such as &quot;a&quot; and &quot;the&quot; in English, which occur frequently in every topic but are irrelevant to our purpose of document search.</Paragraph>
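      <Paragraph> The term definition above can be illustrated with a minimal sketch. It assumes whitespace tokenization and single words only (phrases are not handled), and the stop-word list and the names select_terms and STOP_WORDS are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the source does not give one.
STOP_WORDS = {"a", "the", "of", "and", "in", "is"}


def select_terms(documents, min_docs=2):
    """Return the sorted words that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in documents:
        # Use a set so each word is counted once per document
        # (document frequency, not raw occurrence count).
        words = set(text.lower().split()) - STOP_WORDS
        doc_freq.update(words)
    return sorted(w for w, n in doc_freq.items() if n >= min_docs)


docs = [
    "the patent describes a retrieval system",
    "a retrieval method for patent search",
    "semantic indexing of the documents",
]
terms = select_terms(docs)
print(terms)  # ['patent', 'retrieval']
```

Only "patent" and "retrieval" occur in two of the three toy documents, so only they qualify as terms; a real collection would need a fuller stop-word list and phrase detection.</Paragraph>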
      <Paragraph position="1"> Suppose we select and list the terms that appear in the documents as $t_1, t_2, \ldots, t_m$. For each patent document in the collection, we preprocess it and assign it a document vector $(a_1, a_2, \ldots, a_m)$ with $a_i = f_i \cdot g_i$, where $f_i$ denotes the local weight of term $t_i$ in the document and $g_i$ denotes the global weight over all the documents; the global weight is a parameter indicating the relative importance of the term in representing the document abstracts. Local weights could be raw occurrence counts, boolean values, or logarithms of occurrence counts. Global weights could be uniform (no weighting), domain specific, or entropy weighting. The document vector is normalized as $(b_1, b_2, \ldots, b_m)$.</Paragraph>
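      <Paragraph> As a rough illustration of the weighting scheme, the sketch below builds one normalized document vector from raw term counts. Several choices are assumptions not fixed by the text: each component is taken as the product of a local and a global weight, the local weight is log(1 + count), the global weight is the common entropy form 1 + sum(p log p) / log(n), and normalization is to unit Euclidean length. The function names are hypothetical.

```python
import math


def local_weight(count):
    # Logarithmic local weight (assumed form: log(1 + count)).
    return math.log(1 + count)


def entropy_weights(counts):
    """Entropy global weight per term.

    counts[j][i] is the raw count of term i in document j.
    """
    n_docs = len(counts)
    weights = []
    for i in range(len(counts[0])):
        column = [row[i] for row in counts]
        total = sum(column)
        if total == 0 or n_docs == 1:
            # Degenerate case: fall back to uniform weighting.
            weights.append(1.0)
            continue
        entropy = 0.0
        for c in column:
            if c > 0:
                p = c / total
                entropy += p * math.log(p)
        # 1 for a term confined to one document, 0 for a term spread evenly.
        weights.append(1.0 + entropy / math.log(n_docs))
    return weights


def document_vector(doc_counts, global_weights):
    """Weight the counts of one document and scale to unit Euclidean length."""
    a = [local_weight(c) * g for c, g in zip(doc_counts, global_weights)]
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else a


counts = [[2, 0, 1], [1, 1, 0]]  # two toy documents, three terms
g = entropy_weights(counts)
b = document_vector(counts[0], g)
print(g, b)
```

Swapping entropy_weights for a list of ones gives the uniform (no weighting) option, and replacing local_weight with the identity or a 0/1 indicator gives the raw-count or boolean options mentioned in the text.</Paragraph>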
    </Section>
  </Section>
</Paper>