<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2017">
  <Title>Computing Term Translation Probabilities with Generalized Latent Semantic Analysis</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Many recent applications such as document summarization, passage retrieval and question answering require a detailed analysis of semantic relations between terms since often there is no large context that could disambiguate words's meaning.</Paragraph>
    <Paragraph position="1"> Many approaches model the semantic similarity between documents using the relations between semantic classes of words, such as representing dimensions of the document vectors with distributional term clusters (Bekkerman et al., 2003) and expanding the document and query vectors with synonyms and related terms as discussed in (Levow et al., 2005). They improve the performance on average, but also introduce some instability and thus increased variance (Levow et al., 2005).</Paragraph>
    <Paragraph position="2"> The language modelling approach (Ponte and Croft, 1998; Berger and Lafferty, 1999) proved very effective for the information retrieval task.</Paragraph>
    <Paragraph position="3"> Berger et. al (Berger and Lafferty, 1999) used translation probabilities between terms to account for synonymy and polysemy. However, their model of such probabilities was computationally demanding.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Latent Semantic Analysis
</SectionTitle>
      <Paragraph position="0"> Latent Semantic Analysis (LSA) (Deerwester et al., 1990) is one of the best known dimensionality reduction algorithms. Using bag-of-words document vectors (Salton and McGill, 1983), it computes a dual representation for terms and documents in a lower-dimensional space. The resulting document vectors reside in a space of latent semantic concepts, which can be expressed using different words. The statistical analysis of the semantic relatedness between terms is performed implicitly, in the course of a matrix decomposition.</Paragraph>
      <Paragraph position="1"> In this project, we propose to use a combination of dimensionality reduction and language modelling to compute the similarity between documents. We compute term vectors using the Generalized Latent Semantic Analysis (Matveeva et al., 2005). This method uses co-occurrence based measures of semantic similarity between terms to compute low dimensional term vectors in the space of latent semantic concepts. The normalized cosine similarity between the term vectors is used as term translation probability.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>