<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2052">
  <Title>Efficient sentence retrieval based on syntactic structure</Title>
  <Section position="3" start_page="0" end_page="399" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Retrieving similar sentences has attracted much attention in recent years, and several methods have already been proposed. Such methods are useful for many applications, including information retrieval and machine translation. Most of them are based on frequencies of surface information such as words and parts of speech, and they may work well for judging the similarity of topics or contents of sentences. However, even when the surface information of two sentences is similar, their syntactic structures can be completely different (Figure 1).</Paragraph>
    <Paragraph position="1"> If a translation system regarded such sentences as similar, the translation would fail. This is because conventional retrieval techniques exploit only the similarity of surface information such as words and parts of speech, and not more abstract information such as syntactic structures.</Paragraph>
    <Paragraph position="2"> [Figure 1: sentences such as "He beats a dog with a ..." that differ in syntactic structure] Collins et al. (Collins, 2001a; Collins, 2001b) proposed Tree Kernel, a method for calculating the similarity between syntactic structures. Tree Kernel defines the similarity between two syntactic structures as the number of subtrees they share. Retrieving similar sentences from a huge corpus requires calculating the similarity between a given query and each sentence in the corpus. Building an index table in advance could improve retrieval efficiency, but indexing with Tree Kernel is impractical because of the size of its index table.</Paragraph>
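To make the "number of shared subtrees" definition concrete, the following is a minimal sketch, assuming the standard recursive formulation of the tree kernel (two nodes contribute only if they expand with the same production). The tuple encoding of parse trees and all function names (`production`, `common_subtrees`, `tree_kernel`, `rank_by_similarity`) are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def production(node):
    """Return the production at a node: its label plus its child signature.

    A tree is a tuple (label, child, ...); a word is a plain string.
    Non-terminal children are wrapped in a 1-tuple so that a word and a
    non-terminal with the same spelling are never confused.
    """
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else (c[0],) for c in children))


def internal_nodes(tree):
    """Yield every non-leaf node of the tree, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child)


def common_subtrees(n1, n2):
    """Count the subtrees shared by n1 and n2 that are rooted at these nodes."""
    if production(n1) != production(n2):
        return 0
    count = 1
    # Equal productions guarantee that children align position by position.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            count *= 1 + common_subtrees(c1, c2)
    return count


def tree_kernel(t1, t2):
    """Sum the shared-subtree counts over all pairs of internal nodes."""
    pairs = product(internal_nodes(t1), internal_nodes(t2))
    return sum(common_subtrees(a, b) for a, b in pairs)


def rank_by_similarity(query, corpus):
    """Brute-force retrieval: score the query against every corpus tree."""
    scores = [(tree_kernel(query, tree), i) for i, tree in enumerate(corpus)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


a_dog = ("NP", ("D", "a"), ("N", "dog"))
a_cat = ("NP", ("D", "a"), ("N", "cat"))
print(tree_kernel(a_dog, a_dog), tree_kernel(a_dog, a_cat))
```

In this sketch, `rank_by_similarity` evaluates the kernel once per corpus sentence, which is the per-query cost that an index built in advance would be meant to avoid.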
    <Paragraph position="3"> In this paper, we propose two efficient algorithms for calculating the similarity of syntactic structures: Tree Overlapping and Subpath Set. These algorithms are more efficient than Tree Kernel because their index tables can be built at a reasonable size. Experiments comparing the three algorithms showed that Tree Overlapping is 100 times faster and Subpath Set is 1,000 times faster than Tree Kernel when used for structural retrieval.</Paragraph>
    <Paragraph position="4"> After briefly reviewing Tree Kernel in section 2, we describe the two algorithms in sections 3 and 4. Section 5 describes experiments comparing the three algorithms and discusses the results. Finally, section 6 concludes the paper and outlines future directions of our research.</Paragraph>
  </Section>
</Paper>