<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1608"> <Title>Extracting Structural Paraphrases from Aligned Monolingual Corpora</Title> <Section position="3" start_page="0" end_page="1" type="intro"> <SectionTitle> 2 Previous Work </SectionTitle> <Paragraph position="0"> There has been a rich body of research on automatically deriving paraphrases, including equating morphological and syntactic variants of technical terms (Jacquemin et al., 1997), and identifying equivalent adjective-noun phrases (Lapata, 2001). Unfortunately, both are limited in the types of paraphrases they can extract. Other researchers have explored distributional clustering of similar words (Pereira et al., 1993; Lin, 1998), but it is unclear to what extent such techniques produce paraphrases.</Paragraph> <Paragraph position="1"> Most relevant to this paper is the work of Barzilay and McKeown and the work of Lin and Pantel. Barzilay and McKeown (2001) extracted both single- and multiple-word paraphrases from a sentence-aligned corpus for use in multi-document summarization. They constructed an aligned corpus from multiple translations of foreign novels. From this, they co-trained a classifier that decided whether or not two phrases were paraphrases of each other based on their surrounding context. Barzilay and McKeown collected 9483 paraphrases with an average precision of 85.5%. However, 70.8% of the paraphrases were single words. In addition, the paraphrases were required to be contiguous.</Paragraph> <Paragraph position="2"> Lin and Pantel (2001) used a general text corpus to extract what they called inference rules, which we can take to be paraphrases. In their algorithm, rules are represented as dependency tree paths between two words. The words at the ends of a path are considered to be features of that path. For each path, they recorded the different features (words) that were associated with the path and their respective frequencies.
Lin and Pantel calculated the similarity of two paths by looking at the similarity of their features. This method allowed them to extract inference rules of moderate length from general corpora. However, the technique is computationally expensive and can give misleading results, since paths with opposite meanings often share similar features.</Paragraph> </Section> </Paper>