<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-5007">
  <Title>Automated Generalization of Phrasal Paraphrases from the Web*</Title>
  <Section position="3" start_page="0" end_page="49" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Paraphrases are alternative ways to convey the same information (Barzilay and McKeown, 2001), and they have been applied in many fields of natural language processing. Much previous work has addressed extracting paraphrase examples or combining them with applications. These include information retrieval and question answering (Agichtein et al., 2001; Florence et al., 2003; Rinaldi et al., 2003; Tomuro, 2003; Lin and Pantel, 2001), information extraction (Shinyama et al., 2002; Shinyama and Sekine, 2003), machine translation (Hiroshi et al., 2003; Zhang and Yamamoto, 2003), and multi-document summarization (Barzilay et al., 2003).</Paragraph>
    <Paragraph position="1"> Other research on paraphrasing also exists. Wu and Zhou (2003) extract only synonymous collocations, such as &lt;turn on, OBJ, light&gt; and &lt;switch on, OBJ, light&gt;. They use both monolingual and bilingual corpora to obtain an optimal result, but they do not generalize these collocations. Glickman and Dagan (2003) detect verb paraphrase instances within a single corpus without relying on any a priori structure or information. The generation of paraphrase examples has also been investigated (Barzilay and Lee, 2003; Quirk et al., 2004).</Paragraph>
    <Paragraph position="2"> Rather than creating and storing thousands of individual paraphrases, one can use paraphrase templates. Templates have strong representational capacity and can generate many paraphrase examples. As Hirst (2003) notes, each aspect of paraphrasing poses two main challenges: the representation of knowledge and the acquisition of knowledge. Correspondingly, generalizing paraphrase templates involves two problems: how to represent paraphrase templates, and how to acquire them.</Paragraph>
    <Paragraph position="3"> There are several methods for representing paraphrase templates. The first uses Part-of-Speech (POS) tags (Barzilay and McKeown, 2001; Daume and Marcu, 2003; Zhang and Yamamoto, 2003). The second uses named entities as variables (Shinyama et al., 2002; Shinyama and Sekine, 2003). The third is similar to the second and is called inference rule extraction (Lin and Pantel, 2001).</Paragraph>
    <Paragraph position="4"> *: Supported by the Key Project of the National Natural Science Foundation of China under Grant No. 60435020. Daume and Marcu (2003) define a paraphrase template as a pair of natural language phrases with variables standing in for certain grammatical constructs, and they use POS tags to represent templates. However, POS-based templates are too restrictive in some cases and over-generalized in others. For example, the pair of phrases meaning "in my view/mind" and "I feel" is a paraphrase, and it can be generalized using POS information:</Paragraph>
    <Paragraph position="6"> With this template, however, many nouns would be excluded, so its representational capacity is limited. In other cases, POS information leads to over-generalization. For example, consider a paraphrase pair asking for the price of an item sold by weight. If only the item noun is generalized as a variable, the template becomes the pair "What's the price for the [noun]?" and "How much is the [noun] per jin?", where the jin is a Chinese unit of weight. Given the sentence "What's the price for the notebook?", this template would produce the paraphrase "How much is the notebook per jin?". This result is obviously unreasonable, since notebooks are not priced by weight. Shinyama et al. (2002) tried to find paraphrases by assuming that two sentences sharing many named entities and a similar structure are likely to be paraphrases of each other. However, named entities alone are also limited. Lin and Pantel (2001) present an unsupervised algorithm for discovering inference rules from text, such as "X writes Y" and "X is the author of Y". This generalization method is powerful, but it also has limitations. For example: [Jack] writes [his homework].</Paragraph>
    <Paragraph position="7"> According to the paraphrase template, this sentence would be transformed into "[Jack] is the author of [his homework]", which is clearly not a natural sentence.</Paragraph>
    <Paragraph position="8"> How to represent paraphrase templates and generalize paraphrase examples is therefore an interesting problem. In this paper, we present a novel approach that represents paraphrase templates with the semantic codes of words. It uses an existing search engine to acquire the paraphrase templates.</Paragraph>
    <Paragraph position="9"> The remainder of this paper is organized as follows. Section 2 gives an overview of our method. Section 3 defines the representation method in detail. Section 4 presents the generalization method. Section 5 reports experiments and discussion. Finally, we draw conclusions and suggest directions for future work.</Paragraph>
  </Section>
</Paper>