<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1605"> <Title>Interrogative Reformulation Patterns and Acquisition of Question Paraphrases</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The phenomenon of paraphrase in human languages is essentially the inverse of ambiguity: a given sentence can have several meanings, while any given meaning can be expressed as several paraphrases using different words and syntactic constructions. For this reason, paraphrase poses a great challenge for many Natural Language Processing (NLP) tasks, just as ambiguity does, notably text summarization and NL generation (Barzilay and Lee, 2003; Pang et al., 2003).</Paragraph> <Paragraph position="1"> Paraphrase is also an important problem in Question-Answering systems, because such systems must return the same answer to questions that ask for the same thing but are expressed in different ways. Recently, several studies have used reformulations of questions to bridge the gap between the words in a question and those in a potential answer sentence (Hermjakob et al., 2002; Murata and Isahara, 2001; Agichtein et al., 2001). In general, paraphrasing a question, whether for recognition or for generation, is more difficult than paraphrasing a declarative sentence, because the interrogative words carry a meaning of their own, which is itself subject to reformulation, in addition to the rest of the question (the sentence part). Reformulations of the interrogative part of questions have some interesting characteristics that distinguish them from reformulations of the sentence part or of declarative sentences. First, paraphrases of interrogatives are strongly lexical and idiosyncratic, containing many keywords, idioms, or fixed expressions. 
For example, for the question &quot;How can I clean teapots?&quot; one can easily think of several variations of the 'how' part while keeping the sentence part fixed:
- &quot;In what way should I clean teapots?&quot;
- &quot;What do I have to do to clean teapots?&quot;
- &quot;What is the best way to clean teapots?&quot;
- &quot;What method is used for cleaning teapots?&quot;
- &quot;How do I go about cleaning teapots?&quot;
- &quot;What is involved in cleaning teapots?&quot;
- &quot;What should I do if I want to clean teapots?&quot;
Second, the reformulation patterns of interrogatives appear to be governed by question type. For example, the variation patterns above apply to almost all 'how-to' questions, while 'why' questions undergo a different set of transformations (e.g. &quot;Why ...&quot;, &quot;For what reason ...&quot;, &quot;What was the reason why ...&quot;). Further observations also suggest that questions of the same type share the same semantic empty category: the thing (or things) that the question is asking for.</Paragraph> <Paragraph position="2"> In this paper, we describe the set of paraphrase/reformulation patterns we derived from a corpus of questions, and report the results of applying them to the automatic recognition of question paraphrases. We also describe the process by which we acquired the question paraphrases that we used as test data.</Paragraph> <Paragraph position="3"> Both resources were constructed manually: the transformation patterns were derived by inspecting an existing large corpus of questions, and the paraphrases were collected by asking web users to type in reformulations of sample questions.</Paragraph> <Paragraph position="4"> Our work here focuses on reformulations of the interrogative part of questions, in contrast to other work in question answering, which places its main emphasis on reformulations of words or phrases in the sentence part (Lin and Pantel, 2001; Hermjakob et al., 2002). 
The patterns we derived are essentially rules that map surface syntactic structures to semantic case frame representations. We use these case frame representations when comparing questions for similarity. The results of using the patterns in paraphrase recognition were quite promising.</Paragraph> <Paragraph position="5"> The motivation behind the work we present here is to improve the retrieval accuracy of our system called FAQFinder (Burke et al., 1997). FAQFinder is a web-based, natural language question-answering system that uses Usenet Frequently Asked Questions (FAQ) files to answer users' questions. Each FAQ file contains a list of question-and-answer (Q&amp;A) pairs on a particular subject. Given a user's question as a query, FAQFinder tries to find an answer by matching the user's question against the question part of each Q&amp;A pair, and displays the five FAQ questions ranked highest by the system's similarity measure. Thus, FAQFinder's task is to identify the FAQ questions that are the best paraphrases of the user's question. Figure 1 shows a screen snapshot of FAQFinder in which the user's query &quot;What do I have to do to clean teapots?&quot; is matched against the Q&amp;A pairs in the 'drink tea faq'. The current similarity measure used in the system is a combination of four independent metrics: term vector similarity, coverage, semantic similarity, and question type similarity (Lytinen and Tomuro, 2002). Although these metrics are additive and complement one another, they cannot capture the relations and interactions between them. The paraphrase patterns proposed in this paper are a first step toward developing an alternative, integrated similarity measure for question sentences.</Paragraph> </Section> </Paper>