<?xml version="1.0" standalone="yes"?> <Paper uid="I05-5004"> <Title>A Class-oriented Approach to Building a Paraphrase Corpus</Title> <Section position="3" start_page="25" end_page="26" type="intro"> <SectionTitle> 2 Goal </SectionTitle> <Paragraph position="0"> Paraphrases exhibit a wide variety of patterns ranging from lexical paraphrases to syntactic transformations and their combinations. Some of them are highly inferential or idiomatic and do not seem easy to generate only with syntactic and semantic knowledge. Such groups of paraphrases require us to pursue corpus-based acquisition methods such as those described in Section 3.</Paragraph> <Paragraph position="1"> More importantly, however, we can also find quite a few patterns of paraphrases that exhibit a degree of regularity. Those groups of paraphrases have the potential to be explained compositionally by combining syntactic and semantic properties of their constituent words. For instance, the following paraphrases 2 in Japanese are considered to be of these groups.</Paragraph> <Paragraph position="2"> (1) s. eiga-ni shigeki-o uke-ta.</Paragraph> <Paragraph position="3"> film-DAT inspiration-ACC to receive-PAST I received an inspiration from the film.</Paragraph> <Paragraph position="4"> t. eiga-ni shigeki-s-are-ta.</Paragraph> <Paragraph position="5"> film-DAT to inspire-PASS-PAST I was inspired by the film.</Paragraph> <Paragraph position="6"> (2) s. sentakumono-ga soyokaze-ni yureru.</Paragraph> <Paragraph position="7"> laundry-NOM breeze-DAT to sway-PRES The laundry sways in the breeze.</Paragraph> <Paragraph position="8"> t. soyokaze-ga sentakumono-o yurasu.</Paragraph> <Paragraph position="9"> breeze-NOM laundry-ACC to sway-PRES The breeze makes the laundry sway.</Paragraph> <Paragraph position="10"> (3) s. glass-ni mizu-o mitashi-ta.</Paragraph> <Paragraph position="11"> glass-DAT water-ACC to fill-PAST I filled water into the glass.</Paragraph> <Paragraph position="12"> t.
glass-o mizu-de mitashi-ta.</Paragraph> <Paragraph position="13"> glass-ACC water-IMP to fill-PAST I filled the glass with water.</Paragraph> <Paragraph position="14"> (4) s. kare-wa kikai-sousa-ga jouzu-da. he-TOP machine operation-NOM be good-PRES He is good at machine operation.</Paragraph> <Paragraph position="15"> t. kare-wa kikai-o jouzu-ni sousa-suru. he-TOP machine-ACC well-ADV to operate-PRES He operates machines well.</Paragraph> <Paragraph position="16"> (5) s. heya-wa mou atatamat-teiru.</Paragraph> <Paragraph position="17"> room-TOP already to be warmed-PERF The room has already been warmed up.</Paragraph> <Paragraph position="18"> t. heya-wa mou atatakai.</Paragraph> <Paragraph position="19"> room-TOP already be warm-PRES The room is warm.</Paragraph> <Paragraph position="20"> 2 For each example, &quot;s&quot; and &quot;t&quot; denote an original sentence and its paraphrase, respectively. In example (1), a verb phrase, &quot;shigeki-o uketa (to receive an inspiration),&quot; is paraphrased into a verbalized form of the noun, &quot;shigeki-s-are-ta (to be inspired).&quot; We can find a number of paraphrases that exhibit a similar pattern of syntactic transformation in the same language and group such paraphrases into a single class, which could be labeled &quot;paraphrasing of light-verb construction.&quot; Likewise, paraphrases exemplified by (2) constitute another class, so-called transitivity alternation. Example (3) belongs to the locative alternation class and example (4) to the compound noun decomposition class. In example (5), a verb &quot;atatamaru (to be warmed)&quot; is paraphrased into its adjective form, &quot;atatakai (be warm).&quot; Paraphrases involving such a lexical derivation are also of concern to us.</Paragraph> <Paragraph position="21"> One can learn the existence of such groups of paraphrases and the regularity each group exhibits from the linguistic literature (Mel'čuk and Polguère, 1987; Jackendoff, 1990; Kageyama, 2001).
According to Jackendoff and Kageyama, for instance, both transitivity alternation and locative alternation can be explained in terms of the syntactic and semantic properties of the verb involved, which are represented by what they call Lexical Conceptual Structure. The systematicity underlying such linguistic accounts is intriguing also from the engineering point of view as it could enable us to take a more theoretically motivated but still practical approach to paraphrase generation. Aiming at this goal leads us to consider building a paraphrase corpus which enables us to evaluate paraphrase generation systems and conduct error analysis for each paraphrase class separately. Our paraphrase corpus should therefore be organized according to paraphrase classes. More specifically, we consider a paraphrase corpus such that: The corpus consists of a set of subcorpora.</Paragraph> <Paragraph position="22"> Each subcorpus is a collection of paraphrase sentence pairs of a paraphrase class.</Paragraph> <Paragraph position="23"> Paraphrases collected in a subcorpus sufficiently reflect the distribution of the occurrences in the real world.</Paragraph> <Paragraph position="24"> Given a paraphrase class and a text collection, the goal of building a paraphrase corpus is to collect paraphrase examples belonging to the class as exhaustively as possible from the text collection at a minimal human labor cost. The resultant corpus should also be reliable.</Paragraph> </Section> </Paper>