File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/05/i05-5004_evalu.xml
Size: 6,714 bytes
Last Modified: 2025-10-06 13:59:26
<?xml version="1.0" standalone="yes"?> <Paper uid="I05-5004"> <Title>A Class-oriented Approach to Building a Paraphrase Corpus</Title> <Section position="7" start_page="29" end_page="31" type="evalu"> <SectionTitle> 6 Results and discussion </SectionTitle> <Paragraph position="0"> Table 1 gives some statistics on the resulting paraphrase corpora. Figures 3 and 4 show the number of candidate paraphrases, where the horizontal axes denote the total working hours of the two annotators and the vertical axes the number of candidate paraphrases. The numbers of judged, correct, incorrect, and deferred candidates are shown.</Paragraph> <Section position="1" start_page="29" end_page="29" type="sub_section"> <SectionTitle> 6.1 Efficiency </SectionTitle> <Paragraph position="0"> In total, 2,031 candidate paraphrases have been judged so far, and 1,075 paraphrase examples have been collected in 287.5 hours. Judgement proceeded at a constant pace of 7.1 candidates (3.7 examples) per hour. It is hard to compare these results with other work because no previous study has quantitatively evaluated efficiency in terms of manual annotation cost. However, we feel that the results so far are satisfactory. For each candidate paraphrase judged incorrect, the annotators were asked to classify the underlying errors into the fixed error types ((d) in tra time because it required linguistic expertise with which the annotators were not familiar.</Paragraph> <Paragraph position="1"> TransAlt was 1.75 times more time-consuming than LVC because the definition of TransAlt involved several delicate issues, which complicated the judgement process. 
We return to this issue in Section 6.4.</Paragraph> </Section> <Section position="2" start_page="29" end_page="30" type="sub_section"> <SectionTitle> 6.2 Exhaustiveness </SectionTitle> <Paragraph position="0"> To estimate how exhaustively the proposed method collected paraphrase examples, we randomly sampled 750 of the 4,500 sentences used in the trial for LVC and manually checked whether LVC paraphrasing could apply to each of them. This yielded 206 examples, 158 of which had already been collected by the proposed method. Thus, the estimated exhaustiveness was 77% (158/206). Our manual investigation of the 48 missed examples revealed that 47 could have been generated automatically by enhancing the paraphrasing patterns and dictionaries, while only one was missed due to an error in shallow parsing. Of the 48 misses, 34 could have been collected by adding a couple of paraphrasing patterns. For example, pattern (11) verbalizes a noun followed by a nominalizing suffix, &quot;ka (-ize),&quot; as in (12).</Paragraph> <Paragraph position="2"> this-TOP financial market-GEN activation-DAT muke-ta kisei-kanwa-saku-da.</Paragraph> <Paragraph position="3"> to address-PAST deregulation plan-COP This is a deregulation plan aiming at the activation of the financial market.</Paragraph> <Paragraph position="4"> t. kore-wa kin'yu-shijo-o this-TOP financial market-ACC kassei-ka-suru kisei-kanwa-saku-da. to activate-PRES deregulation plan-COP This is a deregulation plan which activates the financial market.</Paragraph> <Paragraph position="5"> We cannot know whether we have adequate paraphrasing patterns and resources before the trials. 
Therefore, manual examination is necessary to refine them and bridge the gap between the range of paraphrases that can be automatically generated and those of the specific class we consider.</Paragraph> </Section> <Section position="3" start_page="30" end_page="31" type="sub_section"> <SectionTitle> 6.3 Reliability </SectionTitle> <Paragraph position="0"> Ideally, more annotators would be employed to ensure the reliability of the products, which, however, involves a trade-off with the cost of annotation.</Paragraph> <Paragraph position="1"> Instead, we specified detailed judgement criteria for each paraphrase class, and asked the annotators to reconsider marginal cases several days later and to discuss the cases on which their judgements disagreed. The agreement ratio for correct candidates between the two annotators increased as they became used to the task. In the trial for LVC, for example, the daily agreement ratio rose from 74% (day 3) to 77% (day 6), 88% (day 9), and 93% (day 11). This indicates that the judgement criteria were effectively refined based on feedback from inter-annotator discussions of marginal and disputed cases. To evaluate the reliability of our judgement procedure more precisely, we are planning to employ a third annotator, who will be asked to judge all the cases independently of the others.</Paragraph> <Paragraph position="2"> 6.4 How we define paraphrase classes One of the motivations behind our class-based approach is the expectation that specifying the target classes of paraphrases would simplify the awkward problem of defining the boundary between paraphrases and non-paraphrases. Our trials for the two paraphrase classes, however, have revealed that it can still be difficult to create a clear criterion for judgement even when the paraphrase class in focus is specified.</Paragraph> <Paragraph position="3"> As one of the criteria for TransAlt, we tested the agentivity of the nominative case of intransitive verbs. 
The test used an adverb, &quot;muzukara (by itself),&quot; and classified a candidate paraphrase as incorrect if the adverb could be inserted immediately before the intransitive verb. For example, we considered example (13) a correct paraphrase of the TransAlt class but (14) incorrect, because the agentivity exhibited by (14s) did not remain in (14t).</Paragraph> <Paragraph position="4"> (13) s. kare-ga soup-o atatame-ta.</Paragraph> <Paragraph position="5"> he-NOM soup-ACC to warm up-PAST He warmed the soup up.</Paragraph> <Paragraph position="6"> t. soup-ga atatamat-ta. (correct) soup-NOM to be warmed up-PAST The soup was warmed up (by somebody).</Paragraph> <Paragraph position="7"> (14) s. kare-ga koori-o tokashi-ta.</Paragraph> <Paragraph position="8"> he-NOM ice-ACC to melt (vt)-PAST He melted the ice.</Paragraph> <Paragraph position="9"> t. koori-ga toke-ta. (incorrect) ice-NOM to melt (vi)-PAST The ice melted (by itself).</Paragraph> <Paragraph position="10"> However, one might regard both paraphrases as incorrect because the information given by the nominative argument of the source sentence is dropped in the target in both cases. Thus, the problem remains. Nevertheless, our approach will provide us with a considerable amount of concrete data, which we hope will lead to a better understanding of the issue.</Paragraph> </Section> </Section> </Paper>