<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-5003">
<Title>Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence</Title>
<Section position="2" start_page="0" end_page="17" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Automatic machine translation evaluation is a means of scoring the output from a machine translation system with respect to a small corpus of reference translations. The basic principle is that an output is a good translation if it is 'close' in some way to a member of a set of perfect translations for the input sentence. The closeness that these techniques are trying to capture is in essence the notion of semantic equivalence: two sentences are semantically equivalent if they convey the same meaning.</Paragraph>
<Paragraph position="1"> MT evaluation techniques have found application in the field of entailment recognition, a close relative of semantic equivalence determination that seeks methods for deciding whether the information provided by one sentence is included in another. (Perez and Alfonseca, 2005) directly applied the BLEU score to this task, and (Kouylekov and Magnini, 2005) applied both word and tree edit distance algorithms. In this paper we evaluate these techniques, or variants of them, together with other MT evaluation techniques, on both entailment and semantic equivalence determination, to allow direct comparison with our results. When a single reference sentence is used for each candidate, the task of deciding whether a pair of sentences are paraphrases and the task of MT evaluation are very similar. Differences arise from the nature of the sentences being compared: MT output might not consist of grammatically correct sentences. Moreover, MT evaluation scores need not be computed on a sentence-by-sentence basis, but can be based on statistics derived at the corpus level. Finally, the process of MT evaluation is asymmetrical.</Paragraph>
<Paragraph position="2"> That is, there is a distinction between the references and the candidate machine translations.</Paragraph>
<Paragraph position="3"> Fortunately, the automatic MT evaluation techniques commonly in use do not make any explicit attempt to score grammaticality, and (except BLEU) decompose naturally into their component scores at the sentence level. (Blatz et al., 2004) used a variant of the WER score and the NIST score at the sentence level to assign correctness to translation candidates by scoring them with respect to a reference set. These correctness labels were used as the 'ground truth' for classifiers of translation candidate correctness in sentence-level confidence estimation. We too adopt sentence-level versions of these scores and use them to classify paraphrase candidates.</Paragraph>
<Paragraph position="4"> The motivation for these experiments is twofold: firstly, to determine how useful the features used by these MT evaluation techniques are to semantic equivalence classifiers. One would expect that systems that perform well in one domain should also perform well in the other; after all, determining sentence-level semantic equivalence is &quot;part of the job&quot; of an MT evaluator. Our second motivation is the conjecture that successful techniques and strategies will be transferable between the two tasks.</Paragraph>
</Section>
</Paper>
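
Editor's note: to make the sentence-level scoring concrete, the sketch below computes a word error rate (WER) style feature against a single reference using token-level edit distance, of the kind the introduction says is adopted for classifying paraphrase candidates. This is a minimal illustrative re-implementation under our own assumptions; the function names, threshold-free output, and example sentences are not the authors' code or their exact metric.

# Hedged sketch: sentence-level WER as a paraphrase/entailment feature.
# Lower scores indicate that the candidate is 'closer' to the reference.

def word_error_rate(candidate, reference):
    """Token-level Levenshtein distance, normalised by reference length."""
    cand = candidate.split()
    ref = reference.split()
    # dp[i][j] = edit distance between ref[:i] and cand[:j]
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(cand) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(cand) + 1):
            sub = 0 if ref[i - 1] == cand[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(cand)] / max(len(ref), 1)

# Illustrative usage: a downstream classifier could use this value as one feature.
reference = "the cat sat on the mat"
candidate = "the cat is sitting on the mat"
print(word_error_rate(candidate, reference))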