File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2189_intro.xml
Size: 5,248 bytes
Last Modified: 2025-10-06 14:06:03
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2189"> <Title>Using sentence connectors for evaluating MT output</Title> <Section position="3" start_page="1066" end_page="1066" type="intro"> <SectionTitle> 2 Outline of the evaluation method </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1066" end_page="1066" type="sub_section"> <SectionTitle> 2.1 Compare salient properties </SectionTitle> <Paragraph position="0"> To test whether the meaning of a translated text has come across, one could simply ask the evaluators questions about the translated text, or have them summarise it. Such methods, however, are either costly (a new set of questions has to be devised for each new text), hard to quantify objectively, or both.</Paragraph> <Paragraph position="1"> The method we will adopt involves constructing a profile of both the original and the translated text in terms of some salient semantic or pragmatic property of their constituent sentences. These profiles can then be compared to give an indication of translation quality: if we assume that the original text's profile is &quot;perfect&quot;, then the degree to which the profile of the translated text resembles the perfect profile will correspond (in theory at least) to the quality of the translation. This approach assumes that the number and order of sentences are invariant in translation; luckily, for MT systems, this is almost always true.</Paragraph> <Paragraph position="2"> As for the salient property to be used in the profile, we settled on the meaning relations of single sentences with the preceding text: this property seemed to us to be both fairly discriminating and implementable. In summary, a profile will be an ordered list of meaning relations x_i, i = 2 ... n, where x_i describes the relation of sentence i with what came before. Moreover, the target of each relation is taken to be the previous sentence, i.e. 
sentence i-1 (see Section 4 for further discussion).</Paragraph> </Section> <Section position="2" start_page="1066" end_page="1066" type="sub_section"> <SectionTitle> 2.2 Avoid contrived definitions </SectionTitle> <Paragraph position="0"> A set of sentence-to-sentence relation categories will then have to be designed and defined; but the wide variety of proposed methods and solutions (see Hovy and Maier (1993) for an overview) suggests that this is not an easy task. Indeed, the problem with categories and definitions is that the evaluator will always have to depend to a certain extent on his own personal understanding of these definitions; and the more categories there are, the greater the chance that their definitions will not always be clear and fixed in his mind. This naturally has a deleterious effect on the reliability and universality of evaluation results.</Paragraph> <Paragraph position="1"> We will get back to the design problem later, but with respect to the definition problem, our solution was simply to hide the definitions. We have sought to accomplish this by instructing the evaluator to link sentences linguistically; more specifically, we have opted to instruct the evaluator to choose a conjunct (see footnote 2) to be inserted between every pair of consecutive sentences. (Footnote 2: A subclass of the adverbs, cf. (Quirk et al., 1985) pg. 631-. For languages that do not recognise this class, surrogates can be concocted: for Japanese, a mixture of conjunctions and conjoining adverbs.)</Paragraph> <Paragraph position="2"> The conjuncts themselves may be divided into categories, but these can remain hidden from the evaluator. 
This approach hinges on the hope that straight linguistic knowledge comes more naturally to people and is less susceptible to person-to-person differences than contrived meaning categories.</Paragraph> </Section> <Section position="3" start_page="1066" end_page="1066" type="sub_section"> <SectionTitle> 2.3 Standardise thinking methods </SectionTitle> <Paragraph position="0"> Small-scale preliminary experiments (on paper) showed that, in spite of the above refinements, evaluator differences were still larger than seemed reasonable. We surmised that this was due to differences in work methods (or thinking methods), and that these therefore needed to be equalised a little more. We decided on two countermeasures.</Paragraph> <Paragraph position="1"> Recognising that the class of conjuncts was too large for the evaluator to encompass at a glance, we decided to implement an interactive Q&amp;A interface on the computer in order to gradually guide the evaluator to the optimal choice of a conjunct. Obviously this opens a whole new can of worms, in that the interface has to be designed (the kind and order of questions etc.); we will get back to that later (in Section 4).</Paragraph> <Paragraph position="2"> The other step was to instruct the evaluator to extract the topic and comment of the sentence under consideration. Both topic and comment were only loosely defined: in truth the topic and comment are not important as such; rather, their extraction was intended as a means to force the evaluator to get a clearer picture of the meaning of the sentence under consideration (though we did not tell them this).</Paragraph> </Section> </Section> </Paper>