<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-2189">
  <Title>Using sentence connectors for evaluating MT output</Title>
  <Section position="4" start_page="1066" end_page="1067" type="metho">
    <SectionTitle>
3 Basic assumptions
</SectionTitle>
    <Paragraph position="0"> At this point, it is useful to look back at the design considerations outlined above and to clarify exactly what assumptions on sentences and relations underlie them. With a little luck, our results can provide some support for these assumptions.</Paragraph>
    <Paragraph position="1"> The first of our assumptions is that it is always possible to make explicit the relationship of a sentence to what has come before using a conjunct. The conjunct may be present in the sentence, but even if it is not, it can be added in a linguistically satisfactory way. We also assume that the assignment of acceptable conjuncts is reader-independent to a large degree.</Paragraph>
    <Paragraph position="2"> We assume that conjuncts (which form a closed class) can be divided into a limited number of categories that are meaningful in terms of expressing the semantic relationship between sentences.</Paragraph>
    <Paragraph position="3"> Yet another assumption is that the meaning relationships between sentences of a text combine to form a characteristic feature (a profile) of that text, and that this profile needs to be preserved in translation. Moreover, the ease with which this profile can be discerned in the translated text is assumed to be related to the readability or understandability of the text as a whole.</Paragraph>
  </Section>
  <Section position="5" start_page="1067" end_page="1067" type="metho">
    <SectionTitle>
4 The implementation
</SectionTitle>
    <Paragraph position="0"> A prototype was implemented on a Macintosh computer using HyperCard. The evaluation process is made up of the following steps, which have to be executed for every sentence in the text.</Paragraph>
    <Paragraph position="1">  1. Extract the topic(s) and comment(s) of the sentence under consideration.</Paragraph>
    <Paragraph position="2"> 2. If there is more than one topic/comment pair, order the pairs as seems best and determine (using the same method as for sentences) which conjuncts fit best between the pairs.</Paragraph>
    <Paragraph position="3"> 3. Determine through a dialog with the system which conjunct fits best at the start of the sentence under consideration.</Paragraph>
    <Paragraph position="4"> A backtrack function was implemented which allowed the subjects to revise decisions made earlier in the dialog. The prototype keeps a very detailed log of exactly what the evaluator does. Without going into technical details, the following were the main tasks in the implementation.
Categorising the conjuncts: Our first categorisation of conjuncts was based on information concerning conjuncts and rhetorical structures that we pieced together from authoritative grammars for English (Quirk et al., 1985), Japanese (Martin, 1975) and others. We came up with 9 categories; in a later redesign we took the conjuncts themselves as our starting point and, by tracing cross-references in dictionaries, were able to reduce the initial number of approximately 220 to 32 &quot;basic&quot; conjuncts, divided over 11 categories.
Assisting topic/comment extraction: Frankly, we have been unable to find a foolproof method, and have settled for user-requested online help cued on linguistic aspects of the sentence.
Defining the scope of meaning relations: We have established above that meaning relations hold between consecutive sentences; this is however not self-evident. A sentence may relate to a more remote sentence @5, for instance), or to a block of sentences; see (Kurohashi and Nagao, 1995) for a more plausible model. We found however that an online computer interface that would allow the user to specify the target of a relation to this extent would become prohibitively complicated. The evaluator's task would involve so much juggling with relations, and would require such a deep understanding of the text, that it would in the end have a negative effect on the reproducibility and evaluator-independence of the results.</Paragraph>
    <Paragraph position="5"> Designing the dialog: We believe that this is a trial-and-error process which will have to be guided by the outcome of experiments; more about this will follow below.</Paragraph>
  </Section>
  <Section position="6" start_page="1067" end_page="1068" type="metho">
    <SectionTitle>
5 The experiments
</SectionTitle>
    <Paragraph position="0"> We decided that experiments needed to establish three qualities of this system.</Paragraph>
    <Paragraph position="1"> Evaluator-independence: Given a text in one language, different evaluators should produce the same connectivity profile.</Paragraph>
    <Paragraph position="2"> Language-independence: Given a &quot;perfectly&quot; translated text, its connectivity profile should turn out the same as that of the original.</Paragraph>
    <Paragraph position="3"> Quantifiability: Given translations of varying quality, the degree of correspondence in the connectivity profiles must be shown to correspond to the quality of the translation.</Paragraph>
    <Paragraph position="4"> But first we conducted a preliminary experiment.</Paragraph>
    <Section position="1" start_page="1067" end_page="1068" type="sub_section">
      <SectionTitle>
5.1 Experiments with the dialog
</SectionTitle>
      <Paragraph position="0"> Our first experiments (Japanese only) concerned the conjunct-determining dialogs. We implemented 3 interfaces, each comprising the same 61 conjuncts spread over 9 categories: one (A) based on categories (the subjects got a list of categories in the first screen, and if they clicked one they got the conjuncts in that category on the second screen); one (B) based on the conjuncts themselves (the subjects just got the whole list of conjuncts, spread over a couple of screens, without elaboration); and one (C) with questions (3 answers to choose from on the first screen, one of these leads to a second question with 4 answers, all other links lead to sets of conjuncts).</Paragraph>
      <Paragraph position="1"> Subjects were assigned an interface, given a 9-sentence text and asked to connect the sentences, without however performing topic/comment extraction. A fourth group was asked to use interface C, but also to extract topic and comment before connecting the sentences (D). The results are given in table 1. The mean of the evaluators' choices was computed by transforming the results into numbers (if 7 out of 10 evaluators chose category X, 2 chose Y, and 1 Z, then this would result in the values {1 1 1 1 1 1 1 2 2 3}), and inputting these numbers into the following formula.</Paragraph>
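The numeric encoding described in the preceding paragraph can be sketched as follows. This is only an illustration of the encoding step, assuming that categories are ranked by how often they were chosen (1 for the most frequent) and that ties are broken by order of first appearance. The function name encode_choices and the tie-breaking rule are assumptions made for this sketch, not part of the prototype, and the formula that takes these numbers as input is not covered here.

```python
from collections import Counter


def encode_choices(choices):
    """Map each evaluator's choice to the frequency rank of its category.

    The most frequently chosen category gets 1, the next 2, and so on.
    Ties are broken by first appearance (an assumption of this sketch).
    """
    counts = Counter(choices)
    # most_common() lists categories from most to least frequent.
    ranked = [label for label, _ in counts.most_common()]
    rank_of = {label: position + 1 for position, label in enumerate(ranked)}
    return sorted(rank_of[choice] for choice in choices)


# The example from the text: 7 evaluators chose X, 2 chose Y, 1 chose Z.
example = ["X"] * 7 + ["Y"] * 2 + ["Z"]
print(encode_choices(example))
```

For the example in the text, the sketch yields seven 1s, two 2s and one 3, matching the values given there.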
      <Paragraph position="3"> We might add that subjects using interfaces A and B were more likely to choose &quot;safe&quot; (ambiguous, vague) conjuncts such as 'soshite' (and then), and also, for what it's worth, complained more.</Paragraph>
      <Paragraph position="4"> To be quite honest, this experiment was too small in scale to allow scientific conclusions (20 people participated), but we went ahead anyway and concluded that a) the project showed promise, b) interface C was the way to go, c) topic/comment extraction was important, but d) it was also costly (took three times as long!) so we'd stick to the 'lazy' evaluation for further experiments.</Paragraph>
    </Section>
    <Section position="2" start_page="1068" end_page="1068" type="sub_section">
      <SectionTitle>
5.2 Validation experiments
</SectionTitle>
      <Paragraph position="0"> For the second set of experiments, we designed identical interfaces for English and Japanese.</Paragraph>
      <Paragraph position="1"> There was only one question, with 6 answers, and all of these led to a screen with conjuncts to choose from, never more than 8 on a screen. The set of conjuncts was designed to be minimal (no redundancies, no ambiguous conjuncts); there were 32 of them, spread over 11 categories (cf. § 4).</Paragraph>
      <Paragraph position="2"> An original English text was chosen (A); then a &quot;perfect&quot; (but aligned) Japanese translation was produced (B); and finally two &quot;less-than-perfect&quot; translations were contrived (C was raw MT output, D was output from a tuned MT system, the understandability of which had been determined by independent experiments to be halfway between B and C, level 3 in (Fuji, 1996)). The sizes of the subject groups are given in table 2 between parentheses. Distribution means were computed both for categories and for conjuncts.</Paragraph>
    </Section>
  </Section>
</Paper>