<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0502">
  <Title>Task Tolerance of MT Output in Integrated Text Processes</Title>
  <Section position="2" start_page="0" end_page="9" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Issues of evaluation have been pre-eminent in MT since its beginning, yet there are no measures or metrics which are universally accepted as standard or adequate. This is in part because, at present, different evaluation methods are required to measure different attributes of MT, depending on what a particular stakeholder needs to know (e.g., Arnold 1993). A venture capitalist who wants to invest in an MT start-up needs to know a different set of attributes about the system than does a developer who needs to see if the most recent software changes improved (or degraded) the system. Users need to know another set of metrics, namely those associated with whether the MT system in situ improves or degrades the other tasks in their overall process. Task-based evaluation of this sort is of particular value because of the recently envisioned role of MT as an embedded part of production processes rather than a stand-alone translator's tool. In this context, MT can be measured in terms of its effect on the &quot;downstream&quot; tasks, i.e., the tasks that a user or system performs on the output of the MT.</Paragraph>
    <Paragraph position="1"> The assertion that usefulness could be gauged by the tasks to which output might be applied has been used for systems and for processes (JEIDA 1992, Albisser 1993), and also for particular theoretical approaches (Church and Hovy 1991). However, the potential for rapidly adaptable systems, in which MT could be expected to run without human intervention and to interact flexibly with automated extraction, summarization, filtering, and document detection, calls for an evaluation method that measures usefulness across several different downstream tasks.</Paragraph>
    <Paragraph position="2"> The U.S. government MT Functional Proficiency Scale project has conducted methodology research that has resulted in a ranking of text-handling tasks by their tolerance for MT output. When an MT system's output is mapped onto this scale, the set of tasks for which the output is useful, or not useful, can be predicted. The method used to develop the scale can also be used to map a particular system onto the scale.</Paragraph>
    <Paragraph position="3"> Development of the scale required the identification of the text-handling tasks that members of a user community perform, and then the development of exercises to test output from several MT systems (Japanese-to-English). The ease with which users can perform these exercises on the corpus reflects the tolerance that the tasks have for MT output of varying quality. The following sections detail the identification of text-handling tasks, the evaluation corpus, exercise development, and inference of the proficiency scale from the apparent tolerance of the downstream text-handling tasks.</Paragraph>
  </Section>
</Paper>