<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1038">
  <Title>An MAT Tool and Its Effectiveness</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> PANGLOSS MARK I translates from Spanish into English, although additional source languages are planned. The analyzer used in this configuration is a version of the ULTRA Spanish analyzer from NMSU [2], while generation is carried out by the PENMAN generator from ISI [4]. The Translator's Workstation provides the user interface and the integration platform. It is similar in spirit to systems such as the Translator's Workbench [3].</Paragraph>
    <Paragraph position="1"> [...] of New Mexico State University (NMSU), and the Information Sciences Institute of the University of Southern California (ISI).</Paragraph>
    <Paragraph position="2"> The processing in PANGLOSS goes as follows:
1. an input passage is broken into sentences;
2. a fully-automated translation of each full sentence is attempted; if it fails, then
3. a fully-automated translation of smaller chunks of text is attempted (currently, these are noun phrases);
4. the material that does not get covered by noun phrases is treated in a "word-for-word" mode, whereby translation suggestions for each word (or phrase) are sought in the system's MT lexicons, an online bilingual dictionary, and a set of user-supplied glossaries;
5. the resulting list of translated noun phrases and translation suggestions for words and phrases is displayed in a special editor window, where the human user finalizes the translation.</Paragraph>
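    <Paragraph> The processing steps listed above amount to a fallback cascade, which can be sketched roughly as follows. This is a minimal illustration, not the PANGLOSS implementation: the names process_sentence, translate_sentence, chunk, translate_np and lookup, and the tuple layout of the output, are all hypothetical. The sketch also assumes that a noun phrase whose translation fails is handled word-for-word, which the text does not state explicitly.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a translator returns a string, or None on failure.
Translator = Callable[[str], Optional[str]]
# A chunker returns (text, is_noun_phrase) pairs covering the sentence in order.
Chunker = Callable[[str], List[Tuple[str, bool]]]
# A lookup returns translation suggestions gathered from lexicons,
# the bilingual dictionary and the user glossaries.
Lookup = Callable[[str], List[str]]


def process_sentence(sentence: str,
                     translate_sentence: Translator,
                     chunk: Chunker,
                     translate_np: Translator,
                     lookup: Lookup) -> list:
    """Return the list of components handed to the editor window."""
    # Step 2: try a fully automated translation of the whole sentence.
    full = translate_sentence(sentence)
    if full is not None:
        return [("MT", full)]

    components = []
    for text, is_noun_phrase in chunk(sentence):
        # Step 3: try to translate noun-phrase chunks automatically.
        translated = translate_np(text) if is_noun_phrase else None
        if translated is not None:
            components.append(("MT", translated))
            continue
        # Step 4: word-for-word mode for everything left uncovered.
        for word in text.split():
            components.append(("SUGGEST", word, lookup(word)))
    # Step 5: the caller displays this list for the user to finalize.
    return components


def process_passage(passage_sentences: List[str], **steps) -> list:
    """Step 1 is assumed done: the passage arrives already split."""
    components = []
    for sentence in passage_sentences:
        components.extend(process_sentence(sentence, **steps))
    return components
```

Keeping each stage as a pluggable function mirrors the description of the workstation as an integration platform for separately developed analyzers and generators.</Paragraph>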
    <Paragraph position="3"> This entire process can be viewed as helping a human translator, by doing parts of the job automatically and making the rest less time-consuming.</Paragraph>
    <Paragraph position="4"> We have designed and implemented an intelligent post-editing environment, the CMAT (Component Machine-Aided Translation) editor.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="196" type="metho">
    <SectionTitle>
2. The User's View
</SectionTitle>
    <Paragraph position="0"> The CMAT editor allows the user to move, replace or delete output text elements, called components, with at most two mouse actions. The main user interface tool is a dynamically-changing popup menu available for each component. The ordering of alternate selections in the menus changes as the tool is used, to reflect the most recent user choices.</Paragraph>
    <Paragraph position="1"> Suppose the user selects a region of source text by highlighting it and submits it to be machine-translated. The result appears in a target window as a string of components, each surrounded by "&lt;&lt;" and "&gt;&gt;" characters. A mouse click anywhere within a single component brings up a CMAT menu for that component. In Figure 1, the user has clicked on the word "increase". A CMAT menu consists of three regions, each separated by a horizontal line. From top to bottom these are:
* The LABEL region, which contains the word or phrase in the source text that produced this particular component.
* The FUNCTION region, which contains the post-editing Move, Delete, Modify, and Finish functions. When the user selects Move, the component disappears, and the mouse pointer changes shape, indicating that a Move is in progress. The component is reinserted into the text at the nearest word break to the point where the user clicks the mouse again. Delete simply deletes the component. Modify pops up a window that allows the user to type in a new alternative (see next bullet). Finish removes the component markers, indicating that CMAT editing for this component is finished.
* The ALTERNATIVE region, which contains alternative translations of the source word or phrase. The source word or phrase is also present as an alternative, when available, as translators may wish to leave some source language words temporarily in the target text, and return to them later. Selecting one of the alternatives replaces the original selection for this component with the alternative, while the original selection becomes an alternative in the ALTERNATIVE region.</Paragraph>
    <Paragraph position="2"> An additional menu-based editing feature allows the user to change the morphology of a word with a single mouse action (Figure 2). This menu changes verb inflection or the determiner on a noun phrase, stripping any old morphological features before adding the new one.</Paragraph>
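    <Paragraph> The Move function reinserts a component at the word break nearest to the second mouse click. One way this could work is sketched below, under the assumption that the target text is a plain string with single spaces between words and that the click is reported as a character offset. The function names nearest_word_break and reinsert are hypothetical and do not come from the CMAT editor.

```python
def nearest_word_break(text: str, click: int) -> int:
    """Return the offset of the word break closest to the click offset.

    Word breaks are the start of the text, the end of the text,
    and the position of every space character.
    """
    breaks = [0] + [i for i, ch in enumerate(text) if ch == " "] + [len(text)]
    # min() keeps the first (leftmost) break when two are equally close.
    return min(breaks, key=lambda b: abs(b - click))


def reinsert(text: str, component: str, click: int) -> str:
    """Insert a previously removed component at the nearest word break."""
    position = nearest_word_break(text, click)
    left = text[:position].strip()
    right = text[position:].strip()
    # Skip empty sides so no stray spaces appear at the text edges.
    return " ".join(part for part in (left, component, right) if part)
```

Snapping to a word break means the user does not have to click precisely between two words, which is consistent with the goal of completing each edit in at most two mouse actions.</Paragraph>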
    <Paragraph position="3"> Using these popup menus, the user can move, replace, modify, or delete an output component with one or two mouse actions, rapidly turning the string of translated words and phrases into a coherent, high-quality target language text. Note that the user is not forced to use the CMAT editor at any particular time. Its use can be intermingled with other translation activities, according to the user's preferences.</Paragraph>
  </Section>
  <Section position="5" start_page="196" end_page="197" type="metho">
    <SectionTitle>
3. The CMAT Editor
</SectionTitle>
    <Paragraph position="0"> As part of the TWS, the CMAT editor is implemented in Common LISP. It communicates through CLM (the Common LISP-Motif interface) [1] to use Motif widgets inside of the X11 window system.</Paragraph>
    <Paragraph position="1"> The CMAT editor views text as a list of components. These components are of three types:
1. MT-generated strings. Phrases translated by the MT system are represented simply as the generated target language string, and are not further processed by CMAT.
2. Glossary entries. Phrases not translated by the MT system, but found in the user glossaries, are each represented by a component list: a list containing the source string (source language phrase), the identifier :GLOSS, and a glossary entry list (a list of the possible target language phrases corresponding to the source language phrase).
3. Dictionary entries. Words not covered by either of the above are represented by a component list containing the source string, the identifier :MT and a target language string list (a list of the corresponding target language words as found in the MT system's lexicons), and finally the identifier :DICT and a dictionary entry list (a list of target language words found in the machine-readable dictionary).</Paragraph>
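    <Paragraph> The three component types can be illustrated with a small sketch. The editor itself is written in Common LISP; the Python below only mimics the described list layout, with identifiers written as strings. The example words, the helper names slots and component_kind, and the reading of the second identifier as :MT are assumptions made for illustration.

```python
from typing import Dict, List, Union

# A component is either a bare target string (MT output) or a
# component list: [source, identifier, values, identifier, values, ...].
Component = Union[str, list]

mt_component: Component = "the increase in prices"
glossary_component: Component = ["tasa de cambio", ":GLOSS", ["exchange rate"]]
dictionary_component: Component = [
    "aumento",
    ":MT", ["increase"],
    ":DICT", ["increase", "rise", "growth"],
]


def slots(component: list) -> Dict[str, List[str]]:
    """Map each identifier in a component list to its list of values."""
    # Identifiers sit at odd indexes, their value lists right after them.
    return dict(zip(component[1::2], component[2::2]))


def component_kind(component: Component) -> str:
    """Classify a component according to the three types described above."""
    if isinstance(component, str):
        return "mt-string"
    identifiers = slots(component)
    if ":GLOSS" in identifiers:
        return "glossary"
    if ":MT" in identifiers or ":DICT" in identifiers:
        return "dictionary"
    raise ValueError("unrecognized component list")
```
</Paragraph>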
    <Paragraph position="2"> The CMAT editor uses a knowledge base and a working memory. The knowledge base stores static information for a component's menu, while the working memory provides a mapping between the knowledge base and the components currently present in the target buffer. This separation is necessary because any given component generally occurs more than once in a given text, but there is only one menu associated with a particular component.</Paragraph>
    <Paragraph position="3"> Knowledge base structures are indexed by their component source strings. These structures contain four slots, one slot each for :GLOSS, :MT, and :DICT lists, plus a fourth slot containing the candidate list. This list is a union of the first three lists, with the elements' positions varying to reflect current estimates of their likelihood of being chosen by the user. Initially, the items from the target language string list appear first in the list and glossary entries appear second, since these items are more likely to be the correct translations of a source string in our domain.</Paragraph>
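    <Paragraph> A knowledge base structure of this kind might look roughly like the following. This is a sketch under stated assumptions rather than the actual CMAT data structure: the class name KnowledgeBaseEntry, the helper names, and the choice to place dictionary entries last and to drop duplicates while preserving order are all hypothetical readings of the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def merge_candidates(mt: List[str], gloss: List[str],
                     dictionary: List[str]) -> List[str]:
    """Build the initial candidate list: MT items, then glossary, then dictionary."""
    ordered: List[str] = []
    for item in [*mt, *gloss, *dictionary]:
        if item not in ordered:  # union: keep only the first occurrence
            ordered.append(item)
    return ordered


@dataclass
class KnowledgeBaseEntry:
    gloss: List[str] = field(default_factory=list)
    mt: List[str] = field(default_factory=list)
    dictionary: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)


def get_or_create(kb: Dict[str, KnowledgeBaseEntry], source: str,
                  mt=(), gloss=(), dictionary=()) -> KnowledgeBaseEntry:
    """Return the entry indexed by the source string, creating it if absent."""
    if source not in kb:
        kb[source] = KnowledgeBaseEntry(
            gloss=list(gloss),
            mt=list(mt),
            dictionary=list(dictionary),
            candidates=merge_candidates(list(mt), list(gloss), list(dictionary)),
        )
    return kb[source]
```

Indexing by source string means that repeated occurrences of the same source phrase share one entry, so a preference expressed once carries over to later occurrences.</Paragraph>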
    <Paragraph position="4"> When a component list is passed to the CMAT editor to be displayed, the latter first checks to see if a structure for the component already exists in the knowledge base. If an entry does not exist, one is created. Then the first component is chosen from the candidate list and displayed with brackets in the editor window. In the working memory, a pointer to the knowledge base entry is stored, indexed by the displayed component.</Paragraph>
    <Paragraph position="5"> When the user clicks the mouse within a CMAT component, the system must use the actual character string as the index into the working memory, and from there get the index into the knowledge base. 5 The list of alternative translations for the component can then be obtained from the knowledge base structure.</Paragraph>
    <Paragraph position="6"> If a component is Moved in the editor window, nothing changes in the internal representation of the CMAT editor. When a component is Deleted, the pointer in the working memory is removed. If an alternative translation is chosen from the candidate list, the old component is replaced with a new component in the CMAT editor. The pointer in the working memory is removed from its old location and stored under the new component. The new candidate is also moved to the front of the candidate list as the most likely candidate for future use. When a component is Modified, the new alternative entered by the user is stored in the knowledge base, and then treated as if it had just been chosen.</Paragraph>
    <Paragraph position="7"> When the component's markers are removed, either singly or en masse, the component's pointer in the working memory is removed, but the entry in the knowledge base remains. These are retained in order to provide a summary of the user's preferences, for the frequent case where future translations contain these components. This summary can be saved as a file, which can be loaded into the knowledge base in a later editing session, or analyzed by system developers.</Paragraph>
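    <Paragraph> The interplay between the knowledge base and the working memory during Move, Delete, Modify, Finish and alternative selection can be sketched as follows. This is a simplified illustration: the class CmatState and its method names are hypothetical, the knowledge base entry is reduced to its candidate list, and each displayed string is assumed to be unique in the buffer, whereas the real editor has to disambiguate identical components.

```python
from typing import Dict, List


class CmatState:
    """Toy model of the CMAT knowledge base and working memory."""

    def __init__(self) -> None:
        # source string -> candidate list, most likely candidate first
        self.knowledge_base: Dict[str, List[str]] = {}
        # displayed component string -> source string (the "pointer")
        self.working_memory: Dict[str, str] = {}

    def display(self, source: str, candidates: List[str]) -> str:
        """Create the entry if needed and show its first candidate."""
        entry = self.knowledge_base.setdefault(source, list(candidates))
        shown = entry[0]
        self.working_memory[shown] = source
        return shown

    def alternatives(self, shown: str) -> List[str]:
        """Follow the pointer from the displayed string to the candidates."""
        return self.knowledge_base[self.working_memory[shown]]

    def move(self, shown: str) -> None:
        """Moving a component changes nothing in the internal representation."""

    def delete(self, shown: str) -> None:
        """Deleting removes only the working-memory pointer."""
        del self.working_memory[shown]

    def choose(self, shown: str, alternative: str) -> str:
        """Replace the component and promote the choice to the front."""
        source = self.working_memory.pop(shown)
        entry = self.knowledge_base[source]
        entry.remove(alternative)
        entry.insert(0, alternative)
        self.working_memory[alternative] = source
        return alternative

    def modify(self, shown: str, new_text: str) -> str:
        """Store a user-typed alternative, then treat it as just chosen."""
        entry = self.alternatives(shown)
        if new_text not in entry:
            entry.append(new_text)
        return self.choose(shown, new_text)

    def finish(self, shown: str) -> None:
        """Removing the markers drops the pointer but keeps the entry."""
        del self.working_memory[shown]
```

Because the knowledge base outlives the working memory, the reordered candidate lists act as the summary of user preferences that can be saved and reloaded in a later session.</Paragraph>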
  </Section>
  <Section position="6" start_page="197" end_page="199" type="metho">
    <SectionTitle>
4. The Evaluation of the Tool
</SectionTitle>
    <Paragraph position="0"> In order to evaluate the effectiveness of this tool, we compared editing with the CMAT editor versus editing with just the basic Emacs-like text editor in which it is embedded. We conducted two experiments comparing CMAT and non-CMAT editing efficiency, one using monolinguals and one using translators.</Paragraph>
    <Section position="1" start_page="197" end_page="199" type="sub_section">
      <SectionTitle>
4.1. Experiment I
</SectionTitle>
      <Paragraph position="0"> Method. The monolingual task was to take the output of the MT system and, using as reference an English translation that was previously produced manually, produce the "same" text using either the CMAT editor or the regular editor. The time required for each text-editing session was recorded. Keystrokes and mouse actions were automatically counted by the interface. As test texts, we used two of the texts from the 1992 DARPA MT evaluation. To shorten total experiment time and provide a reasonable number of sample texts, we broke each text into two halves of roughly equal size, at paragraph breaks, resulting in four text samples.</Paragraph>
      <Paragraph position="1"> Two subjects were presented with the samples in the same order. Their use of the CMAT or the plain Emacs editor on different samples was arranged to provide as much variation as possible in practice effects and acclimatization, so that these could be cancelled out during analysis. A few days later, subjects repeated the procedure, reversing the use or non-use of the CMAT editor. Since practice effects should be more uniform in a simple editing task than in translation (the task is much less intellectually challenging), we felt that texts could be reused if practice effects are taken into account in analysis.</Paragraph>
      <Paragraph position="2"> Footnote 5: This is due to details of the CLM interface, and is the reason for marking identical components that have different internal data structures with a colon and an integer: otherwise there would be no way to locate the correct associated data structure.</Paragraph>
      <Paragraph position="3"> Subjects were instructed to produce a "close paraphrase" of the example translation, since any two translators will produce slightly different correct translations of the same text. Subjects were also instructed not to use the CMAT Modify function, since it causes the editor to learn during use, making analysis even harder.</Paragraph>
      <Paragraph position="4"> Analysis. Given the above ordering of test runs, one can balance practice effects, subject differences, and text differences simply by normalizing the total editing times for a subject on each run through the texts. That is, if we divide the editing time for each text by the total time for the entire set of texts in the given run, the variation in normalized editing times between subjects should reflect variations in the efficiency of editing. For example, in Figure 3, we see that for Session 1, Subject 1 spent a greater fraction of time using CMAT (0.2413) than Subject 2 spent editing it in a regular editor (0.2198), while for Session 2, the fraction of total time was the same with either editor.</Paragraph>
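      <Paragraph> The normalization described above is simply a division of each per-text editing time by the total for that run. A minimal sketch follows, assuming times are recorded in seconds per text sample. The function name normalize_run and the sample labels and figures in the usage note are invented for illustration and are not the experimental data.

```python
from typing import Dict


def normalize_run(times: Dict[str, float]) -> Dict[str, float]:
    """Divide each text's editing time by the run's total editing time.

    The result gives, for every text sample, the fraction of the
    subject's total time in this run that was spent on that sample.
    """
    total = sum(times.values())
    if total == 0:
        raise ValueError("run has no recorded editing time")
    return {text: seconds / total for text, seconds in times.items()}
```

For instance, a hypothetical run with times of 120, 180, 300 and 400 seconds on four samples yields fractions of 0.12, 0.18, 0.30 and 0.40. Two subjects can then be compared on the same sample by these fractions even if one of them is a faster editor overall.</Paragraph>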
      <Paragraph position="5"> From comparing these normalized times, it appears that the CMAT actually slows subjects down. This contradicts the universal subjective impression of all CMAT users that it is quite helpful. It could be the case that the CMAT makes the job easier without making it faster, but we had the definite impression that it makes translating faster as well as easier. We therefore investigated further.</Paragraph>
      <Paragraph position="6"> Normalized keystroke and mouse action counts are shown in Figures 4 and 5. Here we see that while the CMAT editing sessions had 1/2 to 1/3 the number of keystrokes, they had between 2 and 9 times as many mouse operations. This is significant, since mouse actions are slower than keystrokes. [...]tional information available to translators, and that measure any trade-off between quality and speed of translation.</Paragraph>
      <Paragraph position="7"> In the second experiment, the normalized total-edit time ratios between the two texts for Subject 5 were essentially identical to the rough draft ratios, indicating that this ratio is indeed a good indicator of the relative difficulty of the two passages. It is interesting to note that Subject 4, whose data point had to be thrown out because his CMAT times were twice as long as his non-CMAT times, had a level of familiarity with the CMAT editor that corresponds closely to that of our translators in the first MT evaluation in 1992. An important part of our preparation for the 1993 MT evaluation will be training the test subjects in the most efficient use of our tools.</Paragraph>
    </Section>
  </Section>
</Paper>