<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2154">
  <Title>WebDIPLOMAT: A Web-Based Interactive Machine Translation System</Title>
  <Section position="3" start_page="0" end_page="1042" type="metho">
    <SectionTitle>
2 Interface Design
</SectionTitle>
    <Paragraph position="0"> The design of the Webl)IPLOMAT system is intended to facilitate the following kind of interaction: (numbers correspond to Figure 1)  1. Speech fl'om the user is recognized and displayed in an editing window, where it may be edited by respeaking or using the keyboard. 2. When text is acceptable to the user, it is submitted tbr translation and transfer to the other  See text for explanation of labels.</Paragraph>
    <Paragraph position="1"> l)arty.</Paragraph>
    <Paragraph position="2"> 3. Text to be translated is optionally presented to a human expert, who is able to translate, correct and teach the system a correct translation. 4. Upon machine translation of tlLe text, or acceptance by the expert, a translation is delivered to the other pa.rty and synthesized.</Paragraph>
    <Paragraph position="3"> 5. 13oth sides of the conversation are tracked a.utomatically for all users, and displayed on their interfaces.</Paragraph>
    <Paragraph position="4"> Although the above is the original vision for tihe system, other configurations are easily imagined. Configurations with more than two participants, or where one of the users is also simultaneously all expert are stra.ightforwardly handled. Internationalization of the interfaces, for use in different locales, is also easily handled. Many changes of this nature are handled by easy modifications to the HTMI, code for given \Y=eb pages. More COml)licated tasks may be accomplished by modifications of underlying code. In order to produce the above configuration, the current system implements two user interthces (UIs): the Client UI, which provides speech and text input capabilities to the primary end-users of the system; and the Editor UI, which provides translation editing capabilities to a human translation expert, in the rest of this section, we describe in detail certain unique aspects of each interface.</Paragraph>
    <Section position="1" start_page="1041" end_page="1041" type="sub_section">
      <SectionTitle>
2.1 Client User Interface
</SectionTitle>
      <Paragraph position="0"> In addition to speech-input and editing capabilities, the Client UI is able to track the entire dialog as it progresses. Because the Central Communications Server (@ ~a.l) forwards every message to all connected clients, every component of the system can be aware of how the dialog turn is proceeding. Ill tile Client UI, this capability is used to l)rovide a running transcript of the conversation as it occurs. By noting the identifiers on messages (cf. ~,3.4), the U1 can assign appropriate labels to each of the following: our original utterance, translation of our utterance, other person's utterance, translation of their utterance. In ~ddition, we use knowledge about the status of the dialog to prevent the user from sending several utterances belbre the other party has responded. null</Paragraph>
    </Section>
    <Section position="2" start_page="1041" end_page="1042" type="sub_section">
      <SectionTitle>
2.2 Editor User Interface
</SectionTitle>
      <Paragraph position="0"> The F, ditor UI provides tools which make it possible for a human expert to edit translations produced by the machine translator betbre they are sent to the users. As mentioned earlier, the editing step is optional, and is intended to improve the quality of transla.tions. The Editor UI may be configured so that either of the two users, or a remote third party can act as editor. Onr motivations for providing an editing capability are twofold: * Although our MT system (@ ~3.2) dots not always produce the correct answer, the correct answer is usually available a.mong the possibilities it. considers.</Paragraph>
      <Paragraph position="1"> t.al Q * ,H~ MT system provides for online updates of its knowledge base which a.llows tbr translations to improve over time.</Paragraph>
      <Paragraph position="2"> In order to take advantage of' these capabilities, we have designed two editing tools, the chart editor and a.lways-active learning, that enable a human expert to rapidly produce an accurate tlJaillslatioll aud to store tha.t translation in the MT knowledge base for future use.</Paragraph>
      <Paragraph position="3"> As discussed in ~a.2, our MT system ma.y produce more than one translation for each part of tile input, from which it attempts to se\]ect the best translation. The entire set of translations is available to the Web-I)IPLOMAT system, and ix used in the cha.rt editor.  human edit()\]: is l)resented a. pol)Ul)-menu of alterna.tire tra.nslations beginning a.t a particular location in the sentence (see l?igure 2). When one o\[' the alternatives is sek;cted, it replaces the original word or words. In this way, a. sentence may be rapidly edited to an acceptable sta.te.</Paragraph>
      <Paragraph position="4"> In order to reduce develolmmnt \]line, our MT system can be used in a ra.pid-del)loylnent style: afl;er a. minimal knowledge base is constructed, the system is put into use with a huma.n expert supervising, so that domain-rel(:va.nt data ma.y be elicited (lui(:ldy. In order to supl)ort this, all uttera.nces a.re considered for learning. When the editor presses the 'Acccitt/Learn' l)utton, the original utterance and its tra.nslatiotl are exa.ntined to determine if they are suital)le for learning. (Turrently all utterances for which the forward tra.nslation has 1teen edited are su brat\]ted \['or learning, a.lthough other criteria ma.y also be entertained. More detail about online lea.r|&gt; ing may 1)e found ill ~3.2.</Paragraph>
      <Paragraph position="5"> Although the editor UI is primarily i\]lte\]l(led tbr use by a. tra.nslation expert, it, will sometimes also 1)e u,qed 1)y tllose who are not as expert. For this situati:)n, we ha.re introduced it lta('ktra.lisla.l.ion capalJility which retra.nsla.tos the edited forward trai/sla.tioll into the language of the input. Although i,~iperl'ect, baektranslatio\]l can often give the user an idea of whether the forward transla.tion was suits\]ant\]ally (:O\]:l:eot,.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1042" end_page="1042" type="metho">
    <SectionTitle>
3 System Design
</SectionTitle>
    <Paragraph position="0"> h, this section, we describe 1.he eOml)uta,l, io\]|al archit()etu r(&amp;quot; \[lllderlyin,,g the W(;b I) 11) I,O M A'I' sys|,elll.</Paragraph>
    <Paragraph position="1"> 3.1. Ar(:hite('t;m'( ~.</Paragraph>
    <Paragraph position="2"> The underlyil\]g arel\]itecture of the \Y=obl)II)I,OMAT ' system is shown in Figure 3. The system is organized  oh.jeers sent to this server are forwarded to all connected clients. With the exception of speech and HTTP, all communications between clients use this server.</Paragraph>
    <Paragraph position="3"> The servers are designed to be small, and a.re in~ tended to coexist on one lnachine. 1 Currently, however, the speech server inchides a full speech recogl This is necessary due to security restrictions on .\]~twt 'I'M Applets.</Paragraph>
    <Paragraph position="4"> nizer, a.nd therefore consunies a greater amount o1' resources than the other servers.</Paragraph>
    <Paragraph position="5"> Most processing is intended Co be perforumd by clients, which haw~' no loca.lity requirements, and may therefore I)e distributed across nm.chi\]les and networks as necessary. The User and Editor Clients were described in {iSS2.1 and 2.2. We will now examine the most important l~rocessing mechanisms, ilmluding machine translation and speech recognition/synthesis. null</Paragraph>
    <Section position="1" start_page="1042" end_page="1042" type="sub_section">
      <SectionTitle>
3.2 Machine Translation
</SectionTitle>
      <Paragraph position="0"> l&amp;quot;or Machine Transla.tion, we rely on the l)anlite M|dti-lgl\]gine Machine Translation (MEMT) Server (l:rederking a.nd lh:own, 1996). This system, which is outlined in Figure 4, makes use of several translation engines at once, combining their output with a. sta.tistica\] language model (Brown and l:rederking, 1995). Each traiisla.tion engine makes use of a dill'ere|tt transla.tion technok)gy, and produ(:es multit)1% possibly overlal)ping , l.ra\]mlations for every part of tit(; inl)ut that it can translate. All of the translations I)roduced 1)3: the various engines a,re pla.ced in a chart data struci;ure (Kay, 1967; Winograd, 1983), indexed by the'Jr position i\]\] the input uttera.nce. A statistical huiguage model is used, together with scores provided I)y the tra.nslation engines, to determine the optima.l path through the set of translated segments, which informa,tion is also stored i\]\] the chart. Upon completion of tra.nslation, the chart data struct||re is made a.vailable For use by the rest</Paragraph>
      <Paragraph position="2"> (;urrently, we enq)loy l,exica.l Transfer and Ex- null bilingual dictionaries and phrasal glossaries to provide phrase-for-phrase translations, while EBMT uses a fllzzy matching step to produce translations froln a bilingual corpus of matched sentence pairs. Because the knowledge bases for these techniques are simple, they both suI)port online augmentation. As mentioned in SS2.2, the Editor UI attempts to learn from utterances that have been edited. Pairs of utterances submitted for learning to the translator are placed in a Lexical Transfer glossary if less than six words long, and in an EBMT corpus if two words or longer. Higher scores are given to these newly created resources, so that they are preferred.</Paragraph>
      <Paragraph position="3"> The MT server is interfa.ced to the Central Server through MT interfa.ce clients, which handle, inter alia, character set conversions, support for learning and conversion of MT output into an internal object representation usable by other clients. It also ensures that outgoing translations are staml)ed with correct identifiers (cf. ~3.4), relative to the incoming text, to ensure that translations are directed to the appropriate clients.</Paragraph>
    </Section>
    <Section position="2" start_page="1042" end_page="1042" type="sub_section">
      <SectionTitle>
3.3 Speech Recognition and Synthesis
</SectionTitle>
      <Paragraph position="0"> In the current system, speech recognition is handled as a private communication between a browser plugin, running on the user's machine, and a speech recognition server, and is not routed through the central server. Speech is streamed over the network to the server, which performs the recognition, and returns the results as a text string. This configuration permits most of the computational resources to be offloaded from the client machine onto powerful remote servers. The speech may be streamed over the network as-is, or it may be lightly preprocessed into a feature stream for use over lower-bandwidth connections. The recognized text is returned diArchitecture null rectly to tile user client for editing and validation by the user belbre heing sent for translation. Our speech server is a previously implemented design (Issar, 1997) based on the Sphinx II speech recognizer (Huang et a l., 1992). As mentioned earlier, the speech server and recognizer are not currently designed to run in a distributed fashion.</Paragraph>
      <Paragraph position="1"> Unlike speech recognition, which is handled by the User Client, speech synthesis does not require human interaction, and can therefore be connected directly to the central server. Currently, Synthesizer Interfaces unpackage internal representations and send utterances to be synthesized on a speech synthesizer running locally on the user's machine.</Paragraph>
      <Paragraph position="2"> Future plans call for speech to be synthesized at a central location and transported across the net.work in standard andio formats.</Paragraph>
    </Section>
    <Section position="3" start_page="1042" end_page="1042" type="sub_section">
      <SectionTitle>
3.4 Implementation
</SectionTitle>
      <Paragraph position="0"> All components of the Webl)IPLOMA'\]' except the speech components and Web Server were implemented in Java TM (Gosling et el., 1996), inclnding the Central Server. Messages between clients are implemented as a Java class Capsule, containing a String identifier and any number of data. Objects.</Paragraph>
      <Paragraph position="1"> Object serialization permits simple implementation of message streams. User Interface clients are developed as Applets, which are embedded in HTML pages served by the Web Server.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1042" end_page="1044" type="metho">
    <SectionTitle>
4 Future Work and Conclusion
</SectionTitle>
    <Paragraph position="0"> The most significant change we would like to make to the current system is the way that speech is handled. We firmly believe that the best speech input device is the one people are already familiar with, namely the telephone. A revised system would allow users to call specific phone numbers (connected to the central server) in order to access the system, which would then recognize and synthesize speech  over tile telephone line while still using web-based interfaces. This, of COtlrse, takes us closer to the grand AI Challenge of the translating telephone (OAIAE, 1996; Kurzweil, 1999; Frederking et al., 1999). We contend that by using interactive machine translation, the goal of a broad-domain translating telephone Call be more easily brought to fruition.</Paragraph>
  </Section>
class="xml-element"></Paper>