<?xml version="1.0" standalone="yes"?>
<Paper uid="P88-1019">
  <Title>EXPERIENCES WITH AN ON-LINE TRANSLATING DIALOGUE SYSTEM</Title>
  <Section position="4" start_page="0" end_page="156" type="metho">
    <SectionTitle>
SYSTEM CONFIGURATION
</SectionTitle>
    <Paragraph position="0"> A general idea of the system is illustrated in Figure 1. The two terminals were located in Japan and Switzerland, and linked by a conventional satellite telephone connection. The workstations at either end were AS3260C machines. Running UNIX, they support the Toshiba Machine Translation system AS-TRANSAC. On this occasion, the Machine Translation capability was installed only at the Japanese end, though in practice both terminals could run AS-TRANSAC.</Paragraph>
    <Paragraph position="1"> The workstation screens are divided into three windows, as shown in Figure 2, much as in the normal version of UNIX's talk. The top window shows the user's dialogue, the middle window the correspondent's replies. The important difference is that both sides of the dialogue are displayed in the language appropriate to the location of the terminal. However, in a third small window, a workspace at the bottom of the screen, the raw input is also displayed. (This access to the English input at the Japanese end is significant in the case of Japanese users having some knowledge of English, and of course vice versa if appropriate.) The bottom window also served the purpose of indicating to the users that their conversation partners were transmitting.</Paragraph>
    <Paragraph position="2"> [Figure 2: workstation screen from a sample session. Legible dialogue lines: "I live in geneva, but I come from California."; "Yes, but when I was 12 years old."; "Very interesting, quick, and useful! How many languages do you speak, Takeda? That is ok."; "My name is Takeda. Please tell me your name. Where do YOU live?"]</Paragraph>
    <Paragraph position="5"> Figure 3 shows the set-up in more detail. At the Japanese end, the user inputs Japanese at the keyboard, which is displayed in the upper window of the workstation screen. The input is passed to the translation system and the English output, along with the original input, is then transmitted via telecommunications links (KDD's Venus-P and the Swiss PTT's Telepac in this case) to Switzerland. There it is processed by the keyboard conversation function, which displays the original input in the workspace at the bottom of the screen, and the translated message in the middle window on the screen. The set-up at the Swiss end is similar to that at the Japanese end, with the important exception that only the original input message is transmitted, since the translation will take place at the receiving end.</Paragraph>
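The asymmetric message flow described above can be sketched as follows. This is only an illustrative model, not the system's actual code: the names Message, send_from_japan, send_from_switzerland and display are hypothetical, and the translator is passed in as an ordinary function standing in for AS-TRANSAC.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    """What travels over the link: the raw input, plus a translation if one was made."""
    original: str
    translation: Optional[str] = None


def send_from_japan(text: str, translate_ja_en: Callable[[str], str]) -> Message:
    # The Japanese end translates before transmission and sends both versions.
    return Message(original=text, translation=translate_ja_en(text))


def send_from_switzerland(text: str) -> Message:
    # The Swiss end has no translation system, so only the raw input is sent.
    return Message(original=text)


def display(msg: Message,
            local_translate: Optional[Callable[[str], str]] = None) -> dict:
    # If no translation arrived with the message, translate at the receiving end.
    translation = msg.translation
    if translation is None and local_translate is not None:
        translation = local_translate(msg.original)
    # Middle window: the message in the local language; workspace: the raw input.
    return {"middle_window": translation, "workspace": msg.original}
```

In this sketch the receiving terminal never needs to know which end sent the message: it translates locally only when the message arrives without a translation, which mirrors the arrangement in which translation always runs at the Japanese end.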
  </Section>
  <Section position="5" start_page="156" end_page="156" type="metho">
    <SectionTitle>
TRANSLATION METHOD
</SectionTitle>
    <Paragraph position="0"> An input sentence is processed in turn by a morphological analyzer, a dictionary look-up module, a parser, a semantic analyzer, and a target sentence generator.</Paragraph>
    <Paragraph position="1"> Introducing a full-fledged semantic analyzer conflicts with avoiding increases in processing time and memory use. To resolve this conflict, a Lexical Transition Network Grammar (LTNG) has been developed for this system.</Paragraph>
    <Paragraph position="2"> LTNG provides a semantic framework for an MT system, at the same time satisfying processing time and memory requirements. Its main role is to separate parsing from semantic analysis, i.e., to make these processes independent of each other. In LTNG, parsing includes no semantic analysis. Any ambiguities in an input sentence remain in the syntactic structure of the sentence until processed by the semantic analyzer. Semantic analysis proceeds according to a lexical grammar consisting of rules for converting syntactic structures into semantic structures. These rules are specific to words in a pre-compiled lexicon. The lexicon consists of one hundred thousand entries for both English and Japanese.</Paragraph>
  </Section>
  <Section position="6" start_page="156" end_page="157" type="metho">
    <SectionTitle>
SYSTEM PERFORMANCE
</SectionTitle>
    <Paragraph position="0"> Once the connection has been established, conversation proceeds as in UNIX's talk. An important feature of the function is that conversers do not have to take turns or wait for each other to finish typing before replying, unlike with write.</Paragraph>
    <Paragraph position="1"> This has a significant effect on conversational strategy, and occasionally leads to disjointed conversations, both in monolingual and bilingual dialogues. For example, a user might start to reply to a message the content of which can be predicted after the first few words are typed in; or one user might start to change the topic of conversation while the other is still typing a reply.</Paragraph>
    <Paragraph position="2"> Transmission of input via the satellite was generally fast enough not to be a problem: the real bottle-neck was the physical act of input. Novice users do not attain high speed or accuracy, a problem exacerbated at the Swiss end by a slow screen echo. But the problem is even greater for Japanese input: users typed either in romaji (i.e. using a standard transcription into the Roman alphabet) or in hiragana (i.e. using Japanese-syllable values for the keys). In either case, conversion into kanji (Chinese characters) is necessary (see Kawada et al. 1979 and Mori et al. 1983 on kana-to-kanji conversion); and this conversion is needed for between a third and a half of the input, on average (cf. Hayashi 1982:211). Because of the large number of homophones in Japanese, this can slow down the speed of input considerably. For example, even for professional typists, an input speed of 100 characters (including conversions) per minute is considered reasonable (compare expected speeds of up to 100 words/minute for English typing). It is of interest to note that this kana-to-kanji conversion, which is accepted as a normal part of Japanese word-processor usage, is in fact a natural form of pre-editing, given that it serves as a partial disambiguation of the input.</Paragraph>
    <Paragraph position="3"> On the other hand, slow typing speeds are also encountered for English input, one side-effect of which is the use of abbreviations and shorthand.</Paragraph>
    <Paragraph position="4"> In fact, we did not encounter this phenomenon in Geneva, though in practice sessions (with native English speakers) in Japan, this had been quite common. Examples included contractions (e.g.</Paragraph>
    <Paragraph position="5"> pls for please, u for you, cn for can), omissions of apostrophes (e.g. cant, wont, dont) and non-capitalization (e.g. i, tokyo, jal).</Paragraph>
    <Paragraph position="6"> The translation time itself did not cause significant delays compared to the input time, thanks to a very fast parsing algorithm, which is described elsewhere (Nogami et al. 1988). Input sentences were typically rather short (English five to ten words, Japanese around 20 characters), and translation generally took about 0.7 seconds per word (5000 words/hour). Given users' typing speed and the knowledge that the dialogue was being transmitted half way around the world, a delay of about 15 seconds (for translation and transmission), which would be unacceptably long under other circumstances, was generally quite tolerable, because users could observe in the third window that the correspondent was inputting something, even if it could not be read.</Paragraph>
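The figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the constant and function names are illustrative, and only the 0.7 seconds/word rate and the five-to-ten-word sentence length come from the text. At 0.7 seconds per word the hourly rate comes to roughly 5140 words, consistent with the rounded figure of 5000 words/hour, and a five-to-ten-word sentence takes about 3.5 to 7 seconds to translate.

```python
SECONDS_PER_WORD = 0.7  # translation rate reported in the text


def words_per_hour(seconds_per_word: float = SECONDS_PER_WORD) -> float:
    # 3600 seconds in an hour divided by the time spent per word.
    return 3600 / seconds_per_word


def translation_seconds(n_words: int,
                        seconds_per_word: float = SECONDS_PER_WORD) -> float:
    # Estimated translation time for a sentence of n_words words.
    return n_words * seconds_per_word


if __name__ == "__main__":
    print("words per hour:", round(words_per_hour()))
    for n in (5, 10):
        print(n, "words:", round(translation_seconds(n), 1), "seconds")
```

Taken at face value, these estimates suggest that translation accounts for only part of the roughly 15-second delay, with the remainder going to transmission and other overheads.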
  </Section>
  <Section position="7" start_page="157" end_page="159" type="metho">
    <SectionTitle>
TRANSLATION QUALITY
</SectionTitle>
    <Paragraph position="0"> This environment was a good practical test of our Machine Translation system, given that many of the users had little or no knowledge of the target language: the effectiveness of the translation could be judged by the extent to which communication was possible. Having said this, it should also be remarked that the Japanese-English half of the bilingual translation system is still in the experimental stage and so translations in this direction were not always of a quality comparable to those in the other direction. To offset this, the users at the Japanese end, who were mainly researchers at our laboratory and therefore familiar with some of the problems of Machine Translation, generally tried to avoid using difficult constructions, and tried to 'assist' the system in some other ways, notably by including subject and object pronouns which might otherwise have been omitted in more natural language.</Paragraph>
    <Paragraph position="1"> We recognized that the translation of certain phrases in the context of a dialogue might be different from their translation under normal circumstances. For example, English I see should be translated as naruhodo rather than watashi ga miru, Japanese wakarimashita should be I understand rather than I have understood, and so on.</Paragraph>
    <Paragraph position="2"> Nevertheless, the variety of such conversational fillers is so wide that we inevitably could not foresee them all.</Paragraph>
    <Paragraph position="3"> The English-Japanese translation was of a high quality, except of course where the users - being inexperienced and often non-native speakers of English - made typing mistakes, e.g. (1). (In these and subsequent examples, E: indicates English input, J: Japanese input, and T: translation. Translations into Japanese are not shown. Typing errors and mistranslations are of course reproduced from the original transmission.) (1a) E: this moming i came fro st. galle to vizite the exosition.</Paragraph>
    <Paragraph position="4"> E: it is vwery inyteresti ng to see so many apparates here.</Paragraph>
    <Paragraph position="6"> T: What is tolike? These were sometimes compounded by the delay in screen echo of input characters, as in  example (2).</Paragraph>
    <Paragraph position="7"> (2) E: Sometimes, I chanteh the topic, suddenly.</Paragraph>
    <Paragraph position="8"> E: I change teh topic.</Paragraph>
    <Paragraph position="9">  E: But the main reason is the delay fo dispaying.</Paragraph>
    <Paragraph position="10"> E: But the main reason is the delay of display.</Paragraph>
    <Paragraph position="11">  Failure to identify proper names or acronyms often led to errors (by the system) or misunderstandings (by the correspondent), as in (3a), especially when the form not to be translated happens to be identical to a known word, as in (3b). In (3b), 'go men na sai' is Japanese for 'I'm sorry'.</Paragraph>
    <Paragraph position="12">  This was avoided on the Japanese-English side where proper names were typed in romaji (4).  (4) J: ~I,(c)~--~I~N o g a m i &amp;quot;C'&amp;quot;J-o  T: My name is Nogami.</Paragraph>
    <Paragraph position="13"> As with any system, there were a number of occasions when the translation was too literal, though even these were often successfully understood (5).</Paragraph>
    <Paragraph position="14">  T: I want to drink a warm coffee. E: warm coffee? E: Not a hot one? J: ,,~, v, = -- e --'e3&amp;quot;o T: It is a hot coffee.</Paragraph>
    <Paragraph position="15"> One problem was that the system must always give some output, even when it cannot analyse the input correctly: in this environment failure to give some result is simply unacceptable. However, this is difficult when the input contains an unknown word, especially when the source language is Japanese and the unknown word is transmitted as a kanji. Our example (6) nevertheless shows how a cooperative user will make the most of the output. Here, the non-translation of tsuki mae (月前) is compounded by its mis-translation as a prepositional object. The first Japanese sentence meant 'I got married two months ago'. But the English correspondent imagines the untranslated</Paragraph>
    <Paragraph position="17"> T: I married to 2 ~ ~-~\].</Paragraph>
    <Paragraph position="18"> E: are married to 2 what???.</Paragraph>
    <Paragraph position="19"> J: ~-~(c)6~ tc~ l.fco T: I married in this year June. E: now i understand.</Paragraph>
    <Paragraph position="20"> E: i thought you married 2 women. In the reverse direction, the problem is less acute, since most Japanese users can at least read Roman characters, even if they do not understand them (7): this led in this case to an interesting metadialogue. Again, the English user was cooperative, and rephrased what he wanted to say in a way that the system could translate correctly. (7) E: can you give me a crash course in japanese?.</Paragraph>
    <Paragraph position="21"> J: c r a s h c o u r s e~f~'O~ ~o T: What is crash course? E: it means learn much in a very short time.</Paragraph>
    <Paragraph position="22"> Mistranslations were a major source of metadialogue, to be discussed below, though see particularly example (10).</Paragraph>
    <Paragraph position="23"> THE NATURE OF THE DIALOGUES There has been some interesting research recently (at ATR in Osaka) into the nature of keyboard dialogues (Arita et al. 1987; Iida 1987), mainly aimed at comparing telephone and keyboard conversations. They have concluded that keyboard conversation has the same fundamental conversational features as telephone conversation, notwithstanding the differences between written and spoken language. No mention is made of what we are calling here metadialogue, though it should be remembered that our dialogues are quite different from those reported by the ATR researchers in that we had a translation system as an intermediary. No comparable experiment is known to us, so it is difficult to find a yardstick against which to assess our findings. Regarding the subject matter of our dialogues, this was of a very general nature, often about the local situation (time, weather), the dialogue partner (name, marital status, interests) or about recent news. A lot of the dialogue actually concerned the system itself, or the conversation. An obvious example of this would be a request to rephrase in the case of mistranslation, as we have seen in (6) above, though not all users seemed to understand the necessity of this tactic (8).</Paragraph>
    <Paragraph position="24"> (8) E: how does your sistem work please.</Paragraph>
    <Paragraph position="25"> J: ~.L~ ~: (c)~(c),~b~ r) ~-'~-A,o T: I don't understand a meaning of the sentence.</Paragraph>
    <Paragraph position="26"> E: how does your sistem work? Often, a user would seek clarification of a mis- or un-translated word as in (9), or (3) above.  T: Riz is rice.</Paragraph>
    <Paragraph position="27"> The most interesting metadialogues however occurred when users failed to mark cited words as such, for example with quotation marks, a problem linguists are familiar with: these would then be re-translated, sometimes leading to further confusion (10).</Paragraph>
    <Paragraph position="28"> (10) J1: B~:(c)Ep~'~L.&amp;quot;C &lt; ~ Wo T: Please speak a Japanese impression.</Paragraph>
    <Paragraph position="29"> E1: ichibana.</Paragraph>
    <Paragraph position="30"> J2: b~ ~ &amp;quot;,~ ~-A,o J3: i c h i b a n a ~1~'~';~o T:What is ichibana? E2: i thought it means number one.</Paragraph>
    <Paragraph position="31"> J4: f~ ~--~:'(-~o T:What is the first? E3: the translation to you was  incorrect.</Paragraph>
    <Paragraph position="32"> This example may need explanation. First, the translation of the Japanese question (J1) has been misunderstood: the translation should have been 'Please give me your impressions of Japan', but the English user (E-user) has understood Japanese to mean 'Japanese language'. That is, E-user has understood J1 to be saying 'Please speak an impressive Japanese word.' Then E-user confused ichiban ('number 1' or 'the first') and ikebana ('flower arranging'). The word ichibana (E1) does not exist in Japanese. Not realizing, of course, that his first sentence (J1) had been incorrectly understood, the Japanese user (J-user) could not understand E1 (J2) and asked for its sense (J3). So E-user tried to explain the meaning of ichibana, by which he in fact meant ichiban; his explanation 'number one' was correctly translated (not shown here) as ichiban. From this answer, J-user identified what E-user meant, but since J-user still did not realize that his first sentence had been incorrectly understood, he took E2 to be saying that something was 'number 1', and tried to ask what was 'number 1' (J4).</Paragraph>
    <Paragraph position="33"> But in the translation of this question, ichiban (一番) was translated as 'the first'. At this point, it is not clear which comment E-user is referring to in E3, but in any case, not realizing what answer J-user expected and not knowing enough Japanese to realize what had happened - i.e. the connection between 'number one' and 'the first' - E-user gives up and changes the subject. If E-user had intended to say ikebana and had explained its meaning, J-user could have realized that J1 had been misunderstood, because a statement of someone's impressions to the effect that something is ikebana would be meaningless.</Paragraph>
    <Paragraph position="34"> On the other hand, where the user knew a little of the foreign language (typically the Japanese user knowing English rather than vice versa), such a misunderstanding could be quickly dealt with (11).</Paragraph>
    <Paragraph position="35"> (11) E: How is the weathere in Tokyo? J:we a t h e r e i'~we a t h e r T: Is weathere weather?</Paragraph>
  </Section>
</Paper>