<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2115"> <Title>Machine Translation: The Languages Network (versus the intermediate language.)</Title> <Section position="2" start_page="0" end_page="544" type="metho"> <SectionTitle> 1 The goal of the intermediate language. </SectionTitle> <Paragraph position="0"> In discussions on translation systems, the question is often asked whether the system is based on direct translation, whether it works according to the transfer method, or whether it uses an intermediate language. This question suggests that translation systems can be defined exactly by dividing them into these three categories.</Paragraph> <Paragraph position="1"> It is our conviction that solving problems in translation is more complex than is suggested by this question. In this paper we will show precisely where this question is inadequate, by looking at some aspects of the translation process.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Why an intermediate language? </SectionTitle> <Paragraph position="0"> The idea which leads to the definition of an intermediate language originates from the wish that, in translation from one source language into several target languages, no completely different translation should have to be made for each pair of languages.</Paragraph> <Paragraph position="1"> Brandt Corstius /Brandt Corstius, 1978/ expresses this idea as follows (the citation has been translated from Dutch into English): &quot;Instead of making 90 programs in order to translate ten languages into one another (from each language into each of the nine others), it would be sufficient to have 20 translation programs (from each language into Machinish, and from Machinish into each of the languages). 
It is even conceivable that eighteen programs would be sufficient, if one of the ten languages is given the role of this intermediate language.&quot; This standpoint implies an &quot;efficient&quot; method in terms of the amount of work, without indicating whether this method also solves any fundamental problems with respect to machine translation.</Paragraph> <Paragraph position="2"> On the contrary, Brandt Corstius remains sceptical about this.</Paragraph> </Section> <Section position="2" start_page="0" end_page="544" type="sub_section"> <SectionTitle> 1.2 What shall be defined? </SectionTitle> <Paragraph position="0"> Ideally the intermediate language will have to be an unambiguous representation of the meaning of (each of) the source language(s).</Paragraph> <Paragraph position="1"> This implies that a language should be found in which it is possible to represent all possible meanings in an unambiguous way. If Brandt Corstius is followed in this, then in the translation from and into natural language this would have to be one of the natural languages. But one of the main characteristics of natural language is precisely that it is efficient, which implies that with few words and constructions a lot can be expressed in very many different circumstances. The way in which each individual natural language is efficient differs from language to language: ambiguities and vagueness in a source language cannot usually be projected in a one-to-one correspondence onto a target language.</Paragraph> <Paragraph position="2"> So it would seem to be not entirely plausible to select an intermediate language from the languages to be translated, seeing that the demand of unambiguity is too heavy precisely for natural language. An intermediate language should not only be unambiguous, but it should also be able to represent all possible meanings. 
We are convinced that for every sentence of a natural language an infinite set of meanings is possible, since meaning depends on the universe of discourse and the set of possible universes of discourse is infinitely large.</Paragraph> <Paragraph position="3"> All in all there is enough cause for a fundamental approach to the problem: what is to be achieved in defining an intermediate language in the machine translation of a set of natural languages from and into each of the members of that set? The need for an intermediate language originates, on the one hand, from the idea that the analysis of a source language will be largely the same, irrespective of the target language into which it is to be translated, and on the other hand, from the need to analyse the source language in such a way that all ambiguities have been solved, and that therefore the generation of the target language can take place without any further problems.</Paragraph> <Paragraph position="4"> With respect to the former, we, too, believe that the idea that the analysis of a source language is partly the same, irrespective of the selected target language, is entirely correct.</Paragraph> <Paragraph position="5"> When we plot the translation process on a line from source language to target language, this will be the part which is close to the source language: to put it in rather more linguistic terms: the morphological analysis and that part of the other syntactic analysis that can be summed up in the term surface grammar, so in any case the NP and VP detection, for instance. We shall return to this below.</Paragraph> <Paragraph position="6"> The second need, viz. completely disambiguating the source language, would seem to be too heavy a demand, as was formulated before, with reference to 'all possible universes of discourse'. 
The two needs that have been mentioned cannot be fulfilled, but it can be maintained that there is no need for the entire analysis of the source language and the entire generation of the target language to be done over and over again for every pair of languages. It has to be determined what the two paths, analysis and generation, will look like. Parts of these two paths will remain the same, for the source language irrespective of the target language, and for the target language irrespective of the source language.</Paragraph> </Section> </Section> <Section position="3" start_page="544" end_page="544" type="metho"> <SectionTitle> 2 The analysis of the source language. </SectionTitle> <Paragraph position="0"> Since there is no reason to adopt an intermediate language as has been argued, the problem facing us is the analysis of the source language, as well as the generation of the target language and the process between analysis and generation, which will be discussed in the following two sections.</Paragraph> <Paragraph position="1"> The point of departure for both these sections, and, in fact, for this paper is the hypothesis that the meaning of the source language depends on the objective one has in mind. In the case of machine translation the meaning, expressed in the translation, depends on the target language defined.</Paragraph> <Paragraph position="2"> In this line of argument, translating is therefore always a matter of a specific relationship between two languages. When we plot the process of translation on a line, however, we can distinguish three phases, which can be referred to as analysis, translation and generation. The present section and the following two have been divided on the basis of this principle.</Paragraph> <Section position="1" start_page="544" end_page="544" type="sub_section"> <SectionTitle> 2.1 What is analysis? 
</SectionTitle> <Paragraph position="0"> The analysis of a source language can be defined, in a very abstract way, as the addition of information to the input. It may seem trivial, but this starting point implies that no information must be lost in the analysing stage. Information may only be added. This also implies that the input order must not be changed. So, in our view, a dependency grammar is not suitable for the analysis because of the loss of the input order.</Paragraph> <Paragraph position="1"> Changes in the input order can only be brought about on the basis of requirements posed by the target language.</Paragraph> <Paragraph position="2"> This brings us to a second aspect of the analysing stage: in this stage only source language inherent data are worked with, to be subdivided into static (lexical) and dynamic (grammatical) data.</Paragraph> <Paragraph position="3"> The fact that in the analysing stage solely data inherent to the source language are used, does not mean that the target language has nothing to do with the nature of the analysis. That would clash with our starting point, viz. that the meaning of the source language depends on the target language.</Paragraph> <Paragraph position="4"> The influence of the (set of) target language(s) extends over the way in which the analysis is carried out, in other words, what type of information has to be added to the text of the source language.</Paragraph> <Paragraph position="5"> Let us take, by way of example, the translation Dutch-English. We will assume (for convenience's sake) that in Dutch the word order in subclauses is S(ubject)-O(bject)-V(erb), while in English the standard word order in subclauses is S(ubject)-V(erb)-O(bject).</Paragraph> <Paragraph position="6"> The change from S-O-V into S-V-O does not belong to the analysing stage of Dutch, for it implies loss of information because of the word order change. 
However, the translation into English requires from the analysing stage of Dutch that, among other things, the categories S, O and V are assigned. The assignment of S and O implies that NPs have to be formed, etc.</Paragraph> <Paragraph position="7"> The analysing stage comprises all stages which belong to morphology, surface grammar and possibly a large number of matters that belong to the field of semantic interpretation (see /Bakel, 1984/).</Paragraph> <Paragraph position="8"> This last category, semantic interpretation, is close to the translation stage and will possibly be different for groups of target languages.</Paragraph> <Paragraph position="9"> In the section headed 'Prospect' we will indicate schematically how this semantic interpretation has to be situated in the whole of the translation process.</Paragraph> </Section> <Section position="2" start_page="544" end_page="544" type="sub_section"> <SectionTitle> 2.2 Algorithmic consequences. </SectionTitle> <Paragraph position="0"> Starting from the assumption that the addition of information (committing abstractions) is brought about, among other things, by the application of some sort of dependency structure, what is needed is a form of graphic representation.</Paragraph> <Paragraph position="1"> For many languages, and certainly also for Dutch, the traditional tree structure clashes with our demand formulated earlier, that information should be retained: the original word order must be maintained during the analysing stage. (In Dutch a postmodifier in an NP is often extraposed, e.g. &quot;Ik heb de man gezien met de bril.&quot; Translated word by word: &quot;I have the man seen with the spectacles.&quot;) Furthermore, linguists should have the possibility of expressing linguistic notions in a way which is adequate to them. 
For this purpose a distinction has been made, in the SYGMART system, between the morphological analysis, which operates on words (the subsystem OPALE) and a tree transformational part (the subsystem TELESI), which operates on multi-dimensional trees over text(s) (the notion sentence does not exist in SYGMART).</Paragraph> <Paragraph position="2"> The surface grammar and the semantic interpretation cannot therefore be algorithmically distinguished.</Paragraph> <Paragraph position="3"> This multi-dimensionality enables the linguist to establish relationships between sentence constituents which are far apart, without having to extract them out of their original order. This multi-dimensionality has to be looked upon as the definition of graphs more complex than trees over the input.</Paragraph> <Paragraph position="4"> For a more detailed discussion of the multi-dimensionality the reader is referred to /Chauché, 1984/.</Paragraph> <Paragraph position="5"> Our arguments for not using the traditional grammatical types are given in /Roll, 1986/. Part of the analysis of Dutch is shown in Appendix A.</Paragraph> </Section> </Section> <Section position="4" start_page="544" end_page="544" type="metho"> <SectionTitle> 3 The translation. </SectionTitle> <Paragraph position="0"> The translation stage is the stage between the source language inherent analysis and the target language inherent generation. This stage can be roughly compared to a transfer component, as suggested in the beginning of section 1.</Paragraph> <Paragraph position="1"> Two features are characteristic of the translation, viz. word order change and the addition of target language features.</Paragraph> <Paragraph position="2"> Word order change(s) (better: component or category movement) is (are) not by definition carried out separately for all possible target languages. 
If in the example of the word order change in subclauses, presented in the previous section, the rule SOV → SVO has to be applied to a subset of the target languages, this can be done for the entire subset prior to the introduction of target language specific features. This introduction of target language specific features is done by the lexical translation, or the translation of the words. We will assume here that the analysing stage has provided all the necessary information to get the correct translation for every word.</Paragraph> <Paragraph position="3"> Because of the information added in the analysing stage the correct translation of a word implies the translation of a complex data structure into another complex data structure, in which the written base form in both cases is but one value of that data structure.</Paragraph> <Paragraph position="4"> On the basis of information that comes in after the lexical translation, further word order changes will generally have to take place, as well as the generation of grammatical structures.</Paragraph> <Paragraph position="5"> A simple example in this connection is the following: the Dutch verb blijven is translated into English keep, but in Dutch blijven is completed by an infinitive (blijven wachten), whereas in English a gerund is expected (keep waiting).</Paragraph> <Paragraph position="6"> If on the basis of the new lexical information further grammatical rules have to be applied, such as moving the verb in the gerund construction (Dutch &quot;ik blijf op hem wachten&quot; into English &quot;I keep waiting for him&quot;), these rules also belong to the translation stage, not to the generation stage, unless the rules apply to all possible source languages with respect to English.</Paragraph> <Paragraph position="7"> As in the previous section, here, too, the demand made of the algorithmic procedures and the possibility of building and manipulating complex data structures is heavier than in 
traditional grammatical types. Within SYGMART the subsystem TELESI is used for the translation stage, which does not imply that for each pair of languages a separate TELESI implementation has to be made after all: SYGMART provides for the application of different TELESI grammars one after another.</Paragraph> </Section> <Section position="5" start_page="544" end_page="544" type="metho"> <SectionTitle> 4 The generation of the target language. </SectionTitle> <Paragraph position="0"> From the previous sections it has already become apparent that the generation of the target language does not come into play until only target language inherent matters are at issue, matters which hold irrespective of the source language that is used.</Paragraph> <Paragraph position="1"> They are in any case all matters of a morphological nature which in the entire translation process are the last to be dealt with. In the translator generator SYGMART the generating morphology is treated by the subsystem AGATE.</Paragraph> <Paragraph position="2"> If there are grammatical rules which also have to be applied independently of the source language, they also belong to the generating stage, in our set-up. These rules will not be many, for that would imply that in a target language certain constructions should occur for which in no (source) language an analogous construction was to be found.</Paragraph> <Paragraph position="3"> In our set-up the technical three-way division of SYGMART (OPALE, string into tree; TELESI, tree into tree via network; AGATE, tree into string) cannot be mapped in a one-to-one correspondence onto the three-way division of the translation process, viz. analysis, translation and generation. The morphological analysis always takes place in OPALE and is a rather small part of the entire analysis. The greatest part of the analysis, consequently, takes place in TELESI. Everything belonging to the translation happens in TELESI. 
As far as the generation is concerned, a small part is possibly carried out in TELESI, but the morphological generation, naturally, takes place in AGATE.</Paragraph> </Section> </Paper>