File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/84/p84-1116_concl.xml
Size: 26,825 bytes
Last Modified: 2025-10-06 13:56:03
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1116"> <Title>Machine Translation: its History, Current Status, and Future Prospects</Title> <Section position="8" start_page="555" end_page="559" type="concl"> <SectionTitle> GETA </SectionTitle> <Paragraph position="0"> As discussed earlier, the Groupe d'Etudes pour la Traduction Automatique was formed when Grenoble abandoned the CETA system. In reaction to the failures of the interlingua approach, GETA adopted the transfer approach. In addition, the former software design was largely discarded, and a new software package supporting a new style of processing was substituted. The core of GETA is composed of three programs: one converts strings into trees (for, e.g., word analysis), one converts trees into trees (for, e.g., syntactic analysis and transfer), and the third converts trees into strings (for, e.g., word synthesis). The overall translation process is composed of a sequence of stages, wherein each stage employs one of these three programs.</Paragraph> <Paragraph position="1"> One of the features of GETA that sets it apart from other MT systems is the insistence on the part of the designers that no stage be more powerful than is minimally necessary for its proper function.</Paragraph> <Paragraph position="2"> Thus, rather than supplying the linguist with programming tools capable of performing any operation whatever (e.g., the arbitrarily powerful Q-systems of TAUM), GETA supplies at each stage only the minimum capability necessary to effect the desired linguistic operation, and no more. This reduces the likelihood that the linguist will become overly ambitious and create unnecessary problems, and also enables the programmers to produce software that runs more rapidly than would be possible with a more general scheme.</Paragraph> <Paragraph position="3"> A &quot;grammar&quot; in GETA is actually a network of subgrammars; that is, a grammar is a graph specifying alternative sequences of applications of the subgrammars and optional choices of which subgrammars are to be applied (at all). The top-level grammar is therefore a &quot;control graph&quot; over the subgrammars which actually effect the linguistic operations -- analysis, transfer, etc.</Paragraph> <Paragraph position="4"> GETA is sufficiently general to allow implementation of any linguistic theory, or even multiple theories at once (in separate subgrammars) if such is desired. Thus, in principle, GETA is completely open-ended and could accommodate arbitrary semantic processing and reference to &quot;world models&quot; of any description.</Paragraph> <Paragraph position="5"> In practice, however, the story is more complicated. In order to increase the computational flexibility, as is required to take advantage of substantially new linguistic theories, especially &quot;world models,&quot; the underlying software would have to be changed in many ways.</Paragraph> <Paragraph position="6"> Unfortunately, it is written in IBM assembly language, making modification extremely difficult.</Paragraph> <Paragraph position="7"> Worse, the programmers who wrote the software have long since left the GETA project, and the current staff is unable to safely attempt significant modification. As a result, there has been no substantive change to the GETA software since 1975, and the GETA group has been unable to experiment with any new computational strategies. 
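As a rough picture of the stage organization and control graph described at the start of this section, consider the sketch below. It is not GETA code -- the actual system is written in IBM assembly language -- and every name and data shape in it is invented for the example; it simply mirrors the three converter kinds applied in sequence under a top-level control graph of subgrammars.

    # Illustrative sketch only: GETA itself is written in IBM assembly language.
    # All names and data shapes are hypothetical, chosen to mirror the prose:
    # three converter kinds (string->tree, tree->tree, tree->string) applied in stages.
    from typing import Callable, List, Union

    Tree = dict                      # stand-in for GETA's decorated trees
    Data = Union[str, Tree]

    def string_to_tree(words: str) -> Tree:
        """Stage kind 1: e.g., word (morphological) analysis."""
        return {"cat": "S", "children": [{"cat": "WORD", "lex": w} for w in words.split()]}

    def tree_to_tree(tree: Tree) -> Tree:
        """Stage kind 2: e.g., syntactic analysis or transfer (identity here)."""
        return tree

    def tree_to_string(tree: Tree) -> str:
        """Stage kind 3: e.g., word synthesis."""
        return " ".join(child["lex"] for child in tree["children"])

    # A GETA "grammar" is a network of subgrammars: the top-level "control graph"
    # specifies which subgrammars may follow which (hypothetical example).
    control_graph = {"ANALYSIS": ["TRANSFER"], "TRANSFER": ["SYNTHESIS"], "SYNTHESIS": []}

    # The overall translation is a sequence of stages, each employing one converter kind.
    pipeline: List[Callable] = [string_to_tree, tree_to_tree, tree_to_string]

    def translate(sentence: str) -> str:
        data: Data = sentence
        for stage in pipeline:
            data = stage(data)
        return data

    print(translate("le texte est analyse"))   # echoes the input: every stage is a stub
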
Back-up, for example, is a known problem \[Tsujii, personal communication\]: if the GETA system &quot;pursues a wrong path&quot; through the control graph of subgrammars, it can undo some of its work by backing up past whole graphs, discarding the results produced by entire subgrammars; but within a subgrammar, there is no possibility of backing up and reversing the effects of individual rule applications. The GETA workers would like to experiment with such a facility, but are unable to change the software to allow this.</Paragraph> <Paragraph position="8"> Until GETA receives enough funding that new programmers can be hired to rewrite the software in a high-level language, facilitating present and future redesign, the GETA group is &quot;stuck&quot; with the current software, now 10 years old and showing clear signs of age, to say nothing of non-transportability.</Paragraph> <Paragraph position="9"> GETA seems not to have been pressed to produce an application early on, and the staff was relatively &quot;free&quot; to pursue research interests. Until GETA can be updated, and in the process freed from dependence on IBM mainframes, it may never be a viable system. The project staff are actively seeking funding for such a project. Meanwhile, the French government has launched an application effort through the GETA group.</Paragraph> <Section position="1" start_page="556" end_page="557" type="sub_section"> <SectionTitle> SUSY - Saarbruecker Uebersetzungssystem </SectionTitle> <Paragraph position="0"> The University of the Saar at Saarbruecken, West Germany, hosts one of the larger MT projects in Europe, established in the late 1960s. After the failure of a project intended to modify GAT for Russian-German translation, a new system was designed along somewhat similar lines to translate Russian into German after &quot;global&quot; sentence analysis into dependency tree structures, using the transfer approach. Unlike most other MT projects, the Saarbruecken group was left relatively free to pursue research interests, rather than forced to produce applications, and was also funded at a level sufficient to permit significant on-going experimentation and modification. As a result, SUSY tended to track external developments in CL and AI more closely than other projects. For example, Saarbruecken helped establish the co-operative MT group LEIBNIZ (along with Grenoble and others) in 1974, and adopted design ideas from the GETA system. Until 1975, SUSY was based on a strict transfer approach; since 1976, however, it has evolved, becoming more abstract as linguistic problems mandating &quot;deeper&quot; analysis have forced the transfer representations to assume some of the generality of an interlingua. Also as a result of such research freedom, there was apparently no sustained attempt to develop coverage for specific applications.</Paragraph> <Paragraph position="1"> Though intended as a multi-lingual system involving English, French, German, and Russian, SUSY development has concentrated on translation into German from Russian and, recently, English. Thus, the extent to which SUSY may be capable of multilingual translation has not yet been ascertained. Then, too, some aspects of the software are surprisingly primitive: only very recently, for example, did the morphological analysis program become nondeterministic (i.e., general enough to permit lexical ambiguity). 
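To make the point about nondeterminism concrete, the toy analyzer below (our own illustration, not SUSY code; the mini-lexicon is invented) returns every reading of an ambiguous word form instead of committing to a single one, leaving disambiguation to later stages.

    # Illustrative only, not SUSY code: a nondeterministic morphological analyzer
    # returns all candidate analyses of a form rather than exactly one.
    LEXICON = {
        "liebe": [("lieben", {"pos": "V", "person": 1, "number": "sg"}),   # "(ich) liebe"
                  ("Liebe",  {"pos": "N", "gender": "fem", "number": "sg"})],
        "das":   [("das", {"pos": "ART", "gender": "neut"})],
    }

    def analyze(form):
        """Return every reading of the form; an empty list signals analysis failure."""
        return LEXICON.get(form.lower(), [])

    # A deterministic analyzer would have to pick one reading here; a nondeterministic
    # one hands all of them on to the syntactic stages for disambiguation.
    for reading in analyze("Liebe"):
        print(reading)
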
The strongest limiting factor in the further development of SUSY seems to be related to the initial inspiration behind the project: SUSY adopted a primitive approach in which the linguistic rules were organized into independent strata, and were incorporated directly into the software \[Maas, 84\]. As a consequence, the rules were virtually unreadable, and their interactions eventually became almost impossible to manage. In terms of application potential, therefore, SUSY seems to have failed. A second-generation project, SUSY-II, begun in 1981, may fare better.</Paragraph> <Paragraph position="2"> EUROTRA
EUROTRA is the largest MT project in the Western world. It is the first serious attempt to produce a true multi-lingual system, in this case intended for all seven European Economic Community languages. The justification for the project is simple, inescapable economics: over a third of the entire administrative budget of the EEC for 1982 was needed to pay the translation division (average individual income: $43,000/year), which still could not keep up with the demands placed on it; technical translation costs the EEC $.20 per word for each of six translations (from the seventh original language), and doubles the cost of the technology documented; with the addition of Spain and Portugal later this decade, the translation staff would have to double for the current demand level (unless highly productive machine aids were already in place) \[Perusse, 83\]. The high cost of writing SYSTRAN dictionary entries is presently justifiable for reasons of speed in translation, but this situation is not viable in the long term.</Paragraph> <Paragraph position="3"> The EEC must have superior quality MT at lower cost for dictionary work. Human translation alone will never suffice.</Paragraph> <Paragraph position="4"> EUROTRA is a true multi-national development project. There is no central laboratory where the work will take place; instead, designated university representatives of each member country will produce the analysis and synthesis modules for their native language; only the transfer modules will be built by a &quot;central&quot; group -- and the transfer modules are designed to be as small as possible, consisting of little more than lexical substitution \[King, 82\]. Software development will be almost entirely separated from the linguistic rule development; indeed, the production software, though designed by the EUROTRA members, will be written by whichever commercial software house wins the contract in bidding competition. Several co-ordinating committees are working with the various language and emphasis groups to ensure co-operation.</Paragraph> <Paragraph position="5"> The linguistic basis of EUROTRA is nothing novel.</Paragraph> <Paragraph position="6"> The basic structures for representing &quot;meaning&quot; are dependency trees, marked with feature-value pairs partly at the discretion of the language groups writing the grammars (anything a group wants, it can add), and partly controlled by mutual agreement among the language groups (a certain set of feature-value combinations has been agreed to constitute minimum information; all are constrained to produce this set when analyzing sentences in their language, and all may expect it to be present when synthesizing sentences in their language) \[King, 81, 82\].</Paragraph> <Paragraph position="7"> The software basis of EUROTRA will not be novel either, though the design is not yet complete. 
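A minimal sketch of that kind of representation follows: dependency nodes carrying feature-value pairs, some mandated by inter-group agreement and others added at a group's discretion. The node layout, the feature names, and the agreed-minimum set are our own illustrative choices, not EUROTRA's actual inventory.

    # Illustrative sketch; the feature names and the "agreed minimum" set below are
    # hypothetical stand-ins, not EUROTRA's actual feature inventory.
    AGREED_MINIMUM = {"cat", "lemma"}          # features every language group must supply

    class DepNode:
        """A dependency-tree node decorated with feature-value pairs."""
        def __init__(self, **features):
            self.features = dict(features)     # freely extensible: a group may add anything
            self.dependents = []               # list of (relation, DepNode) pairs

        def add_dependent(self, relation, node):
            self.dependents.append((relation, node))

    def satisfies_minimum(node):
        """Analysis must produce the agreed features; synthesis may rely on their presence."""
        return AGREED_MINIMUM <= node.features.keys() and \
               all(satisfies_minimum(child) for _, child in node.dependents)

    verb = DepNode(cat="V", tense="past")                      # lacks "lemma": breaks the agreement
    obj  = DepNode(cat="N", lemma="book", definiteness="def")  # extra group-specific feature is fine
    verb.add_dependent("object", obj)
    print(satisfies_minimum(verb))   # False: the analyzing group has not met the agreed minimum
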
The basic rule interpreter will be &quot;a general re-write system with a control language over grammars/processes&quot; \[King, personal communication\]. As in GETA, the linguistic rules will be bundled into packets of subgrammars, and the linguists will be provided with a means of controlling which packets of rules are applied, and when; the individual rules will be non-destructive re-write rules, so that the application of any given rule may create new structure, but will never erase any old information (no back-up).</Paragraph> <Paragraph position="8"> EUROTRA will engage in straightforward development using state-of-the-art but &quot;proven&quot; techniques. The charter requires delivery of a small representative prototype system by late 1987, and a prototype covering one technical area by late 1988.</Paragraph> <Paragraph position="9"> EUROTRA is required to translate among the native languages of all member countries which sign the &quot;contract of association&quot; by early-to-mid 1984; thus, not all seven EEC languages will necessarily be represented, but by law at least four languages must be represented if the project is to continue.</Paragraph> <Paragraph position="10"> The State of the Art
Human languages are, by nature, different. So much so that the illusory goal of abstract perfection in translation -- once and still imagined by some to be achievable -- can be comfortably ruled out of the realm of possible existence, whether attempted by machine or man. Even the abstract notion of &quot;quality&quot; is undefinable, hence immeasurable. In its place, we must substitute the notion of evaluation of translation according to its purpose, judged by the consumer. One must therefore accept the truth that the notion of quality is inherently subjective. Certainly there will be translations hailed by most if not all as &quot;good,&quot; and correspondingly there will be translations almost universally labelled &quot;bad.&quot; Most translations, however, will surely fall in between these extremes, and each user must render his own judgement according to his needs.</Paragraph> <Paragraph position="11"> In corporate circles, however, there is and has always been an operational definition of &quot;good&quot; vs. &quot;bad&quot; translation: a good translation is what senior translators are willing to expose to outside scrutiny (not that they are fully satisfied, for they never are); and a bad one is what they are not willing to release. These experienced translators -- usually post-editors -- impose a judgement which the corporate body is willing to accept at face value: after all, such judgement is the very purpose for having senior translators. It is arrived at subjectively, based on the purpose for which the translation is intended, but comes as close to being an objective assessment as the world is likely to see. In a post-editing context, a &quot;good&quot; original translation is one worth revising, i.e., one which the editor will endeavor to change, rather than reject or replace with his own original translation.</Paragraph> <Paragraph position="12"> Therefore, any rational position on the state of the art in MT & MAT must respect the operational decisions about the quality of MT & MAT as judged by the present users. These systems are all, of course, based on old technology (&quot;ancient,&quot; by the standards of AI researchers); but by the time systems employing today's AI technology hit the market, they too will be &quot;antiquated&quot; by the research laboratory standards of their time. 
Such is the nature of technology. We will therefore distinguish, in our assessment, between what is available and/or used now (&quot;old,&quot; yet operationally current, technology), what is around the next corner (techniques working in research labs today), and what is farther down the road (experimental approaches).</Paragraph> </Section> <Section position="2" start_page="557" end_page="557" type="sub_section"> <SectionTitle> Production Systems </SectionTitle> <Paragraph position="0"> Production M(A)T systems are based on old technology; some, for example, still (or until very recently did) employ punch-cards and print(ed) out translations in all upper-case. Few if any attempt a comprehensive &quot;global&quot; analysis at the sentence level (trade secrets make this hard to discern), and none go beyond that to the paragraph level.</Paragraph> <Paragraph position="1"> None use a significant amount of semantic information (though all claim to use some). Most if not all perform as &quot;idiots savants,&quot; making use of enormous amounts of very unsophisticated pragmatic information and brute-force computation to determine the proper word-for-word or idiom-for-idiom translation, followed by local rearrangement of word order -- leaving the translation chaotic, even if understandable.</Paragraph> <Paragraph position="2"> But they work! Some of them do, anyway -- well enough that their customers find reason to invest enormous amounts of time and capital developing the necessary massive dictionaries specialized to their applications. Translation time is certainly reduced. Translator frustration is increased or decreased, as the case may be (it seems that personality differences, among other things, have a large bearing on this). Some translators resist their introduction -- there are those who still resist the introduction of typewriters, to say nothing of word processors -- with varying degrees of success. But most are thinking about accepting the place of computers in translation, and a few actually look forward to relief from much of the drudgery they now face. Current MT systems seem to take some getting used to, and further productivity increases are realized as time goes by; they are usually accepted, eventually, as a boon to the bored translator. New products embodying old technology are constantly introduced; most are found not viable, and quickly disappear from the market. But those which have been around for years must be economically justifiable to their users -- else, presumably, they would no longer exist.</Paragraph> </Section> <Section position="3" start_page="557" end_page="558" type="sub_section"> <SectionTitle> Development Systems </SectionTitle> <Paragraph position="0"> Systems being developed for near-term introduction employ Computational Linguistics (CL) techniques of the late 1970s, if not the '80s. Essentially all are full MT, not MAT, systems. As Hutchins \[82\] notes, &quot;...there is now considerable agreement on the basic strategy, i.e. a 'transfer' system with some semantic analysis and some interlingual features in order to simplify transfer components.&quot; These systems employ one of a variety of sophisticated parsing/transducing techniques, typically based on charts, whether the grammar is expressed via phrase-structure rules (e.g., METAL) or \[strings of\] trees (e.g., GETA, EUROTRA); they operate at the sentence level, or higher, and make significant use of semantic features. 
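For contrast, the toy fragment below (every dictionary entry and the reordering rule are invented for the example) illustrates the older production-system strategy described above -- idiom-for-idiom, then word-for-word substitution, followed by purely local rearrangement, with no global sentence analysis -- which the newer techniques discussed next are meant to replace.

    # Toy illustration of the "idiot savant" production-system strategy: idiom-for-idiom,
    # then word-for-word substitution, then local rearrangement. All entries are invented.
    IDIOMS = {("s'il", "vous", "plait"): ["please"]}
    WORDS  = {"le": "the", "rapport": "report", "technique": "technical", "envoyez": "send"}

    def substitute(tokens):
        out, i = [], 0
        while i < len(tokens):
            for length in (3, 2):                              # longest-match idiom lookup first
                if tuple(tokens[i:i + length]) in IDIOMS:
                    out += IDIOMS[tuple(tokens[i:i + length])]
                    i += length
                    break
            else:
                out.append(WORDS.get(tokens[i], tokens[i]))    # word-for-word, else copy through
                i += 1
        return out

    def reorder(tokens):
        # Local rearrangement only: swap a noun and a following adjective (French order).
        out = list(tokens)
        for i in range(len(out) - 1):
            if out[i] == "report" and out[i + 1] == "technical":
                out[i], out[i + 1] = out[i + 1], out[i]
        return out

    source = "envoyez le rapport technique s'il vous plait".split()
    print(" ".join(reorder(substitute(source))))
    # -> "send the technical report please": understandable, but no global analysis was done
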
Proper linguistic theories, whether elegant or not quite, and heuristic software strategies take the place of simple word substitution and brute-force programming. If the analysis attempt succeeds, the translation stands a fair chance of being acceptable to the revisor; if analysis fails, then fail-soft measures are likely to produce something equivalent to the output of a current production MT system.</Paragraph> <Paragraph position="1"> These systems work well enough in experimental settings to give their sponsors and waiting customers (to say nothing of their implementors) reason to hope for near-term success in application. Their technology is based on some of the latest techniques which appear to be workable in immediate large-scale application. Most &quot;pure AI&quot; techniques do not fall in this category; thus, serious AI researchers look down on these development systems (to say nothing of production systems) as old, uninteresting -- and probably useless. Some likely are. But others, though &quot;old,&quot; will soon find an application niche, and will begin displacing any of the current production systems which try to compete. (Since the present crop of development systems all seem to be aimed at the &quot;information dissemination&quot; application, the current production systems that are aimed at the &quot;information acquisition&quot; market may survive for some time.) The major hurdle is time: time to write and debug the grammars (a very hard task), and time to develop lexicons with roughly ten thousand general vocabulary items, and the few tens of thousands of technical terms required per subject area. Some development projects have invested the necessary time, and stand ready to deliver commercial applications (e.g., GETA, METAL).</Paragraph> </Section> <Section position="4" start_page="558" end_page="558" type="sub_section"> <SectionTitle> Research Systems </SectionTitle> <Paragraph position="0"> The biggest problem associated with MT research systems is their scarcity (nonexistence, in the U.S.). If current CL and AI researchers were seriously interested in multiple languages -- even if not for translation per se -- this would not necessarily be a bad situation. But in the U.S. they certainly are not, and in Europe, CL and AI research has not yet reached the level achieved in the U.S. Western business and industry are naturally more concerned with near-term payoff, and some track development systems; very few support MT development directly, and none yet support pure MT research at a significant level. (The Dutch firm Philips may, indeed, have the only long-term research project in the West.) Some European governments fund significant R&D projects (e.g., Germany and France), but Japan is making by far the world's largest investment in MT research. The U.S. government, which otherwise supports the best overall AI and \[English\] CL research in the world, is not involved.</Paragraph> <Paragraph position="2"> Where pure MT research projects do exist, they tend to concentrate on the problems of deep meaning representations -- striving to pursue the goal of a true AI system, which would presumably include language-independent meaning representations of great depth and complexity. 
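As a rough picture of what such a deep, language-independent representation might look like, the sketch below uses a case-frame style of predicate-argument structure; the frame layout and role names are our own invention and are not drawn from any particular research system.

    # Our own rough illustration of a language-independent meaning representation:
    # a case-frame (predicate-argument) structure with no commitment to any one
    # language's words or word order. No particular research system is implied.
    meaning = {
        "predicate": "TRANSFER-POSSESSION",       # a language-neutral concept, not a word
        "time": "PAST",
        "roles": {
            "agent":     {"concept": "WOMAN", "number": "sg", "definite": True},
            "recipient": {"concept": "CHILD", "number": "sg", "definite": True},
            "theme":     {"concept": "BOOK",  "number": "sg", "definite": False},
        },
    }

    # Analysis would map, say, "La femme a donne un livre a l'enfant" into such a
    # structure; generation would map it back out as "The woman gave the child a book" --
    # translation as paraphrase through the representation, as described below.
    print(meaning["predicate"], sorted(meaning["roles"]))
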
Translation here is seen as just one application of such a system: the system &quot;understands&quot; natural language input, then &quot;generates&quot; natural language output; if the languages happen to be different, then translation has been performed via paraphrase. Translation could thus be viewed as one of the ultimate tests of an Artificial Intelligence: if a system &quot;translates correctly,&quot; then to some extent it can be argued to have &quot;understood correctly,&quot; and in any case it will tell us much about what translation is all about. In this role, MT research holds out its greatest promise as a once-again scientifically respectable discipline. The first requirement, however, is the existence of research groups interested in, and funded for, the study of multiple languages and translation among them within the framework of AI research. At the present time only Japan, and to a somewhat lesser extent western Europe, can boast such groups.</Paragraph> </Section> <Section position="5" start_page="558" end_page="559" type="sub_section"> <SectionTitle> Future Prospects </SectionTitle> <Paragraph position="0"> The world has changed in the two decades since ALPAC. The need and demand for technical translation has increased dramatically, and the supply of qualified human technical translators has not kept pace. (Indeed, it is debatable whether there existed a sufficient supply of qualified technical translators even in 1966, contrary to ALPAC's claims.) The classic &quot;law of supply and demand&quot; has not worked in this instance, for whatever reasons: the shortage is real, all over the world; nothing is yet serving to stem this worsening situation; and nothing seems capable of doing so outside of dramatic productivity increases via computer automation. In the EEC, for example, the already overwhelming load of technical translation is projected to rise sixfold within five years.</Paragraph> <Paragraph position="1"> The future promises greater acceptance by translators of the role of machine aids -- running the gamut from word processing systems and on-line term banks to MT systems -- in technical translation. Correspondingly, M(A)T systems will experience greater success in the marketplace. As these systems continue to drive down the cost of translation, the demand and capacity for translation will grow even more than it would otherwise: many &quot;new&quot; needs for translation, not presently economically justifiable, will surface.</Paragraph> <Paragraph position="2"> If MT systems are to continue to improve so as to further reduce the burden on human translators, there will be a greater need and demand for continuing MT R&D efforts.</Paragraph> <Paragraph position="3"> Conclusions
The translation problem will not go away, and human solutions (short of full automation) do not now, and never will, suffice. MT systems have already scored successes among the user community, and the trend can hardly fail to continue as users demand further improvements and greater speed, and MT system vendors respond. Of course, the need for research is great, but some current and future applications will continue to succeed on economic grounds alone -- and to the user community, this is virtually the only measure of success or failure.</Paragraph> <Paragraph position="4"> It is important to note that translation systems are not going to &quot;fall out&quot; of AI efforts which are not seriously contending with multiple languages from the start. 
There are two reasons for this.</Paragraph> <Paragraph position="5"> First, English is not a representative language.</Paragraph> <Paragraph position="6"> Relatively speaking, it is not even a very hard language from the standpoint of Computational Linguistics: Japanese, Chinese, Russian, and even German, for example, seem more difficult to deal with using existing CL techniques -- surely in part due to the nearly total concentration of CL workers on English. Developing translation ability will require similar concentration by CL workers on other languages; nothing less will suffice.</Paragraph> <Paragraph position="7"> Second, it would seem that translation is not by any means a simple matter of understanding the source text, then reproducing it in the target language -- even though some translators (and virtually every layman) will say this is so. On the one hand, there is the serious question of whether, in the case of, for example, an article on front-line research in semiconductor switching theory or nuclear physics, a translator really does &quot;fully comprehend&quot; the content of the article he is translating. One would suspect not.</Paragraph> <Paragraph position="8"> (Johnson \[83\] makes a point of claiming that he has produced translations, judged good by informed peers, in technical areas where his expertise is deficient, and his understanding, incomplete.) On the other hand, it is also true that translation schools expend a great deal of effort teaching techniques for low-level lexical and syntactic manipulation -- a curious fact to contrast with the usual &quot;full comprehension&quot; claim. In any event, every qualified translator will agree that there is much more to translation than simple analysis/synthesis (an almost prima facie proof of the necessity for Transfer).</Paragraph> <Paragraph position="9"> What this means is that the development of translation as an application of Computational Linguistics will require substantial research in its own right, in addition to the work necessary to provide the basic multi-lingual analysis and synthesis tools. Translators must be consulted, for they are the experts in translation. None of this will happen by accident; it must result from design.</Paragraph> </Section> </Section> </Paper>