<?xml version="1.0" standalone="yes"?>
<Paper uid="P80-1044">
  <Title>An Experiment in Machine Translation</Title>
  <Section position="2" start_page="0" end_page="163" type="metho">
    <SectionTitle>
EARLIER MT EFFORTS
</SectionTitle>
    <Paragraph position="0"> Since Bruderer [2] has recently published a complete survey of MT projects, and Hutchins [3] reviews the most important developments through 1977, we will mention only a few of the major efforts. The first popular demonstration of the possibilities in MT was provided by IBM and the Georgetown University group in 1954 [4].</Paragraph>
    <Paragraph position="1"> With a vocabulary of about 250 words and a grammar comprising some six rules in what was called an &quot;operational syntax&quot;, the system demonstrated some rudimentary capability in Russian to English translation. This instigated a massive government funding effort over the next decade, and some 20 million dollars was invested in 17 different projects. By 1965 the Mark II Russian-English system [5] had been installed at the Foreign Technology Division of the U.S. Air Force at Wright-Patterson AFB, and the Georgetown system had been delivered to the Atomic Energy Commission at Oak Ridge National Laboratory and to EURATOM in Ispra, Italy. Reviewing MT systems such as these at the request of the National Science Foundation, the Automatic Language Processing Advisory Committee (ALPAC) reported in 1966 that MT was slower, less accurate, and more expensive than human translation; further, that there was no predictable prospect of improvement in MT capability. Though strongly and perhaps justifiably criticized [6], this report soon resulted in the virtual elimination of MT funding in the U.S., and a sizeable reduction in foreign efforts as well.</Paragraph>
    <Section position="1" start_page="0" end_page="163" type="sub_section">
      <SectionTitle>
Jonathan Slocum
Linguistics Research Center
The University of Texas
</SectionTitle>
      <Paragraph position="0"> Peter Toma, who was responsible for the installations at Oak Ridge and Ispra cited above, soon began private efforts at improving the Georgetown system. This culminated in SYSTRAN [7], which replaced Mark II at WPAFB in 1970 and the Georgetown system at EURATOM in 1976.</Paragraph>
      <Paragraph position="1"> SYSTRAN was also used by NASA during the Apollo-Soyuz mission. In 1976 the Commission of European Communities adopted SYSTRAN for English to French translation; however, an evaluation of its translations by the EEC post-editors in Brussels found the results to be far from satisfactory: &quot;all the revisors had exhausted their patience before the end&quot; [8]. Despite its generally low translation quality, SYSTRAN is the most widely used MT system to date. Its chief commercial competitor, LOGOS [9], is another example of a &quot;direct&quot; MT system. As in SYSTRAN, the analysis and synthesis components are separated but the linguistic procedures are designed for a specific source-language (SL) and target-language (TL) pair. In an evaluation by Sinaiko and Klare [10], LOGOS did not fare well. Bruderer [2] reports further development for translation into Russian, and experiments on French, German and Spanish, but provides few details.</Paragraph>
      <Paragraph position="2"> In an effort to correct the obvious inadequacies of these and other 'first generation' systems, which essentially translate word-for-word with no attempt at a unified analysis at the sentence level, and which were developed ab initio for a specific SL-TL pair, researchers began to investigate methods of analyzing sentences into structures from which in theory any TL could be generated. There are two broad types of such 'second generation' systems. One type produces analyses in a &quot;neutral&quot; structure, or 'interlingua'; the other produces SL syntactic structures which are transformed via a process called 'transfer' into a syntactic structure for the TL sentence. One example of the former approach is the system produced by the Centre d'Études pour la Traduction Automatique (CETA) at the University of Grenoble [11]. During the period from 1961 to 1971 this group developed a Russian to French MT system. An evaluation at the end of that period revealed that only 42% of the sentences were being correctly translated. Some failures were due to errors in the input, but the majority were due to programming errors, failure to produce a lexical analysis of a word or a syntactic analysis of a sentence, inefficiencies in the parser causing it to apply too many rules, etc. The Traduction Automatique de l'Université de Montréal (TAUM) project [12] is an example of the transfer approach. There are five grammars called &quot;q-systems&quot; to effect morphological and syntactic analysis of English, then transfer, then syntactic and morphological synthesis of French. Each such stage consists of a series of generalized tree-structure transformations. The significance of TAUM is that, of the second-generation systems, it is the nearest to operational implementation: it is to be applied to the translation of aircraft maintenance manuals.</Paragraph>
      <Paragraph position="3"> In 1978 the European project EUROTRA was initiated, apparently adopting the newer Grenoble system ARIANE, in order to produce an advanced, second generation MT system for the eventual replacement of the first generation system (SYSTRAN) currently in use [8]. The Grenoble group, now titled Groupe d'Études pour la Traduction Automatique (GETA), abandoned their earlier approach in light of its deficiencies and produced a system to translate in six passes: morphological analysis, multi-level (syntactic and semantic) analysis, lexical transfer, structural transfer, syntactic generation, and morphological generation. Multi-level analysis, structural transfer, and syntactic generation are all effected via a general tree-to-tree transducer program, somewhat less powerful but perhaps more efficient than the Q-systems transducer in TAUM; the other components have special programs suited to their function. The emphasis in this project is apparently twofold: increased efficiency and reliability through adoption of components with the minimum necessary power, and decreased sensitivity to failure in individual stages through the expedient of insuring that every component has some output, even if such output is nothing more than the original input. If we have interpreted the Vauquois mimeo [8] properly, this must be the largest and most comprehensive MT project yet undertaken.</Paragraph>
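      <Paragraph> The fail-soft design attributed to GETA above, in which every component yields some output even if that output is only its own input, can be illustrated with a minimal Python sketch. This is not the ARIANE implementation: the function run_pipeline, the six stand-in stage functions, and the sample sentence are all hypothetical, and the stages do no real linguistic work.

```python
from typing import Any, Callable, List, Tuple

# A stage is a (name, function) pair. Everything here is an illustrative assumption.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], source: Any) -> Tuple[Any, List[str]]:
    """Run the stages in order; a stage that raises or returns None
    passes its own input through unchanged as its output."""
    current = source
    failed: List[str] = []
    for name, stage in stages:
        try:
            result = stage(current)
        except Exception:
            result = None
        if result is None:
            failed.append(name)  # record the failure, keep the input as output
        else:
            current = result
    return current, failed


# Hypothetical stand-ins named after the six passes listed in the text.
def morphological_analysis(text):
    return text.split()


def multilevel_analysis(tokens):
    return {"tokens": tokens}


def lexical_transfer(tree):
    # Simulate a stage that fails, e.g. a word missing from the lexicon.
    raise KeyError("word missing from the transfer lexicon")


def structural_transfer(tree):
    return tree


def syntactic_generation(tree):
    return tree["tokens"]


def morphological_generation(tokens):
    return " ".join(tokens)


STAGES = [
    ("morphological_analysis", morphological_analysis),
    ("multilevel_analysis", multilevel_analysis),
    ("lexical_transfer", lexical_transfer),
    ("structural_transfer", structural_transfer),
    ("syntactic_generation", syntactic_generation),
    ("morphological_generation", morphological_generation),
]

output, failed = run_pipeline(STAGES, "le chat dort")
```

In this sketch the simulated failure in lexical transfer does not abort the run: later stages still receive the analysis tree, and the caller gets both a final output and the list of stages that fell back to their input.</Paragraph>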
    </Section>
  </Section>
  <Section position="3" start_page="163" end_page="164" type="metho">
    <SectionTitle>
DESCRIPTION OF METAL
</SectionTitle>
    <Paragraph position="0"> There are two different classifications of &quot;generations&quot; in MT systems. The first posits three generations (currently) according to the following criteria: (1) translation is word-for-word, with no significant syntactic analysis; (2) translation proceeds after obtaining a complete syntactic analysis of an input, with no significant semantic analysis; (3) translation proceeds after obtaining a complete semantic analysis of an input. The definition of 'third generation' says nothing about extra-sentential information, and one might posit a 'fourth generation' which employs such information. The other classification proceeds according to the following criteria: (1) translation proceeds &quot;directly&quot; from the SL to the TL, and the SL is analyzed only to the minimum extent necessary to generate TL equivalents; (2) translation proceeds &quot;indirectly&quot; by deriving a more-or-less standard analysis of the input, independent of the TL involved (but not necessarily of the SL), and then generating TL output based on the standard analysis. Within this definition of 'second generation', as noted above, there are the 'transfer' vs. 'interlingua' approaches.</Paragraph>
    <Paragraph position="1"> We prefer to characterize METAL as a 'third generation' system according to the first classification given above because this makes it clear that METAL derives a substantial semantic analysis, whereas the second definition of 'second generation' does not necessarily imply that semantic analysis of any kind is performed.</Paragraph>
    <Paragraph position="2"> METAL comprises two distinct components: the linguistic and the computational. The linguistic component consists of lexicons, phrase-structure grammar rules, case frames and transformations. SL and TL lexical entries include feature-value pairs encoding syntactic and semantic information such as grammatical category, inflectional class, semantic type, and case information (see Figure 1). Transfer lexical entries indicate how and under what conditions words or idioms in one language translate into words or idioms in another (see Figure 2). The phrase-structure rules may be augmented with procedures to determine their application via feature/value tests, to add or copy features and values in the interpretation being constructed, to invoke case-frame routines, and to invoke specific or general transformations. Case-frame routines determine semantic case relationships between verbs and nouns on the basis of syntactic and semantic features, and produce their output in the form of propositional trees. Transformations are pattern-pairs that specify old and new tree structures; when invoked, a transformation attempts to match its &quot;old&quot; side against the current structural descriptor, and if successful converts it into one matching its &quot;new&quot; side. In the process, features and values may be tested and set arbitrarily. This provides the grammar with virtually unlimited context sensitivity, but since no interpretation can affect the operation of the parser it still enjoys the advantages of context-free operation.</Paragraph>
    <Paragraph position="3"> Finally, there is a method for scoring, or rating, interpretations; this allows the system to determine the &quot;best&quot; interpretation for translation, and also provides another mechanism for rejecting the application of any rule, viz., a score below cutoff. Figure 3 illustrates a typical grammar rule.</Paragraph>
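    <Paragraph> The notion of a transformation as a pattern-pair, with an &quot;old&quot; side matched against the current tree and a &quot;new&quot; side built from the result, can be sketched in a few lines of Python. METAL itself is written in LISP and its rule notation is not shown here, so everything in this sketch is an assumption made for illustration: trees are nested tuples, strings beginning with a question mark are pattern variables, and the sample rule (moving a clause-final verb before its object) is hypothetical. Feature tests and feature setting are omitted.

```python
def match(pattern, tree, bindings):
    """Return bindings extended so that pattern matches tree, or None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A repeated variable must match the same subtree each time.
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    # Literal labels and words must be identical.
    return bindings if pattern == tree else None


def build(template, bindings):
    """Instantiate the "new" side of a transformation from the variable bindings."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(build(part, bindings) for part in template)
    return template


def apply_transformation(old, new, tree):
    """Rewrite tree if it matches the old side; otherwise return it unchanged."""
    bindings = match(old, tree, {})
    return tree if bindings is None else build(new, bindings)


# Hypothetical rule: move a clause-final verb in front of its object.
OLD = ("CLAUSE", "?subj", "?obj", "?verb")
NEW = ("CLAUSE", "?subj", "?verb", "?obj")

source_tree = ("CLAUSE", ("NP", "er"), ("NP", "das Buch"), ("V", "liest"))
target_tree = apply_transformation(OLD, NEW, source_tree)
```

Because a failed match leaves the tree untouched, a grammar rule that invokes such a transformation can treat it as an optional rewrite.</Paragraph>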
    <Paragraph position="4"> The German PREPosition &quot;in&quot; (in parentheses) may translate into the English PREPosition &quot;into&quot; if the Grammatical Case of the German PP is 'Accusative'; it may translate into the English PREPosition &quot;in&quot; if the Grammatical Case of the German PP is 'Dative'. Arbitrary numbers and types of conditions may be specified in transfer entries.</Paragraph>
    <Paragraph position="5"> The computational component, written in LISP, consists of the parser, the case-frame routines, the transformation pattern-matcher, the transfer program, the generator, and other procedures needed to drive and support the translation process. The parser is a highly efficient implementation of the Cocke-Kasami-Younger algo-</Paragraph>
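    <Paragraph> A conditional transfer entry of the kind just described for German &quot;in&quot; can be approximated as a small lookup table. The sketch below is only an illustration in Python, not the actual format of METAL's transfer lexicon: the dictionary layout, the feature name GC (standing for Grammatical Case), and the function transfer are assumptions.

```python
# Each entry maps (source word, category) to a list of (conditions, target word).
# The layout and the feature name "GC" are assumptions made for illustration.
TRANSFER_LEXICON = {
    ("in", "PREP"): [
        ({"GC": "Accusative"}, "into"),
        ({"GC": "Dative"}, "in"),
    ],
}


def transfer(word, category, features):
    """Return every target word whose conditions all hold for the given features."""
    candidates = TRANSFER_LEXICON.get((word, category), [])
    return [
        target
        for conditions, target in candidates
        if all(features.get(name) == value for name, value in conditions.items())
    ]


accusative_choice = transfer("in", "PREP", {"GC": "Accusative"})
dative_choice = transfer("in", "PREP", {"GC": "Dative"})
```

Since each entry carries a dictionary of conditions, further tests can be added to an entry without changing the lookup function, in the spirit of the remark that arbitrary numbers and types of conditions may be specified.</Paragraph>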
  </Section>
</Paper>