<?xml version="1.0" standalone="yes"?> <Paper uid="J89-1003"> <Title>DESIGN OF LMT: A PROLOG-BASED MACHINE TRANSLATION SYSTEM</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> The purpose of this paper is to describe an experimental English-to-German machine translation system, LMT (logic-based machine translation), which has evolved out of previous work by the author on logic grammars.</Paragraph> <Paragraph position="1"> The translation system is organized in a modular way. The grammar for analysis of the source language (English) is written completely independently of the task of translation. In fact, this grammar produces logical forms that can be used for other applications such as database query systems and knowledge-based systems, and has been used in the systems described in McCord (1982, 1987), Teeple (1985), Bernth (1988), and Dahlgren (1988). The components of LMT dealing specifically with translation do not index into the grammar rules, as, for example, in the LRC system (Bennett and Slocum 1985).</Paragraph> <Paragraph position="2"> An interesting sort of modularity exists in the English grammar itself, whereby syntax, lexicon, and semantic interpretation closely interact, yet manage to be clearly separated. The lexicon exerts control over syntactic analysis through the use of slot frames in lexical entries and slot filling methods in syntax, as well as through type checking with semantic types taken from lexical entries. Yet the syntax rules look completely syntactic; e.g., no specific semantic types or word senses are referred to. The syntactic analysis trees look like surface structure trees, with annotations showing grammatical relations (including remote relations due to extraposition).
The terminal nodes of these trees are logical terminals (explained below), which contain word sense predications and can be used in building logical forms as semantic representations of sentences. These logical forms are built by a separate semantic interpretation component which deals with problems of scoping of quantifiers and other modifiers.</Paragraph> <Paragraph position="3"> Given that the English grammar can produce both syntactic structures and logical forms, an issue in designing LMT was what structures to use as input to transfer. The initial idea was to use the logical forms.</Paragraph> <Paragraph position="4"> The main argument for this was that (1) the logical form analyses express the complete meaning of the source text, and (2) there is no doubt that for perfect translations, one must in general have a complete semantic analysis of the source text (and employ world knowledge to get it). The logical form analyses are expressions in a logical form language (LFL) (McCord 1985a, 1987). [Copyright 1989 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/89/010033-52$03.00. Computational Linguistics, Volume 15, Number 1, March 1989. Michael C. McCord, Design of LMT: A Prolog-Based Machine Translation System.] Although the formalism for LFL is intended to be language universal, there is actually a different version of LFL for every natural language, because most of the predicates are word senses in the natural language being analyzed.
The original scheme, then, for LMT was to analyze English text into English LFL forms, then transfer these to German LFL forms, then generate German text.</Paragraph> <Paragraph position="5"> This scheme is neat, and may be investigated again later; but for the sake of practicality, the compromise has been to use the syntactic analyses produced by the grammar as the point of transfer. Useful MT systems must generally work with rather large domains, and the trouble with the use of logical forms is that too many decisions must be made and too much world knowledge is needed to produce correct analyses for a large domain. For example, LFL expressions for degree adjectives like &quot;good&quot; are focalizers (McCord 1985a, 1987), where the base argument shows the base of comparison for the adjective. In general, it may be difficult to determine such arguments. In the syntactic structure, arguments of focalizers are not yet determined; but for the purposes of translation, such scoping problems can often (though not always) be ignored. They can often be sidestepped because the same ambiguity exists in the target language. For example, &quot;He is good&quot; can easily translate into Er ist gut without deciding &quot;good with respect to what?&quot;. Another point is that in the case of languages as close as English and German, it is simply more direct to transfer syntactic structure to syntactic structure. For more discussion of the practicality of a syntactic transfer method, see Bennett and Slocum (1985).</Paragraph> <Paragraph position="6"> It should be emphasized that the syntactic analysis trees produced by the grammar do contain some of the ingredients of semantic interpretation.
As mentioned above, terminal nodes contain word sense predications.</Paragraph> <Paragraph position="7"> Although the arguments of focalizer predications are not yet filled in, the arguments of verb and noun senses (corresponding to complements) are filled in (inasmuch as they can be determined by the syntax of the sentence, plus a few heuristics). Semantic type checking involves Prolog inference and is used for constraining word sense selection, complementation, and adjunct attachment. Also, certain preference heuristics, described in Section 2 below, are used for modifier attachment.</Paragraph> <Paragraph position="8"> Translation of a sentence by LMT proceeds in five steps.</Paragraph> <Paragraph position="9"> 1. Lexical preprocessing; 2. English syntactic analysis; 3. English-to-German transfer; 4. German syntactic generation; 5. German morphological generation.</Paragraph> <Paragraph position="10"> During Step 1, lexical preprocessing, the words of an input sentence are looked up in the LMT lexicon, in combination with English morphological analysis (both inflectional and derivational). Morphological derivations are used to synthesize new transfer entries. For example, the derivation of &quot;reuseable&quot; from &quot;use&quot; and the existence of a transfer entry use --> verwenden allow automatic synthesis of a new transfer entry reuseable --> wieder verwendbar.</Paragraph> <Paragraph position="11"> Step 1, and Step 5 as well, are the topics of a companion paper (McCord and Wolff 1988). The present paper deals mainly with the syntactic components of LMT; but enough description of the lexicon is given to make the discussion self-contained.</Paragraph> <Paragraph position="12"> Step 2, English syntactic analysis, is dealt with in Section 2. Several aspects of the English grammar are described: the Modular Logic Grammar formalism, use of metarules in the grammar, special syntactic techniques, and the methods used for semantic type checking.
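As a rough illustration of the transfer-entry synthesis described under Step 1, the following Python sketch derives reuseable --> wieder verwendbar from use --> verwenden. It is not LMT code (LMT is written in Prolog). The rule tables, the function names, and the rule that replaces the German infinitive ending are all assumptions made only for this example.

```python
# Hypothetical illustration only: the rule tables and function names below
# are assumptions for this sketch, not taken from the LMT lexicon.

TRANSFER = {"use": "verwenden"}   # base transfer entry quoted in the text

PREFIX_RULES = {"re": "wieder"}   # assumed: English prefix -> German adverb
SUFFIX_RULES = {"able": "bar"}    # assumed: English suffix -> German suffix


def analyze(word, lexicon):
    """Return (prefix, base, suffix) if word derives from a known base."""
    for prefix in [""] + list(PREFIX_RULES):
        for suffix in [""] + list(SUFFIX_RULES):
            if not word.startswith(prefix) or not word.endswith(suffix):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            # Accept the stem as is, or with a restored final "e".
            for base in (stem, stem + "e"):
                if base in lexicon and (prefix or suffix):
                    return prefix, base, suffix
    return None


def synthesize(word, transfer):
    """Build a German target for a derived word, or return None."""
    parsed = analyze(word, transfer)
    if parsed is None:
        return None
    prefix, base, suffix = parsed
    target = transfer[base]
    if suffix:
        # Assumed rule: drop the infinitive ending "-en" before the suffix.
        stem = target[:-2] if target.endswith("en") else target
        target = stem + SUFFIX_RULES[suffix]
    if prefix:
        target = PREFIX_RULES[prefix] + " " + target
    return target


print(synthesize("reuseable", TRANSFER))
```

The point of the sketch is only that one base entry plus a derivational analysis is enough to produce a new entry without listing it by hand; the actual mechanism is the subject of the companion paper.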
Section 3 provides an overview of the LMT lexicon and its relation to the English grammar.</Paragraph> <Paragraph position="13"> Step 3 is dealt with in Section 4, &quot;The Transfer Component of LMT.&quot; The transfer component converts an English syntax tree into the German transfer tree. This is a syntax tree that (normally) has the same shape as the English tree, but has different node labels. Its nonterminal nodes are labeled by feature structures appropriate for German syntax and morphology; and its terminal nodes are (normally) citation forms of German words, together with feature structures that determine the inflections of the words during Step 5.</Paragraph> <Paragraph position="14"> The transfer algorithm works in a simple way, in one top-down, left-to-right pass, yet manages to get a lot done, making German word choices and essentially producing all required German feature structures (like case markers). This is facilitated by use of logical variables and unification. Lexical transfer information resides in Prolog clauses (in internal representation), used by the transfer algorithm for simultaneous determination of German target words and associated inflectional markings for complements of these target words. Step 4, German syntactic generation, is described in Section 5. This phase takes the German transfer tree and produces a German surface structure tree by applying a battery of tree transformations in a cycle, as in transformational grammar. The pattern matching used by these transformations is mainly Prolog unification, but there is an augmentation for matching sublists.</Paragraph> <Paragraph position="15"> Transformations are expressed in a special notation involving this augmented pattern matching and are compiled by the system into normal Prolog clauses. The number of transformations used in the system is rather small (currently 44), because the general idea of LMT is to get as much right as possible during the transfer step.
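The single top-down, left-to-right transfer pass described for Step 3 can be sketched in Python under strong simplifying assumptions. Python has no logical variables, so a shared mutable dictionary stands in for an unbound variable that is bound later. The tree encoding, the tiny lexicon, and every name here are hypothetical and are not LMT's actual representation.

```python
# Hypothetical sketch: tree shapes, lexicon entries, and names are
# assumptions for illustration, not LMT's Prolog data structures.

# verb -> (German citation form, case required of each complement slot)
VERBS = {
    "see": ("sehen", {"subj": "nom", "obj": "acc"}),
    "help": ("helfen", {"subj": "nom", "obj": "dat"}),
}
NOUNS = {"man": "Mann", "woman": "Frau"}


def transfer(node, slots=None):
    """Map an English tree to a same-shaped German tree in one pass."""
    label = node[0]
    if label == "s":
        # Fresh, still-empty feature cells for this clause; they play the
        # role of unbound logical variables shared by verb and complements.
        slots = {"subj": {}, "obj": {}}
        return ("s", [transfer(child, slots) for child in node[1]])
    if label == "np":
        _, role, noun = node
        # The case may not be known yet; share the cell so that it is
        # filled in whenever the verb is reached.
        return ("np", NOUNS[noun], slots[role])
    if label == "v":
        german, cases = VERBS[node[1]]
        for role, case in cases.items():
            slots[role]["case"] = case   # "binds" the shared cells
        return ("v", german, {})
    raise ValueError("unknown node label: " + repr(label))


english = ("s", [("np", "subj", "woman"), ("v", "help"), ("np", "obj", "man")])
print(transfer(english))
```

In this sketch the subject is visited before the verb, yet it still ends up marked nominative, because the verb's entry fills the shared cell afterwards. That is the kind of effect the text attributes to logical variables and unification: word choice and the case markings of the complements are settled together without a second pass.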
As mentioned above, Step 5, German morphological generation, is described in detail in McCord and Wolff (1988), but some comments are given here in Section 6.</Paragraph> <Paragraph position="16"> Section 7 briefly describes the status of the system as of November 1988. It is worth noting here that LMT, although fairly large by now, is written entirely in Prolog (except for a few lines of trivial system code). No need has been seen for other methods, even for quick access to large dictionary disk files. The version of Prolog used is VM/Prolog (written by Marc Gillet), running on an IBM mainframe. The features of Prolog (especially logical variables and unification) have been very useful in making LMT easy to write.</Paragraph> </Section> </Paper>