File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-2154_metho.xml
Size: 9,270 bytes
Last Modified: 2025-10-06 14:12:15
<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2154"> <Title>DLT - AN INDUSTRIAL R &amp; D PROJECT FOR MULTILINGUAL MT</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> DLT - AN INDUSTRIAL R &amp; D PROJECT FOR MULTILINGUAL MT Toon WITKAM </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> An overview of the DLT (Distributed Language Translation) project is given. This project is aimed at a new, multilingual MT system in the 1990s, which uses Esperanto as an internal interlingua. The system's architectural features, current progress and project organization are dealt with.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> DLT (Distributed Language Translation) is the name of a principle, a design philosophy and a project. Within the area of MT, it represents another approach for steering between the hazards of low-quality output, endless prolongation of research and development time, restriction to narrowly bounded subject fields, the geometric cost expansion when a new language is added, etc.</Paragraph> <Paragraph position="1"> DLT is a concentrated high-tech effort to attain a product line of language translation modules in the 1990s.</Paragraph> <Paragraph position="2"> Together, these modules will constitute an interactive, knowledge-based, multilingual translation system, perfectly suited for operation on networked desk-top equipment.</Paragraph> <Paragraph position="3"> DLT was conceived in 1979, in an environment with no historical ties to MT whatsoever.
After patents had been applied for in 14 countries, the first publication followed at the conference on &quot;New Systems and Services in Telecommunications&quot; in Liège [1980].</Paragraph> <Paragraph position="4"> In 1982, the EEC granted a quarter of a million guilders for a DLT Feasibility Study, which was completed in 1983. A remarkable feature of the DLT design, highlighted in this study, was the use of Esperanto as intermediate language, with its own lexicon. This meant the adoption of an overall interlingual architecture, the most ambitious structure known for an MT system.</Paragraph> <Paragraph position="5"> At the same time, the introduction of Esperanto into the MT scene of the 1980s aroused a lot of skepticism and prejudice. As it happens, this semi-artificial language (invented by an ophthalmologist towards the end of the nineteenth century) is not usually considered a respectable object of study among professional linguists.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Design philosophy </SectionTitle> <Paragraph position="0"> The research team at BSO considers Esperanto a valuable tool in language technology, and has motivated its use as the DLT pivot on rigorous systems engineering grounds: - an overall interlingual architecture, i.e.
an MT process of 2 main steps (instead of 3) fits extremely well into the outside operating environment, which consists of 'senders' and 'receivers' linked by a communications network; the interlingua (or Intermediate Language) is the 'semi-product' passed over the network, and should be independent of any source or target language in the system; - the knowledge-based component of the translation process, the world-knowledge inferencing system for resolving ambiguities, is essentially language-independent and can therefore entirely be built in the interlingua; serving a multilingual system, this is an important economy-of-scale consideration; - long-term development and maintenance of a complex translation and world knowledge system is a task that can only succeed with perfect man-machine interfaces for the system engineers; linguists, lexicographers, terminologists and other specialists must be offered quick and easy access to the heart of the translation machinery; this calls for an interlingua that is directly legible; at the same time, the interlingua should be lexicologically autonomous and well-defined, the former eliminating the need for re-paraphrasing in other languages, the latter being a prerequisite for distributed system development (language teams working to and from one common interlingua); Esperanto meets these requirements.</Paragraph> </Section> <Section position="5" start_page="0" end_page="756" type="metho"> <SectionTitle> 3. Prototype construction </SectionTitle> <Paragraph position="0"> In 1984, BSO set up a plan for a 6-year research and development project (75 person-years at the cost of 18 million guilders), aimed at a DLT prototype capable of translating at least one language pair (English-French). This plan received the support of the Ministry of Economic Affairs of the Netherlands, which granted an innovation subsidy of 8 million guilders.
The first half of this 6-year schedule has now been completed.</Paragraph> <Paragraph position="1"> A first prototype of DLT was shown to the press in December 1987. Though operating only slowly as yet, with a small vocabulary (2000 English words) and a restricted grammar, this laboratory model shows the various monolingual and bilingual processing steps of DLT in proper sequence [see also Fig. 1]: 1. Exhaustive parsing of the English source text. Two different parser implementations have been realized in the search for the fastest formalism: one is based on ATNs and BSO's graphic software environment (on SUN 3/50 workstations) developed for setting up, testing and optimizing ATNs, the other is based on APSG and the PARSPAT software system from the University of Amsterdam [Van der Steen, 1987].</Paragraph> <Paragraph position="2"> The parsing process in DLT is breadth-first, syntax-only, and delivers dependency (not constituency) trees. 2. Surface translation (first half). Contrastive syntactic rules between English and Esperanto are applied here. This system of bilingual rules (250 at present) is based upon dependency grammar formalizations of both languages.</Paragraph> <Paragraph position="3"> The methodological framework has been inspired by the work of the French linguist Tesnière and is comprehensively described in [Schubert, 1987]. Semantic considerations are disregarded systematically at this stage. The result is a (sometimes large) number of 'formally possible' parallel translations.</Paragraph> <Paragraph position="4"> 3. Main semantic analysis, entirely carried out in the Intermediate Language, by searching through a knowledge base of some 75,000 (present status) semantically related Esperanto word pairs, and by applying text-grammatical principles of cohesion etc.
to the intermediate stage of the translated text [Papegaaij, 1986 and 1988].</Paragraph> <Paragraph position="5"> This automatic disambiguation system, written in Quintus PROLOG, now largely serves as a rating (preordering) of parallel surface translations, prior to the disambiguation dialogue which follows it. The DLT design offers a long-term perspective for steady improvement of this probabilistic component, ultimately by machine learning.</Paragraph> <Paragraph position="6"> 4. Disambiguation dialogue. The user is prompted to make a choice out of the possible interpretations listed on the screen. Note that these are parallel surface translations, back-translated ('paraphrased') into the source language. For the user, the disambiguation dialogue is a strictly monolingual affair, and free of linguistic jargon. In the present realization of the DLT prototype, mainly lexical ambiguities can be displayed.</Paragraph> <Paragraph position="7"> 5. Surface translation (second half). As Step 2 above, but now between the Intermediate Language and French. Some 500 contrastive syntactic rules have been implemented so far. Though the proliferation of parallel translations is less at this side of the translation process (due to the syntactic unambiguity of Esperanto and its lack of homonyms), it is not absent. If the target language happens to have a more refined &quot;cutting-up-of-reality&quot; in some concept area (like the proverbial 10 words for 'snow' in Eskimo), parallel translations will result. All the results of this step are in the form of dependency trees.</Paragraph> <Paragraph position="8"> 6. Additional semantics. TL-specific selection criteria are applied to select the right word.
But because these criteria are knowledge-based (we are not talking of idiomatic phenomena), they are restated in terms of the IL, and the selection process is carried out on the intermediate stage of the translated text, using the Esperanto knowledge bank again. If the context does not provide enough clues, a default choice (e.g. the least specific word for 'snow') will be made. In contrast to the source language half of the system, there is no possibility for human intervention here. 7. ~sij. of the target sentence. In this tree-to-string conversion, the TL-specific word order is determined (including the application of elision and contraction rules).</Paragraph> </Section></Paper>
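<!--
The rating (preordering) of parallel surface translations described in Step 3 can be sketched as follows. This is only a minimal illustration in Python, not the Quintus PROLOG system used in DLT. The names RELATED_PAIRS, rate and preorder, the two-entry knowledge base, and the simple counting score are all assumptions made for the example; the paper does not specify how the real rating is computed.

```python
# Toy stand-in (assumption) for the knowledge base of semantically
# related Esperanto word pairs; the real one holds some 75,000 pairs.
RELATED_PAIRS = {
    frozenset(("banko", "mono")),    # bank (institution) / money
    frozenset(("bordo", "rivero")),  # bank (shore) / river
}


def rate(candidate, context):
    """Count the context words that the knowledge base relates to candidate."""
    return sum(frozenset((candidate, word)) in RELATED_PAIRS for word in context)


def preorder(candidates, context):
    """Return the candidates sorted best-first; ties keep their input order."""
    return sorted(candidates, key=lambda c: rate(c, context), reverse=True)


# Two parallel Esperanto renderings of English 'bank', rated against
# the Esperanto context words for 'money' and 'account'.
ranked = preorder(["bordo", "banko"], context=["mono", "konto"])
print(ranked)
```

In this sketch the ordered list would then be shown to the user in the disambiguation dialogue of Step 4, so the rating only reorders the alternatives and never discards one.
-->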