<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1100">
  <Title>EXPERT SYSTEMS AND OTHER NEW TECHNIQUES IN MT SYSTEMS</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 - IMPORTANT CONCEPTS FROM EXISTING SYSTEMS
</SectionTitle>
    <Paragraph position="0"> For lack of space, we only list our major points, and refer the reader to (3,4,5,6,15) for further details.</Paragraph>
    <Paragraph position="1">  1 - Computer science aspects 1) Use of Specialized Languages for Linguistic Programming (SLLP), like ATEF, ROBRA, Q-systems, REZO, etc.</Paragraph>
    <Paragraph position="2"> 2) Integration in some &amp;quot;user-friendly&amp;quot; environment, controlled by a conversational interface, and managing a specialized data-base composed of what we call &amp;quot;lingware&amp;quot; (grammars, dictionaries, procedures, formats, variables) and corpuses of texts (source, translated, revised, plus intermediate results and possibly &amp;quot;hors-textes&amp;quot; -- figures, etc.). 3) Analogy with compiler-compiler systems : rough translation is realized by a monolingual analysis, followed by a bilingual transfer, and then by a monolingual generation (synthesis). 2 - Linguistic aspects 1) Only linguistic levels (of morphology, syntax, logico-semantics, modality, actualisation, ...) are used, leading to some implicit understanding, characteristic of second-generation MT systems.</Paragraph>
    <Paragraph position="3"> 2) Hence, the extralinguistic levels (of expertise and pragmatics) which furnish some degree of explicit understanding are beyond the limits of second-generation CAT systems.</Paragraph>
    <Paragraph position="4"> 3) During analysis of a unit of translation, computation of these (linguistic) levels is not done sequentially, but in a cooperative way. Analysis produces the analog of an &amp;quot;abstract tree&amp;quot;, namely a multilevel interface structure to represent all the computed levels on the same graph (a &amp;quot;decorated tree&amp;quot;).</Paragraph>
    <Paragraph position="5"> 4) Lexical knowledge is organized around the notion of lexical unit (LU), allowing for powerful paraphrasing capability.</Paragraph>
    <Paragraph position="6"> 5) The texts are segmented into translation units of one or more paragraphs. This allows for intersentential resolution of anaphora in some not too difficult cases.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="470" type="metho">
    <SectionTitle>
3 - AI aspects
</SectionTitle>
    <Paragraph position="0"> 1) During the structural steps, the unit of translation is represented by the current &amp;quot;object tree&amp;quot;, which may encode several competing interpretations, like the &amp;quot;blackboard&amp;quot; of some AI systems. 2) This and the SLLPs' control structures allow for some heuristic programming : it is possible to explicitly describe and process ambiguous situations in the production rules.</Paragraph>
    <Paragraph position="1"> This is in contrast to systems based on combinatorial algorithms which construct each interpretation independently, even if they represent them in a factorized way.</Paragraph>
    <Paragraph position="3"> The experience gained from the development of a Russian-French translation unit of a realistic size over the last three years (6) has shown that maintaining and upgrading the lingware, even in an admittedly limited second-generation CAT system, requires a good deal of expertise. Techniques are now being developed to maintain the linguistic knowledge base. Some of them deal with the lexical data-base, others with the definition and use of specification formalisms (&amp;quot;static grammars&amp;quot;) and verification tools.</Paragraph>
    <Paragraph position="4"> Lexical knowledge processing In the long run, dictionaries turn out to be the costliest components of CAT systems. Hence, we are working towards the reconciliation of &amp;quot;natural&amp;quot; and &amp;quot;coded&amp;quot; dictionaries, and towards the construction of automated verification and indexing tools. Natural dictionaries are usually accessed by lemmas (normal forms). Coded dictionaries of CAT systems, on the other hand, are accessed by morphs or by lexical units. Moreover, the information the two types of dictionaries contain is not the same.</Paragraph>
    <Paragraph position="5"> However, it is highly desirable to maintain some degree of coherency between the coded dictionaries of a CAT system and the natural dictionaries which constitute their source, for documentation purposes, and also because these computerized natural dictionaries should be made accessible to the revisors. Let us briefly present the kind of structure proposed by N. Nedobejkine and Ch. Boitet at an ATALA meeting in Paris in 1983. The central idea here is to start from the structure of modern dictionaries, which are accessed by the lemmas, but use the notion of lexical unit. Each item may be considered as a tree structure. Starting from the top, selections of a &amp;quot;local&amp;quot; nature (on the syntactico-semantic behavior in a phrase or in a sentence) give access to the &amp;quot;constructions&amp;quot;. Then, more &amp;quot;global&amp;quot; constraints lead to &amp;quot;word senses&amp;quot;. At each node, codes of one or more formalized models may be grafted on. Hence, it is in principle possible to index directly in this structure, and then to design programs to construct the coded dictionaries in the formats expected by the various SLLPs. Up to this level, the information is monolingual and usable for analysis as well as for generation. If the considered language is the source in one or more language pairs, each word sense may be further refined, for each target language, and lead to equivalents expressed as constructions of the target language, with all other information contained in the dictionary constructed in a similar way for the target language. For lack of space, we cannot include examples.</Paragraph>
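    <Paragraph> As an illustration only, the layered entry described above (lemma, then constructions, then word senses, with codes grafted on nodes and target-language equivalents attached to senses) might be sketched in Python as follows. All class names, field names, the model label "model_a", its codes, and the sample entry are hypothetical; none of them comes from the proposal itself. The rule that a code on a word sense overrides a code on its construction is likewise an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WordSense:
    gloss: str
    # Codes of formalized models grafted on this node (model name -> code).
    codes: Dict[str, str] = field(default_factory=dict)
    # Target language -> equivalents, expressed as target constructions.
    equivalents: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Construction:
    pattern: str
    codes: Dict[str, str] = field(default_factory=dict)
    senses: List[WordSense] = field(default_factory=list)


@dataclass
class Entry:
    lemma: str          # access key of the "natural" dictionary
    lexical_unit: str   # access key of the "coded" dictionary
    constructions: List[Construction] = field(default_factory=list)


def coded_records(entry: Entry, model: str) -> List[Tuple[str, str, str, str]]:
    """Flatten one entry into coded-dictionary records for a given model."""
    records = []
    for construction in entry.constructions:
        for sense in construction.senses:
            # Assumption: a code on the sense overrides one on the construction.
            code = sense.codes.get(model, construction.codes.get(model))
            if code is not None:
                records.append(
                    (entry.lexical_unit, construction.pattern, sense.gloss, code)
                )
    return records


# Invented sample entry, for illustration only.
DEMO = Entry(
    lemma="run",
    lexical_unit="RUN",
    constructions=[
        Construction(
            pattern="N runs",
            codes={"model_a": "V1"},
            senses=[WordSense(gloss="move fast", equivalents={"fr": ["courir"]})],
        ),
        Construction(
            pattern="N runs N",
            senses=[
                WordSense(
                    gloss="manage",
                    codes={"model_a": "V2"},
                    equivalents={"fr": ["diriger"]},
                )
            ],
        ),
    ],
)
```

 Here coded_records stands for the kind of program that would derive a coded dictionary for one SLLP from the integrated structure; an actual tool would emit that SLLP's own format rather than plain tuples.</Paragraph>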
    <Paragraph position="6"> This part of the work thus aims at finding a good way of representing lexical knowledge. But there is another problem, perhaps even more important. Because of the cost of building machine dictionaries, we need some way to transform and transport lexical knowledge from one CAT system to another. This is obviously a problem of translation.</Paragraph>
    <Paragraph position="7"> Hence, we consider this type of &amp;quot;integrated structure&amp;quot; as a possible lexical interface structure. Research has recently begun on the possibility of using classical or advanced data base systems to store this lexical knowledge and to implement the various tools required for addition and verification. VISULEX and ATLAS (1) are first versions of such tools.</Paragraph>
    <Paragraph position="8"> Grammatical knowledge processing Just as in current software engineering, we have long felt the need for some level of &amp;quot;static&amp;quot; (algebraic) specification of the functions to be realized by algorithms expressed in procedural programming languages. In the case of CAT systems, there is no a priori correct grammar of the language, and natural language is inherently ambiguous. Hence, any usable specification must specify a relation (not a function) between strings and trees, or trees and trees : many trees may correspond to one string, and, conversely, many strings may correspond to one tree.</Paragraph>
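    <Paragraph> The difference between a relation and a function can be made concrete with a minimal Python sketch. It assumes a toy specification given simply as a set of (string, tree) pairs; the pairs, the bracketed tree notation, and the function names are invented for the illustration and do not reproduce any actual static grammar.

```python
from typing import List, Set, Tuple

# Hypothetical toy specification: a relation, i.e. a set of (string, tree) pairs.
PAIRS: Set[Tuple[str, str]] = {
    ("old men and women", "MOD(old, COORD(men, women))"),
    ("old men and women", "COORD(MOD(old, men), women)"),
    ("women and old men", "COORD(MOD(old, men), women)"),
}


def trees_for(string: str) -> List[str]:
    """Analysis direction: every tree related to the given string."""
    return sorted(tree for s, tree in PAIRS if s == string)


def strings_for(tree: str) -> List[str]:
    """Generation direction: every string related to the given tree."""
    return sorted(s for s, t in PAIRS if t == tree)
```

 With these invented pairs, trees_for("old men and women") yields two trees (an ambiguity), while strings_for("COORD(MOD(old, men), women)") yields two strings (a paraphrase). A function in either direction could not express both facts.</Paragraph>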
    <Paragraph position="9"> Working with B. Vauquois in this direction, S. Chappuy has developed a formalism of static grammars (7), presented in charts expressing the relation between strings of terminal elements (usually decorations expressing the result of some morphological analysis) and multilevel structural descriptors. This formalism is currently being used for all new linguistic developments at GETA.</Paragraph>
    <Paragraph position="10"> Of course, this is not a completely new idea. For example, M. Kay (13) proposed the formalism of unification grammars for quite the same purpose.</Paragraph>
    <Paragraph position="11"> But his formalism is more algebraic and less geometric in nature, and we prefer to use a specification in terms of the kind of structures we are accustomed to manipulating.</Paragraph>
    <Paragraph position="12"> 2 - Grafting on expert systems Seeing that linguistic expertise is already quite well represented and handled in current (&amp;quot;closed&amp;quot;) systems, we are orienting our research towards the possibility of adding extralinguistic knowledge (knowledge about some technical or scientific field, for instance) to existing CAT systems. Also, because current systems are based on transducers rather than on analyzers, it is perfectly possible that the results of analysis or of transfer (the &amp;quot;structural descriptors&amp;quot;) are partially incorrect and need correction. Knowledge about the types of errors made by linguistic systems may be called metalinguistic.</Paragraph>
    <Paragraph position="13"> In his recent thesis (9), R. Gerber has attempted to design such a system, and to propose an initial implementation. The expertise to be incorporated in this system includes linguistic, metalinguistic, and extralinguistic knowledge. The system is constructed by combining a &amp;quot;closed&amp;quot; system, based only on linguistic knowledge (a lingware written in ARIANE-78), and two &amp;quot;open&amp;quot; systems, called &amp;quot;expert corrector systems&amp;quot;. The first is inserted at the junction between analysis and transfer, and the second between transfer and generation.</Paragraph>
    <Paragraph position="14">  The control structure of a corrector system is as follows :  (1) transform the result of analysis into a suitable form ; (2) while there is some error configuration do (3) solve (using meta- or extralinguistic knowledge) ; if solving has failed then exit endif ; (4) perform a partial reconstruction of the structure, according to the solution found ; endwhile ; (5) output the final structure in ARIANE-78 format. Step (2) relies on metalinguistic knowledge only.</Paragraph>
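    <Paragraph> The control structure above can be restated as a minimal Python sketch. It is not the Prolog implementation: the five callables, the dictionary used as a stand-in for a decorated tree, the error labels, and the KNOWLEDGE table are all hypothetical, and serve only to show the shape of the loop (transform, detect an error configuration, solve or exit, reconstruct, output).

```python
from typing import Callable, Dict, Optional

Tree = dict  # hypothetical stand-in for a decorated tree


def run_corrector(
    analysis_result: Tree,
    to_internal: Callable[[Tree], Tree],
    find_error: Callable[[Tree], Optional[str]],
    solve: Callable[[Tree, str], Optional[str]],
    rebuild: Callable[[Tree, str], Tree],
    to_output: Callable[[Tree], Tree],
) -> Tree:
    # Transform the result of analysis into a suitable form.
    structure = to_internal(analysis_result)
    while True:
        # Detect an error configuration (metalinguistic knowledge only).
        error = find_error(structure)
        if error is None:
            break
        # Try to solve it with meta- or extralinguistic knowledge.
        solution = solve(structure, error)
        if solution is None:
            break  # solving has failed: exit the loop
        # Partially reconstruct the structure according to the solution.
        structure = rebuild(structure, solution)
    # Output the final structure in the format expected downstream.
    return to_output(structure)


# Invented toy knowledge: error label -> name of a repair.
KNOWLEDGE: Dict[str, str] = {"coord-scope": "regroup-coordination"}


def demo_find_error(tree: Tree) -> Optional[str]:
    return tree["pending"][0] if tree["pending"] else None


def demo_solve(tree: Tree, error: str) -> Optional[str]:
    return KNOWLEDGE.get(error)  # None when no rule applies


def demo_rebuild(tree: Tree, solution: str) -> Tree:
    return {
        "pending": tree["pending"][1:],
        "applied": tree["applied"] + [solution],
    }


def demo_run(tree: Tree) -> Tree:
    return run_corrector(
        tree, dict, demo_find_error, demo_solve, demo_rebuild, dict
    )
```

 In this toy setting, a structure with one known and one unknown error configuration gets the first one repaired, after which the loop exits on the failed attempt and the partially corrected structure is output. This mirrors the exit branch of the original control structure.</Paragraph>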
    <Paragraph position="15">  The implementation has been done in Foll-PROLOG (8). The lingware used corresponds to a small English-French system developed for teaching purposes. Here are some examples.</Paragraph>
    <Paragraph position="16"> Example 1 : ADJ + N N  (1) Standard free-energy change is calculated by this equation.</Paragraph>
    <Paragraph position="17"> The analyzer proposes that &amp;quot;standard&amp;quot; modifies &amp;quot;change&amp;quot;, while &amp;quot;free-energy&amp;quot; is juxtaposed to &amp;quot;change&amp;quot;, hence the erroneous translation : &amp;quot;La variable standard d'énergie libre est calculée par cette formule&amp;quot;.</Paragraph>
    <Paragraph position="18"> In order to correct the structure, some knowledge of chemistry is required, namely that &amp;quot;standard free-energy change&amp;quot; is a ... standard notion. With this grouping, (1) translates as : &amp;quot;La variation d'énergie libre standard est calculée par cette formule&amp;quot;.</Paragraph>
    <Paragraph position="19"> Example 2 : (ADJ) N and N N (2) The mixture gives off dangerous cyanide and chlorine fumes.</Paragraph>
    <Paragraph position="20"> (2') The experiment requires carbon and nitrogen tetraoxyde.</Paragraph>
    <Paragraph position="21"> Let us develop this example a little more.</Paragraph>
    <Paragraph position="22"> Sentence (2) presents the problem of determining the scope of the coordination. The result of analysis (tree no. 2) groups &amp;quot;dangerous cyanide&amp;quot; and &amp;quot;chlorine fumes&amp;quot;, &amp;quot;chlorine&amp;quot; being juxtaposed to &amp;quot;fumes&amp;quot; (SF(JUXT) on node 12). Hence the translation : &amp;quot;La préparation dégage le cyanure et la vapeur de chlore dangereux&amp;quot;.</Paragraph>
    <Paragraph position="23"> But, if we know that cyanide is dangerous as fumes, and not as crystals, we can correct the structure by grouping &amp;quot;(cyanide and chlorine) fumes&amp;quot; (see subtree no. 2). The translation produced will then be : &amp;quot;La préparation dégage la vapeur dangereuse de cyanure et de chlore&amp;quot;.</Paragraph>
    <Paragraph position="24"> Of course, some more sophisticated analyzers would (and some actually do) use the semantic marker &amp;quot;chemical element&amp;quot; present on both &amp;quot;chlorine&amp;quot; and &amp;quot;cyanide&amp;quot;, and then group them on the basis of the &amp;quot;semantic density&amp;quot; (e.g., number of features shared). But this technique will fail on (2'), because there is no &amp;quot;carbon tetraoxyde&amp;quot; in normal chemistry ! Hence, without extralinguistic knowledge, this more sophisticated (linguistic) strategy will produce : &amp;quot;L'expérience demande du tétraoxyde de carbone et  combines will the poisonous.</Paragraph>
    <Paragraph position="25"> The analyzer takes &amp;quot;beaker&amp;quot; instead of &amp;quot;water&amp;quot; as antecedent of &amp;quot;which&amp;quot;. The corrector may know that chlorine combines with water, and not with a beaker.</Paragraph>
    <Paragraph position="26"> Examples 4 &amp; 5 : Antecedent of &amp;quot;it&amp;quot; within or  beyond the same sentence (4) The state in which a substance is depends on the energy that it contains. When a substance is heated the energy of the substance is increased.</Paragraph>
    <Paragraph position="27"> (5) The particles vibrate more vigorously, and it becomes a liquid. (5') It melts.</Paragraph>
    <Paragraph position="28">  In order to choose between &amp;quot;substance&amp;quot; and &amp;quot;state&amp;quot; in (4), one must carry out some complex reasoning using detailed knowledge of physics, and one may easily fail in a given context : it is not correct to simply state (as we did to solve this particular case) that a substance may possess energy, while a state cannot. Here, perhaps it is better to rely on some (metalinguistic) information on the typology, which may be included in a (specialized) linguistic analyzer, or in the expert corrector system. For (5), there are simple but powerful rules like : if the antecedent cannot be found in the sentence, look for the nearest possible main clause subject to the left.</Paragraph>
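    <Paragraph> The rule quoted for (5) can be sketched in a few lines of Python. The sketch rests on several assumptions that are not in the text: each sentence is reduced to the head noun of its main clause subject plus the nouns that precede the pronoun, the search inside the sentence goes from the nearest candidate outwards, and the "possible" test is supplied by the caller. The two sample sentences and the singular-noun test are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Sentence:
    main_subject: str  # head noun of the main clause subject
    # Nouns preceding the pronoun inside this sentence, in text order.
    candidates: List[str] = field(default_factory=list)


def resolve_pronoun(
    sentences: List[Sentence],
    index: int,
    is_possible: Callable[[str], bool],
) -> Optional[str]:
    """Return an antecedent for a pronoun occurring in sentences[index]."""
    # First look inside the sentence itself, nearest candidate first.
    for noun in reversed(sentences[index].candidates):
        if is_possible(noun):
            return noun
    # Otherwise take the nearest possible main clause subject to the left.
    for earlier in reversed(sentences[:index]):
        if is_possible(earlier.main_subject):
            return earlier.main_subject
    return None


# Invented sample: "The substance is heated. The particles vibrate and it melts."
DEMO_TEXT = [
    Sentence(main_subject="substance"),
    Sentence(main_subject="particles", candidates=["particles"]),
]


def is_singular(noun: str) -> bool:
    # Crude stand-in for agreement checking: "it" needs a singular antecedent.
    return not noun.endswith("s")
```

 In the invented sample, "particles" is rejected inside the second sentence because it is plural, so the search moves left and returns the main clause subject "substance" of the first sentence. A realistic "possible" test would of course use the decorations computed by the analyzer rather than a spelling check.</Paragraph>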
    <Paragraph position="29"> 3 - Aiding the creation of the source documents Lingware engineering may be compared with modern software engineering, because it requires the design and implementation of complete programming systems, uses specification tools, and leads to research in automatic program generation. Starting from this analogy, a group of researchers at GETA have recently embarked on a project which could converge with still another line of software engineering, in a very interesting way. The final aim is to design and implement a syntactic-semantic structural metaeditor that uses a static grammar given as parameter in order to guide an author who is writing a document, in much the same manner as metaeditors like MENTOR are used for writing programs in classical programming languages.</Paragraph>
    <Paragraph position="30"> This could offer an attractive alternative to interactive CAT systems like ITS, which require a specialist to assist the system during the translation process. As a matter of fact, this principle is a sophisticated variant of the &amp;quot;controlled syntax&amp;quot; idea, like that implemented in the TITUS system. Its essential advantage is to guarantee the correctness of the intermediate structure, without the need for a large domain-specific knowledge base. It may be added that, in many cases, the documents being written are in effect contributing some new knowledge to the domain of discourse, which hence cannot already be present in the computerized knowledge base, even if one exists.</Paragraph>
    <Paragraph position="31"> III - CONCLUSION : SOME LONG TERM PERSPECTIVES There are many areas open for future research. The introduction of &amp;quot;static grammars&amp;quot; suggests a new kind of design, where the &amp;quot;dynamic grammars&amp;quot; would be generated from the specifications and from some strategies, possibly expressed as &amp;quot;metarules&amp;quot;. &amp;quot;Multisliced decorated trees&amp;quot; (16) have been introduced as a data structure for the explicit factorization of decorated trees. However, a full implementation of the associated parallel rewriting rule system, STAR-PALE, remains to be developed, and its linguistic practicability remains to be tested. Last but not least, the development of true &amp;quot;translation expert systems&amp;quot; requires an intensive (psycholinguistic) study of the expertise used by human translators and revisors.</Paragraph>
  </Section>
</Paper>