<?xml version="1.0" standalone="yes"?>
<Paper uid="C86-1131">
  <Title>Semantic based generation of Japanese German translation system - Result and Evaluation-</Title>
  <Section position="1" start_page="0" end_page="561" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Project SEMSYN has reached a state in which a prototype system generates German texts on the basis of the semantic representation produced from Japanese texts by ATLAS/II of Fujitsu Laboratory. This paper describes some problems that are specific to our semantic based approach and some results of the evaluation study carried out by the Germanist group.</Paragraph>
    <Paragraph position="1"> I. Generation procedure in SEMSYN This section summarizes the SEMSYN generation procedure. Readers who are more interested in the SEMSYN system are referred to our previous COLING84 paper [1] or the paper submitted to this conference [2]. The generation process begins with the conversion of the semantic networks, each of which represents one sentence, into a so-called IKBS (Instantiated Knowledge Base Schema). The IKBS is an instantiation of the case or concept schemata denoted by the semantic symbols that appear as nodes in the semantic network. A case schema contains three main description slots: a) the roles of the cases associated with the semantic symbol, b) transformation rules for schemata, c) a choice of German syntactic realization schemata.</Paragraph>
    <Paragraph position="2"> Triggered by the semantic symbols of the given network, the IKBS specifies the best basic syntactic structure associated with a German word by checking the fillers of the roles, and converts them into functional roles within each German syntactic category. A German syntacto-morphological component called SUTRA-S [3], an extended version of SUTRA [4], generates German surface texts from the instantiated syntactic structure called IRS (Instantiated Realization Schemata). Though English-like terms are used for semantic symbols, the choice of the German word associated with each semantic symbol and its syntactic structure differ greatly from those of the corresponding English word.</Paragraph>
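    <Paragraph>As a rough illustration of the procedure summarized above, the following Python sketch walks one semantic node through the three stages (schema instantiation, conversion to functional roles, surface realization). It is a toy under stated assumptions and not the SEMSYN implementation: the names CaseSchema, Node, SCHEMATA, build_ikbs, build_irs and realize, the DEVELOP symbol and the tiny lexicon are all hypothetical, and slot b) (transformation rules) is omitted.

```python
from dataclasses import dataclass, field

# All names and data below are hypothetical illustrations,
# not SEMSYN's actual data structures or lexicon.

@dataclass
class CaseSchema:
    symbol: str          # semantic symbol that triggers the schema
    roles: list          # slot a): case roles associated with the symbol
    realizations: dict   # slot c): case role mapped to a German functional marker
    german_head: str     # German word chosen for the symbol

@dataclass
class Node:
    symbol: str
    fillers: dict = field(default_factory=dict)   # case role mapped to its filler

SCHEMATA = {
    "DEVELOP": CaseSchema(
        symbol="DEVELOP",
        roles=["object", "purpose"],
        realizations={"object": "genitive", "purpose": "fuer"},
        german_head="Entwicklung",
    )
}

def build_ikbs(node):
    """Instantiate the schema triggered by the node's semantic symbol."""
    schema = SCHEMATA[node.symbol]
    filled = {r: node.fillers[r] for r in schema.roles if r in node.fillers}
    return {"schema": schema, "fillers": filled}

def build_irs(ikbs):
    """Convert the filled case roles into functional roles of a German noun phrase."""
    schema = ikbs["schema"]
    modifiers = [(schema.realizations[r], f) for r, f in ikbs["fillers"].items()]
    return {"head": schema.german_head, "modifiers": modifiers}

def realize(irs):
    """Toy stand-in for the syntacto-morphological component."""
    parts = ["Die " + irs["head"]]
    for marker, filler in irs["modifiers"]:
        # A genitive filler is attached directly; other markers act as prepositions.
        parts.append(filler if marker == "genitive" else marker + " " + filler)
    return " ".join(parts)

node = Node("DEVELOP", {"object": "des Kerns", "purpose": "Echtzeitanwendungen"})
title = realize(build_irs(build_ikbs(node)))
print(title)
```

The point of the sketch is only the separation of stages: the German head word and the realization of each role are looked up under the semantic symbol, not derived from an English-like surface form.</Paragraph>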
    <Paragraph position="3"> II. Some problems of the semantic based translation approach The semantic based approach has advantages as well as disadvantages, which we anticipated at the beginning of the project. Theoretically speaking, one reason why we adopted a semantic based approach rather than a syntactic transfer approach lies in the cultural differences and communication barriers between the two project groups that cooperate to build the translation system. By understanding the content of the original sentence from the given semantic representation, the generation group can express it in a way that is common in its mother tongue, relatively free from syntactic restrictions and lexically corresponding terminology. It is a well-known fact that a language of one culture can only be interpreted, not literally translated, into the languages of different cultures, as would be possible within the same cultural sphere. As a matter of fact, we often took advantage of this in our generation system.</Paragraph>
    <Paragraph position="4"> On the other hand, this very freedom frequently turned out to be a disadvantage on the generation side.</Paragraph>
    <Paragraph position="5"> Dealing with real data (titles of scientific papers in the field of information technology from the Japanese database JOIS), we encountered new problems that we had not expected and recognized the limits of our approach.</Paragraph>
    <Paragraph position="6"> In the following we describe some of these problems: 1) [...]ion of the Japanese original text We also had to cope with well-known problems such as the lack of articles (definite or indefinite) and of number distinctions (singular or plural) for nouns as well as verbs. We embedded some heuristic rules in the KBS and the dictionary to add these syntactic features where they cannot be omitted in the German text. Deeper semantics still governs these decisions, but it cannot be represented in general, except for very limited cases. The heuristic rules are based on our ambiguity conservation principle, i.e. we keep the ambiguity of the input text as far as possible to avoid any active selection of one alternative that might lead to a wrong expression from the viewpoint of the author of the title. The following examples show typical errors of number and articles generated by the present SEMSYN heuristics.</Paragraph>
    <Paragraph position="7"> They also illustrate how difficult it is to find a trade-off between ambiguity conservation and an active decision inferred from the content: E.g. 1: [Japanese original title not recoverable] (The application of small computer [...] for the execution of large [...] programs) Comment: The author of the paper will discuss how to use a small computer to execute a very large graphic package, so readers may naturally assume one small computer instead of many small computers, though it is possible to assume the latter. On the other hand, it is generally assumed that a computer processes many programs. For this reason the plural is more natural for the latter (programs) than for the former (computer). However, it is bad German to have neither a number feature nor an article, as is the case in the original text.</Paragraph>
    <Paragraph position="8"> E.g. 2: [Japanese original title not recoverable] SEMSYN generation: Die Entwicklung des Kerns beim Betriebssystem vom verteilten Typ fuer real-time Anwendungen.</Paragraph>
    <Paragraph position="9"> Correct German: Die Entwicklung des Kerns eines verteilten Betriebssystems fuer Echtzeitanwendungen (The development of the kernel in the operating system of the distributed type for real-time applications) Comment: It is assumed that the author developed the kernel of one distributed OS, instead of many distributed OS, for many applications.</Paragraph>
    <Paragraph position="10"> 2) Ambiguity of conjunctions One of the hard problems we expected in our semantic representation was the ambiguity of coordinating conjunctions in an attributive context such as: &lt;AP&gt; A, B and C &lt;PP&gt;.</Paragraph>
    <Paragraph position="11"> E.g. 3: high speed bus, memory and switching in bit slice technology The scope could be made unique if the semantic network allowed a node that denotes a subnetwork. The conjunctive subnetwork is classified into three basic cases, (i), (ii) and (iii). In practice, however, we found that 90% of about 380 titles containing conjunctions, among the 2000 titles we have so far generated from the given semantic networks, belong to case (i); only about 8% belong to case (ii), and the rest to case (iii). These statistics may be specific to titles, but they indicate that authors of titles are aware of the structural ambiguity and consequently try to avoid the above straightforward sequence of conjunctions except in case (i). Besides these sample-based statistics, the conjunctive ambiguity is further weakened by the fact that the generation system produces ambiguous titles, in accordance with our ambiguity conservation principle, and lets expert readers infer naturally what the author meant.</Paragraph>
    <Paragraph position="12"> At the moment we deal with both cases (ii) and (iii) by exploiting this possibility of conveying the ambiguity as if it were case (i).</Paragraph>
    <Paragraph position="13"> Though this conjunctive ambiguity in semantic networks seemed at first glance to be a serious factor, it fortunately turned out to be a very minor problem, as the evaluation study indicates.</Paragraph>
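    <Paragraph>To make the proportions quoted for the conjunction cases concrete, the short Python calculation below converts them into approximate title counts. It is only back-of-the-envelope arithmetic on the rounded figures given in the text (about 380 titles with conjunctions among 2000, 90% for case (i), about 8% for case (ii), the rest for case (iii)); the variable names are ours, and the resulting counts are estimates, not reported data.

```python
# Rounded figures quoted in the text; the resulting counts are estimates only.
total_titles = 2000
with_conjunctions = 380

share = {"i": 0.90, "ii": 0.08}
share["iii"] = round(1.0 - share["i"] - share["ii"], 2)   # "the rest"

# Approximate number of titles per case, rounded to whole titles.
counts = {case: round(with_conjunctions * p) for case, p in share.items()}

# Fraction of all generated titles that contain a conjunction at all.
conjunction_rate = with_conjunctions / total_titles

print(counts)
print(conjunction_rate)
```

Under these assumptions only a few dozen of the 2000 titles fall outside case (i), which is consistent with the remark that the conjunctive ambiguity turned out to be a very minor problem.</Paragraph>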
    <Paragraph position="14"> 3) Stylistic problem Generally speaking, a semantic based generation approach has a strong advantage as well as a disadvantage in terms of sentence style. The stylistic advantage rests on the large freedom in interpreting a given semantic representation. The serious disadvantage is exactly the other side of this freedom of interpretation. The following examples illustrate typical stylistic problems of our [...] werden, fuer die Kommunikation (Specification, simulation and development of protocol, for which PDIL is applied, for the communication) Comment: &quot;fuer die PDIL verwendet werden, fuer die Kommunikation.&quot; should be expressed as &quot;.. eines Kommunikationsprotokolls unter Verwendung von PDIL.&quot; (&quot;.. of a communication protocol by using PDIL&quot; instead of &quot;for which PDIL...&quot;) SEMSYN generation: Die Repraesentation von Informationen, die ableitbar in einem Speicher gewesen werden.</Paragraph>
    <Paragraph position="15"> (The representation of information, which can be derived in a memory) Comment: The clause &quot; .., die ableitbar..&quot; should be replaced by an adjective phrase &quot;von der aus dem [...] (The application of data base systems for the listing of conditions for the surface hardening procedure with CO2 laser.)</Paragraph>
    <Paragraph position="16"> Comment: Instead of repeating nominalized case frames for the purpose role, &quot;Verfahren zur Aufstellung&quot; and &quot;Verfahren zur Verstaerkung&quot;, the latter should be expressed as &quot;Oberflaechenverhaerterungsverfahren&quot;. Though badly styled expressions may convey the correct meaning, they substantially reduce the understandability of the generated texts. Stereotypical bad styles can easily be improved in some cases; however, the style conversion problem seems to have an inherent continuum of depth, ranging from &quot;easy to patch&quot; to depths that can only be pursued in the long run.</Paragraph>
    <Paragraph position="17"> 4) Cultural difference problems Before we started the project we discussed many problems that are specifically attributed to the well-known cultural differences. In the following we give some of the real problems we encountered in dealing with title translations: i) Focus shift We frequently have to cope with differences of focus, which force us into a conflict over whether we should prefer fidelity of the translation or the common style of German titles.</Paragraph>
    <Paragraph position="18">  Comment: The original Japanese text does not contain an explicit word that corresponds to the semantic symbol &quot;USE.ACT&quot;, which is inferred by the analyzer. Generally speaking, however, it sounds better in German if an expression spells out the meaning in a more resolved form, while ambiguous or even fuzzy expressions are preferred in Japanese. In this example the purpose arc, expressed as &quot;zur&quot;, implies the semantics of application.</Paragraph>
    <Paragraph position="19"> ii) Reversed causality The most striking case that exemplifies the opposite relation between East and West is the reversed expression of causality: in Japanese the would-be result is mostly used instead of the cause, and vice versa in the West. The following example demonstrates this: SEMSYN generation: Probleme bei der Ausbildung ueber Lehrer, die spezielles Computerwissen besitzen, innerhalb eines Schulsystems.</Paragraph>
    <Paragraph position="20"> (Problems of training teachers, who own special computer knowledge, within the school systems)  Comment: Here the Japanese original text means that the special computer knowledge is a result of the training. If the teachers already had this special knowledge, they would not need the training. Therefore, it must be expressed as &quot;so as to have ..&quot;. At the moment neither our analyzer nor our generator can afford such a deep understanding of input texts. Our approach remains open to enriching the TRAIN schema so that it represents the causal relation of the TRAIN concept, which forces a reversal of the causality of the given meaning.</Paragraph>
    <Paragraph position="21"> III. Evaluation About 20% of the translation results produced from the available semantic networks were evaluated. To avoid misunderstanding, it is worth making clear that this evaluation was not done as a so-called blind test; instead, all semantic networks had already been used as our training samples. This is because at the time the evaluation study started we had only 2000 semantic networks available. The evaluation results are summarized as follows:  1: Exactly the same meaning as the original text 2: Almost same content 3: Still acceptable and informative 4: Only partially acceptable 5: Nothing to do with the content of the original text Grade Fidelity 1: Correct style, syntax and morphology 2: Correct syntax and morphology, but stylistic defect and vice versa 3: Still readable, but substantial mistakes in syntax, morphology and style 4: Almost unreadable as German text 5: Not German  Based on these evaluation results we sorted our error sources. The following results show the error classification, from which readers can judge the development state of our system.</Paragraph>
    <Paragraph position="22">  The above classification indicates that the dictionary problem cannot be solved in the short term. Especially in our approach, a semantic symbol generally corresponds to an upper concept, under which an appropriate German term is registered as a specialization. The selection of terminology within a lexical entry is therefore done indirectly through its context. Again, this very advantage of freedom of expression can cause a bad selection of the target word. We need time to polish our semantic German terminology data base so that the system can select the right German words in general. The noun compound is a specific problem in German. By constructing a noun compound a stylistic problem may be solved elegantly (cf. e.g. 9, 10), because otherwise using a modifier (possessive attributes, qualifiers, quantifiers, etc.) results in awkward expressions that cannot be compared with a simple alignment of English terms. We also found a conflict in connection with the selection of technical terms. While we, as CS experts, preferred common English technical terms in the field of information processing for the sake of easy understanding, the evaluators emphasized the authority of the national standard technical terms (DIN), e.g. CRT (Datensichtgeraet), real-time (echtzeit), etc.</Paragraph>
    <Paragraph position="23"> The reason why the German idiom &quot;unter Verwendung von&quot; was used so frequently can be attributed to the semantic symbol &quot;USE.ACT&quot;, which is often inferred (in about 10% of cases) by the analysis system. (Note: USE.ACT covers &quot;verwenden (use)&quot;, &quot;anwenden (apply)&quot;, &quot;Gebrauch machen (make use of)&quot;, but also the &lt;instrument&gt; arc for &quot;mit (with)&quot;, &quot;mit Hilfe von (with the help of)&quot;, etc.) This means that making explicit, through USE.ACT, a meaning that is only implied in the original Japanese text may either elucidate the situation in German (this is often the case) or make the expression harder to understand. By the same token, a postpositional phrase or adjective phrase of the original text may be expressed awkwardly as a German relative clause. As the modifier and USE.ACT cases mentioned above exemplify, over-analysis and over-expression are specific to our semantic based approach and could be avoided in other transfer approaches.</Paragraph>
    <Paragraph position="24"> IV. Conclusion We discussed some problems of our semantic based approach. Many of them are also common to other approaches. However, our approach seems to be open to continuous improvement in dealing with these problems.</Paragraph>
    <Paragraph position="25"> We express our sincere thanks to the ATLAS/II group of Fujitsu Laboratory, Kawasaki for making semantic representations available for our generation</Paragraph>
  </Section>
</Paper>