File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/88/c88-1010_metho.xml

Size: 17,439 bytes

Last Modified: 2025-10-06 14:12:07

<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1010">
  <Title>SOME PROBLEMS OF MACHINE TRANSLATION BETWEEN CLOSELY RELATED LANGUAGES</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SOME PROBLEMS OF MACHINE TRANSLATION
BETWEEN CLOSELY RELATED LANGUAGES
</SectionTitle>
    <Paragraph position="0"> features resulting from the close relatedness of the two languages, above all the possibility of minimizing the transfer.</Paragraph>
    <Paragraph position="1"> Related linguistic problems are analyzed both within the MT project and from the perspective of contrastive linguistics.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="47" type="metho">
    <SectionTitle>
1. The system of Czech-to-Russian MT
</SectionTitle>
    <Paragraph position="0"> system called RUSLAN is conceived (similarly to all linguistically based MT systems) as a modular system consisting, in brief, of a source-language parser, a transfer, and a synthesis of the target language. Its task is to translate texts from the computer domain, in particular manuals of operating systems.</Paragraph>
    <Paragraph position="1"> Since in RUSLAN the source language is closely genetically related to the target one, some of the modules of the system could be considerably simplified without abandoning the theoretical linguistic framework on which the system is based (the dependency and stratificational approach). The simplifications concern, first of all, the transfer phase, so that the system cannot be understood as including a complete transfer.</Paragraph>
    <Paragraph position="2"> 2. The effort towards a maximally effective procedure has also resulted in simplifications in the parser. This was made possible, among other things, by the similarity of cases of syntactic ambiguity in the source and the target language. For example, with sequences of the type Verb Noun_1 Noun_2 ... Noun_i, where each Noun_j stands for a nominal or prepositional group serving as a free modifier, the surface order can generally be preserved, which makes it unnecessary to determine in detail whether a given Noun_j modifies the Verb or one of the preceding nouns. This can be illustrated by the output Russian sentence &quot;Vo vremja svoej raboty programma možet potrebovat' takže pomošč' sistemy pri obrabotke fajlov dannych.&quot; (lit. &quot;In course of-its work program can need also help of-system in processing of-files of-data.&quot;), where the group &quot;pri obrabotke ...&quot; can be analyzed (in both languages) as modifying the verb &quot;potrebovat'&quot; or the nouns &quot;pomošč'&quot; or &quot;sistemy&quot;. If the order of the nominal groups is preserved, the translation also preserves the structural ambiguity of the original. Nominalizations can also be translated independently of their underlying structure (e.g., &quot;Indeksno-posledovatel'nyje fajly neobchodimo do obrabotki preobrazovat'.&quot;, lit. &quot;Index-sequential files have-to-be before processing transformed.&quot;, or &quot;Programmy, napisannye na jazyke Assembler v ramkach predyduščej versii, neobchodimo snova translirovat'.&quot;, lit. &quot;Programs written in language Assembler in framework-of previous version have-to-be again compiled.&quot;).</Paragraph>
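A minimal sketch of the order-preserving strategy, assuming the parser has already delimited the verb and the nominal or prepositional groups. The group lexicon `GROUP_LEXICON` and the Czech input are hypothetical illustrations, not RUSLAN data. Each group is translated in place and the sequence is kept, so no decision is ever made about whether a group modifies the verb or a preceding noun; the attachment ambiguity of the source simply carries over to the output.

```python
# Hypothetical mapping from delimited Czech groups to Russian groups.
GROUP_LEXICON = {
    "potřebovat": "potrebovat'",
    "pomoc": "pomošč'",
    "systému": "sistemy",
    "při zpracování souborů dat": "pri obrabotke fajlov dannych",
}


def translate_groups(groups: list[str]) -> list[str]:
    """Translate a Verb Noun_1 ... Noun_i sequence group by group.

    The surface order is kept and no attachment is computed, so a
    structural ambiguity of the source is reproduced in the target.
    """
    return [GROUP_LEXICON[group] for group in groups]


czech = ["potřebovat", "pomoc", "systému", "při zpracování souborů dat"]
print(" ".join(translate_groups(czech)))
# potrebovat' pomošč' sistemy pri obrabotke fajlov dannych
```

The design choice is the point: skipping attachment resolution is only safe because the same ambiguity exists in both languages.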
    <Paragraph position="3"> Such an approach made it possible, first, to minimize the transfer phase in the design of the project and then, during implementation, to distribute the transfer operations between the parser and the synthesis. This may give the impression that RUSLAN works completely without transfer, i.e., as a direct binary MT system. In principle, the minimization of the transfer reflects the empirical fact that the two languages have many features in common.</Paragraph>
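The idea of a transfer absorbed into the parser and the synthesis can be illustrated with a toy pipeline. Everything here is a hypothetical simplification: the `Node` class, the two-entry lexicon and the functions `parse`, `transfer` and `synthesize` are not taken from RUSLAN. Because the parser already attaches the Russian lemma to every node, the transfer step is an identity on the tree, which is what makes such a system look like a direct one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Dependency-tree node carrying both the Czech and the Russian lemma."""
    cz_lemma: str
    ru_lemma: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical bilingual lexicon consulted already by the parser.
LEXICON = {"pracovat": "rabotat'", "program": "programma"}


def parse(lemmas: list[str]) -> Node:
    """Toy parser: the first lemma is the head; lexical transfer happens here."""
    head, *deps = lemmas
    return Node(head, LEXICON[head], [Node(d, LEXICON[d]) for d in deps])


def transfer(tree: Node) -> Node:
    """Minimized transfer: the parsed tree is passed on unchanged."""
    return tree


def synthesize(tree: Node) -> list[str]:
    """Toy synthesis: dependents precede the head; no inflection."""
    return [child.ru_lemma for child in tree.children] + [tree.ru_lemma]


print(" ".join(synthesize(transfer(parse(["pracovat", "program"])))))
```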
    <Paragraph position="4"> 3. The lexicon plays a major role in RUSLAN. Each lexical entry contains a maximum of information, which is then projected into the syntactic rules; only the most general behaviour of words is captured purely by means of syntax.</Paragraph>
    <Paragraph position="5"> The rules for the choice of lexical equivalents draw on different types of information. Along with data on parts of speech and morphemics, semantic features are listed, and (especially with verbs) also the valency (subcategorization) frame; the valency slots are accompanied by information on their Czech morphemic form as well as on that of the corresponding Russian items (the pair &quot;užívat něco (acc.)&quot; vs. &quot;pol'zovat'sja čem (instr.)&quot;, &quot;to use sth.&quot;, may serve as an example of such a discrepancy). Where passivization is possible, it is indicated which of the slots (mostly, but not always, expressed by the accusative) is selected as the passive surface subject, which is then expressed by the nominative. With each slot, the semantic features required or excluded for its filler are indicated. These features help to identify the fillers, especially in cases of ambiguity; e.g., in Czech &quot;Výstupní zařízení nastaví řádkování na požadovanou hodnotu.&quot; (lit. &quot;Output device sets line-spacing at required value.&quot;) the verb &quot;nastavit&quot; (&quot;set&quot;) has the following valency frame: Actor (nom/nom, +Human, +Device), Objective (acc/acc, +Concr, +Result-of-process, -Human), where &quot;+&quot; marks semantic features at least one of which has to be present with the filler of the respective slot, &quot;-&quot; marks semantic features excluded with the filler, and each case pair gives the Czech/Russian morphological form. In this way, the morphemic case ambiguity of &quot;řádkování&quot; and &quot;zařízení&quot; (in both cases between nom. and acc.) can be resolved on the basis of the semantic features of the two nouns.</Paragraph>
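The frame of `nastavit` and the resolution of the nom/acc ambiguity can be sketched as a small constraint check. The `Slot` class and `assign_slots` are hypothetical, and the features given to the two nouns (`Device` and `Concr` for `zařízení`, `Result-of-process` for `řádkování`) are assumptions made for the illustration. A slot accepts a filler if the filler has at least one `+` feature and none of the `-` features; since `řádkování` is neither Human nor Device, it cannot be the Actor, and only one reading survives.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Slot:
    """One valency slot: Czech/Russian case pair plus semantic constraints."""
    name: str
    cz_case: str
    ru_case: str
    required_any: frozenset  # "+" features: at least one must be present
    excluded: frozenset      # "-" features: none may be present

    def fits(self, features: frozenset) -> bool:
        return (not self.required_any.isdisjoint(features)
                and self.excluded.isdisjoint(features))


# Valency frame of "nastavit" ("set") in the notation of the text.
NASTAVIT = [
    Slot("Actor", "nom", "nom", frozenset({"Human", "Device"}), frozenset()),
    Slot("Objective", "acc", "acc",
         frozenset({"Concr", "Result-of-process"}), frozenset({"Human"})),
]

# Assumed features of the two nouns, each ambiguous between nom and acc.
NOUNS = {
    "zařízení": frozenset({"Device", "Concr"}),
    "řádkování": frozenset({"Result-of-process"}),
}


def assign_slots(frame: list[Slot], nouns: dict) -> list[dict]:
    """Return every assignment of nouns to slots that the features allow."""
    return [
        {slot.name: noun for slot, noun in zip(frame, perm)}
        for perm in permutations(nouns, len(frame))
        if all(slot.fits(nouns[noun]) for slot, noun in zip(frame, perm))
    ]


print(assign_slots(NASTAVIT, NOUNS))
# [{'Actor': 'zařízení', 'Objective': 'řádkování'}]
```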
    <Section position="1" start_page="0" end_page="47" type="sub_section">
      <SectionTitle>
3.1 The choice of the Russian equivalents
</SectionTitle>
      <Paragraph position="0"> for Czech lexical units should also reflect structural differences between the two languages. These differences also concern syntactic patterns; at least the following cases should be distinguished:
a. Adj Adj Noun -&gt; Adj Noun
   ex.: datový řídicí příkaz -&gt; upravljajuščij operator
   lit.: data control command -&gt; control operator
b. Noun -&gt; Adj Noun
   ex.: počítač -&gt; vyčislitel'naja mašina
   lit.: computer -&gt; computing machine
c. Verb -&gt; Verb Noun</Paragraph>
      <Paragraph position="1"> ex.: zkompilovat -&gt; osuščestvit' kompiljaciju
   lit.: to compile -&gt; to carry out compilation
d. Noun -&gt; Noun Noun
   ex.: počátek -&gt; točka peresečenija
   lit.: beginning -&gt; point of-intersection
e. Adj Adj Noun -&gt; Noun Noun Adj Noun
   ex.: vyšší programovací jazyk -&gt; jazyk programmirovanija vysšego urovnja
   lit.: higher programming language -&gt; language of-programming of-higher level
Clearly, some types are easier to implement than others, depending on the complexity of the respective Czech and Russian constructions. To simplify some cases of type d, where the Russian equivalent includes a modifying noun in a fixed morphemic form, this noun is treated as an uninflected word whose syntactic relation is established already in the dictionary.</Paragraph>
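The treatment of type d can be sketched as follows, under the assumption that a dictionary entry stores the Russian head noun separately from the postposed modifier in its fixed form. The entry format `MultiwordEquivalent` and the inflection table for `točka` are hypothetical; only the head is inflected during synthesis, while `peresečenija` behaves as an uninflected word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiwordEquivalent:
    head: str            # Russian head noun, inflected during synthesis
    fixed_modifier: str  # postposed modifier stored in a fixed form


# Hypothetical dictionary entry of type d (Noun -> Noun Noun).
DICTIONARY = {"počátek": MultiwordEquivalent("točka", "peresečenija")}

# Hypothetical inflection table covering the head noun only.
FORMS = {"točka": {"nom": "točka", "gen": "točki", "acc": "točku", "instr": "točkoj"}}


def synthesize(cz_lemma: str, case: str) -> str:
    """Inflect the head for the required case; keep the modifier as is."""
    entry = DICTIONARY[cz_lemma]
    return f"{FORMS[entry.head][case]} {entry.fixed_modifier}"


print(synthesize("počátek", "instr"))  # točkoj peresečenija
```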
      <Paragraph position="2"> 3.2 Due to the closeness of the languages, a useful ingredient can be seen in the idea of a transducing dictionary, proposed and elaborated in the English-to-Czech MT system (cf. Kirschner, 82). The transducing dictionary, based on the algorithmic handling of regular productive international affixes (with exceptions listed in the main dictionary) and of orthographic and similar differences, can be illustrated by the following: a. with suffixes such as -áž (montáž, &quot;assembly&quot;), -át (agregát, &quot;aggregate&quot;), -ent (koeficient, &quot;coefficient&quot;), -ura (kubatura, &quot;cubic volume&quot;), and with lexical components of Greek or Latin origin such as -graf, -skop (kardiograf, &quot;cardiograph&quot;; elektroskop, &quot;electroscope&quot;), the Russian equivalents differ at most in details; b. with other suffixes of international use, the Russian equivalents correspond in a systematic way to the Czech ones, as with -ismus/-izm, -ický/-ičeskij, etc.; c. to a certain degree, words of Slavonic origin can also be handled by a procedure based on correspondences of regular segment pairs such as h/g and TraT/ToroT (where T stands for an occlusive: krátký/korotkij, &quot;short&quot;); such pairs as &quot;hrad&quot; (&quot;castle&quot;) vs. &quot;gorod&quot; (&quot;town&quot;), where the lexical semantics differs, have to be listed in the lexicon.</Paragraph>
      <Paragraph position="3"> d. whenever a word has not been identified in the main dictionary and cannot be treated by the procedures of types a., b., c., at least transliteration and some of the elementary correspondences are carried out, so that if, e.g., &quot;přeplnění&quot; (&quot;overloading&quot;) or &quot;disketa&quot; (&quot;floppy disc&quot;) were not found in the dictionary, they would be transduced as &quot;perepolnenie&quot; (correctly) and &quot;disketa&quot; (instead of &quot;gibkij disk&quot;), respectively. This procedure, together with a set of similar fail-soft rules for syntax, should ensure that the output is basically understandable.</Paragraph>
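The fail-soft cascade of 3.2 (main dictionary first, then regular correspondences, then bare transliteration) can be sketched as below. The rule tables are deliberately tiny and hypothetical: the suffix pairs follow type b., the entry `hrad` mapped to `zamok` (the Russian word for castle) is our own illustration of an exception that must be listed because the regular correspondence would yield `gorod`, and the character map covers only a few Czech vowels.

```python
# Hypothetical, deliberately tiny rule tables.
MAIN_DICTIONARY = {
    "hrad": "zamok",  # listed: the regular correspondence would give "gorod" ("town")
}
SUFFIX_RULES = [  # (Czech suffix, Russian suffix), checked in order
    ("ický", "ičeskij"),
    ("ismus", "izm"),
]
# Minimal transliteration map for a few Czech vowels with diacritics.
CHAR_MAP = str.maketrans({"á": "a", "é": "e", "ě": "e", "í": "i", "ý": "y"})


def transduce(cz_word: str) -> tuple[str, str]:
    """Return (Russian form, method used) for a Czech word."""
    if cz_word in MAIN_DICTIONARY:
        return MAIN_DICTIONARY[cz_word], "dictionary"
    for cz_suffix, ru_suffix in SUFFIX_RULES:
        if cz_word.endswith(cz_suffix):
            stem = cz_word[: -len(cz_suffix)]
            return stem.translate(CHAR_MAP) + ru_suffix, "suffix rule"
    # Fail-soft fallback: transliteration only; the result may be
    # understandable but wrong (e.g. "disketa" instead of "gibkij disk").
    return cz_word.translate(CHAR_MAP), "transliteration"


for word in ["hrad", "algoritmický", "realismus", "disketa"]:
    print(word, "->", *transduce(word))
```

Consulting the main dictionary first is what lets listed exceptions override the regular rules.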
      <Paragraph position="4"> 4. The procedures of syntactic analysis and synthesis are based on lexical information, including the valency frames. Certain difficulties arise when filling the slots of obligatory adverbials (see Panevová, 80) whose forms are variable within a given adverbial type, e.g. &quot;vrátit se kam&quot; (&quot;to return somewhere&quot;): &quot;napravo&quot; (&quot;to the right&quot;, adverb), &quot;k problému&quot; (&quot;to the problem&quot;, preposition &quot;k&quot; + dative), &quot;do bytu&quot; (&quot;into the flat&quot;, preposition &quot;do&quot; + genitive), etc. Such cases are handled by the parser together with free adverbials; it only has to be ensured that the obligatory modifier is identified (in case of ellipsis, the preceding sentence has to be taken into account, although the Czech deletion often goes in parallel with that in the corresponding Russian sentence).</Paragraph>
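Finding the obligatory directional adverbial of `vrátit se` among free modifiers can be sketched as follows. The `Modifier` type and its form labels are hypothetical; the three admissible forms mirror the examples in the text, and the look-back to the preceding sentence is a simplified stand-in for the ellipsis handling mentioned there.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Modifier:
    text: str
    form: str  # hypothetical form labels, e.g. "adv-dir", "k+dat", "do+gen"


# Admissible surface forms of the Direction slot of "vrátit se kam".
DIRECTION_FORMS = {"adv-dir", "k+dat", "do+gen"}


def find_direction(modifiers: Sequence[Modifier],
                   previous_sentence: Sequence[Modifier] = ()) -> Optional[Modifier]:
    """Pick the obligatory Direction modifier among the parsed modifiers.

    If none is present (ellipsis), look at the preceding sentence.
    """
    for candidates in (modifiers, previous_sentence):
        for modifier in candidates:
            if modifier.form in DIRECTION_FORMS:
                return modifier
    return None


clause = [Modifier("rychle", "adv-manner"), Modifier("do bytu", "do+gen")]
print(find_direction(clause).text)  # do bytu
elliptic = [Modifier("rychle", "adv-manner")]
print(find_direction(elliptic, [Modifier("k problému", "k+dat")]).text)  # k problému
```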
      <Paragraph position="5"> 4.1 One of the relevant differences between Czech and Russian syntax concerns sentences in which the Czech 1st person plural corresponds to the Russian reflexive form, e.g. Czech &quot;Algoritmus rozmísťování bloků popisujeme v části 6&quot; vs. Russian &quot;Algoritm razmeščenija blokov opisyvaetsja v razdele 6&quot; (&quot;The algorithm of dislocation of blocks is described in Section 6&quot;).</Paragraph>
      <Paragraph position="6"> Often a modal expression is present: Czech &quot;Názvy programů můžeme najít v knihovně&quot; vs. Russian &quot;Nazvanija programm možno najti v biblioteke&quot; (&quot;The titles of the programs can be found in the library&quot;). The linguistic rules underlying the practical solution of these problems can have the following form:  sentence corresponds to a similar ambiguity in Russian. In other cases the ambiguity in the two languages is not in such accordance. This is illustrated by the following:  The Czech preposition &quot;o&quot; with the locative is either kept in Russian or, with nouns having the feature Time, translated as &quot;vo vremja&quot; with the genitive.</Paragraph>
      <Paragraph position="7"> Differences in prepositional constructions are also found with the following pairs: c. Czech: Práce na programu pokračují i v tomto roce.</Paragraph>
      <Paragraph position="8"> Russian: Raboty nad programmoj prodolžajutsja i v ètom godu.</Paragraph>
      <Paragraph position="9"> (Work on the program continues this year as well.) d. Czech: Práce na fakultě pokračují i v tomto roce.</Paragraph>
      <Paragraph position="10"> Russian: Raboty na fakul'tete prodolžajutsja i v ètom godu.</Paragraph>
      <Paragraph position="11"> (Work at the faculty continues this year as well.) These examples can be fully accounted for neither by means of lexical information nor by the general scheme of syntactic rules; a list of such differences is therefore necessary.</Paragraph>
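Since such differences have to be listed, a minimal sketch of such a list follows. Keying the entries by the head noun, the preposition and the dependent lemma is our assumption, and the two entries simply encode examples c. and d.; a realistic list would need more general keys.

```python
# Hypothetical list of prepositional differences, keyed by
# (Czech head noun, Czech preposition, Czech dependent noun).
PREP_DIFFERENCES = {
    ("práce", "na", "program"): ("nad", "instr"),  # raboty nad programmoj
    ("práce", "na", "fakulta"): ("na", "loc"),     # raboty na fakul'tete
}


def russian_prep(head: str, prep: str, dependent: str, cz_case: str) -> tuple[str, str]:
    """Consult the listed differences; otherwise keep preposition and case."""
    return PREP_DIFFERENCES.get((head, prep, dependent), (prep, cz_case))


print(russian_prep("práce", "na", "program", "loc"))  # ('nad', 'instr')
print(russian_prep("práce", "na", "fakulta", "loc"))  # ('na', 'loc')
```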
    </Section>
    <Section position="2" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
4.3 In translating Czech subordinate
</SectionTitle>
      <Paragraph position="0"> clauses introduced by such conjunctions as &quot;zda&quot;, &quot;-li&quot; (&quot;whether&quot;), &quot;jestliže&quot; (&quot;if&quot;), &quot;když&quot; (&quot;when&quot;), &quot;dokud&quot; (&quot;till&quot;), &quot;dokud ne&quot; (&quot;until&quot;), &quot;pokud&quot; (&quot;as long as&quot;), some of which are ambiguous, the text can be treated as relatively homogeneous. Whether a clause introduced by &quot;zda&quot; or &quot;-li&quot; functions as a subject can be identified on the basis of the valency of the verb in the superordinate clause, where it is marked whether the verb may take a subordinate clause as its Actor or Objective. In the other cases, suitable or at least acceptable translations of the conjunctions are as follows: Czech &quot;zda&quot;, &quot;-li&quot;, &quot;pokud&quot;, &quot;jestliže&quot; as Russian &quot;esli&quot;; Czech &quot;dokud&quot;, &quot;dokud ne&quot; as Russian &quot;poka&quot;, &quot;poka ne&quot;; Czech &quot;když&quot; as Russian &quot;kogda&quot;.</Paragraph>
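The default equivalents listed above can be sketched as a lookup table. The function name and the `is_argument_clause` flag are hypothetical; the flag stands for the valency check on the superordinate verb, and because the text does not spell out how argument clauses with `zda` or `-li` are translated, the sketch returns `None` for them instead of guessing.

```python
from typing import Optional

# Default Czech -> Russian conjunction equivalents listed in the text.
CONJUNCTIONS = {
    "zda": "esli", "-li": "esli", "pokud": "esli", "jestliže": "esli",
    "dokud": "poka", "dokud ne": "poka ne",
    "když": "kogda",
}


def translate_conjunction(conj: str, is_argument_clause: bool) -> Optional[str]:
    """Return the default Russian conjunction for a Czech one.

    A "zda"/"-li" clause that fills the Actor or Objective slot of the
    superordinate verb (per its valency frame) is outside the default
    table; None signals that it needs separate treatment.
    """
    if conj in ("zda", "-li") and is_argument_clause:
        return None
    return CONJUNCTIONS[conj]


print(translate_conjunction("když", False))  # kogda
```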
      <Paragraph position="1"> It follows that, while it is necessary to work with the underlying structure to a certain degree, in the majority of cases the equivalent can be chosen on the basis of the conjunctions themselves.</Paragraph>
    </Section>
    <Section position="3" start_page="47" end_page="47" type="sub_section">
      <SectionTitle>
4.4 The Czech verb &quot;být&quot; (&quot;to be&quot;) has
</SectionTitle>
      <Paragraph position="0"> several Russian equivalents: the copula &quot;byt'&quot; and the verbs &quot;est'&quot;, &quot;javljat'sja&quot;, &quot;nachodit'sja&quot;, &quot;imet'sja&quot;. The selection of the equivalent depends on the syntactic context: if the nominal predicate in Czech is in the instrumental case, a form of the verb &quot;javljat'sja&quot; is preferred; if a local adverbial is present, the translation &quot;nachodit'sja&quot; is appropriate; otherwise the appropriate form of the copula is chosen. Of course, a separate point concerns the translation of &quot;být&quot; within idioms (&quot;byt' v porjadke&quot;, but &quot;imet'sja v rasporjaženii&quot;).
4.5 The surface behaviour of negation is not the same in Czech and in Russian: in Czech, even partial negation is often expressed as a prefix of the verb, which gives rise to an ambiguity absent in Russian, where this distinction is always transparent. Some of the examples from our texts are:  substantially the same in the two languages; the differences concern only such specific cases as, e.g., the positions of parts of complex verb forms or those of certain pronouns and particles which have the character of clitics in Czech but usually follow the verb in Russian:  (In the operating system we shall try ...) The differences described in this section do not concern the structural order, and there is no danger that ambiguity might arise. The dislocation of function words and particles can be described by general rules.</Paragraph>
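The selection of the equivalent of `být` can be sketched as an ordered series of checks. The function and its parameters are hypothetical, and the Czech idiom keys (in particular `být k dispozici` for `imet'sja v rasporjaženii`) are our assumption; the order of the context tests follows the text, and `est'` and `imet'sja` outside idioms are not covered because the text gives no conditions for them.

```python
from typing import Optional

# Idioms handled separately; the Czech keys are assumed for illustration.
BYT_IDIOMS = {
    "být v pořádku": "byt' v porjadke",
    "být k dispozici": "imet'sja v rasporjaženii",
}


def choose_byt_equivalent(predicate_case: Optional[str],
                          has_local_adverbial: bool,
                          idiom: Optional[str] = None) -> str:
    """Select the Russian equivalent of Czech "být" from its context."""
    if idiom is not None and idiom in BYT_IDIOMS:
        return BYT_IDIOMS[idiom]
    if predicate_case == "instr":  # nominal predicate in the instrumental
        return "javljat'sja"
    if has_local_adverbial:        # a local adverbial is present
        return "nachodit'sja"
    return "byt'"                  # default: the copula


print(choose_byt_equivalent("instr", False))  # javljat'sja
```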
      <Paragraph position="1"> 4.7 In 4.1 through 4.6 we wanted to show what the problems of parsing are if the correspondences in the underlying structure, in surface syntax, and in the surface order of morphemes are to be exploited while the differences are resolved; we also wanted to illustrate the narrowed but nonetheless necessary role of transfer.</Paragraph>
      <Paragraph position="2"> 5. We wanted to point out that, on the one hand, the closeness of the two languages makes it relatively easy to find a strategy for an MT system, since the most complex problems of ambiguity can be partially avoided; on the other hand, comparative empirical research in the domains of the lexicon and of syntax is necessary even for such a pair of languages. The results of such an approach may be useful in MT and also in the contrastive comparison of cognate languages.</Paragraph>
    </Section>
  </Section>
</Paper>