<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2125">
  <Title>IMPLICITNESS AS A GUIDING PRINCIPLE IN MACHINE TRANSLATION</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
IMPLICITNESS AS A GUIDING PRINCIPLE
IN MACHINE TRANSLATION
Klaus SCHUBERT
</SectionTitle>
    <Paragraph position="0"> BSO/Research, Postbus 8348, NL-3503 RH Utrecht, The Netherlands schubert@dlt1.uucp Multilingual extensibility requires an MT system to have a language-independent pivot. It is argued that an ideal, purely semantic pivot is impossible. A translation method is described in which semantic relations are kept implicit in syntax, while the semantic traits and distinctions are implicit in the words of a full-fledged language itself as pivot.</Paragraph>
    <Paragraph position="1"> 1. Multilingual extensibility There is an external factor with very substantial consequences for the internal design of machine translation systems: extensibility. When a machine translation system has to allow for adding arbitrary source and target languages without each time adapting the already existing parts of the system, the need arises for a carefully defined interface structure to which modules for additional languages may be linked. The design that best meets these requirements is the pivot or interlingual approach, since in such a system there is only a single interface which gives access to all the languages already included in the system.</Paragraph>
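    <Paragraph> The extensibility argument can be made concrete with a rough module count. The sketch below is only an illustration under stated assumptions: a direct design is assumed to need one transfer module per ordered language pair, and a pivot design one module into the pivot and one out of it per language. The function names are invented for this example and do not come from the paper.

```python
def transfer_modules(n_languages):
    """Direct design (assumed): one transfer module per ordered language pair."""
    return n_languages * (n_languages - 1)


def pivot_modules(n_languages):
    """Pivot design (assumed): one module into and one out of the pivot per language."""
    return 2 * n_languages


def added_by_new_language(n_existing):
    """Modules that must be written when one language joins n_existing languages."""
    direct = transfer_modules(n_existing + 1) - transfer_modules(n_existing)
    pivot = pivot_modules(n_existing + 1) - pivot_modules(n_existing)
    return direct, pivot
```

    Under these assumptions the cost of adding a language to a pivot system stays constant, whereas in a direct design it grows with the number of languages already present.</Paragraph>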
    <Paragraph position="2"> In models of this type the only link between a source and a target language is the intermediate representation. It has a double function: 1. The intermediate representation should render the full content of the text being translated, with all its details and nuances.</Paragraph>
    <Paragraph position="3"> 2. The intermediate representation should contain the results of the grammatical analysis carried out on the source text, where these characteristics are translation-relevant.</Paragraph>
    <Paragraph position="4"> It is desirable that the intermediate representation express both the content and the grammatical characteristics of the text unambiguously, and since it is the interface to arbitrary languages, it should express them in a language-independent way.</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="599" type="metho">
    <SectionTitle>
2. Language-independent semantics?
</SectionTitle>
    <Paragraph position="0"> To render both the content and the functional features of a text is usually taken to mean spelling them out in an appropriate way.</Paragraph>
    <Paragraph position="1"> The intermediate representation provides a formalism for this purpose. Spelling out means making explicit. My main concern here is investigating to what extent the required explicitness can be achieved in a language-independent representation. Are there language-independently valid categories and values for the characteristics of words and word groups needed in an intermediate representation? (When speaking of grammatical analysis, I take grammar to denote the study of the entire internal system of language, so that both syntax and semantics on all levels between morpheme and text are subfields of grammar. Pragmatics, by contrast, describes the influence of extralinguistic factors on language and is not part of grammar; cf. Schubert 1987b: 14f.) The form of the linguistic sign is language-specific, whereas its content is normally thought to be language-independent.</Paragraph>
    <Paragraph position="2"> The content side of the linguistic sign is therefore often assumed to be a good tertium comparationis for translation grammar. In other words, the transfer step from a syntactic form in the source language to a corresponding form in the target language is performed on the basis of the common meaning the two forms are supposed to have.</Paragraph>
    <Paragraph position="3"> As a consequence, an intermediate representation is usually devised as a structure in which this common meaning is made explicit. The intermediate representation is seen as a semantic equivalent of the source text. For obtaining such a structure, a syntactic analysis of the source text is by no means superfluous. An intermediate representation consists, like any system, of elements and their relations. In a semantic system elements and relations are semantic. But in order to detect the elements and their relations in a given text, a syntactic analysis is needed. (&quot;Syntax-free semantic parsers&quot; apply syntactic knowledge tacitly, and as a rule they work especially well for languages where the sequential order of &quot;purely semantic&quot; elements carries syntactic information.) There are two major clusters of reasons why an ideal semantic intermediate representation of the language-independent kind sketched above is impossible, however desirable it may be in theory.</Paragraph>
    <Paragraph position="4"> First of all, there are no language-independent semantic elements. Whatever symbols are chosen -- words, morphemes, numbers, letter codes... -- they are always inherently language-bound.</Paragraph>
    <Paragraph position="5"> The elements of an artificial symbol system are either directly taken from an existing language, or have an explicit or implicit definition in a reference language. It is impossible to make a truly language-independent system of symbols, if it is to possess the full expressiveness of a human language (cf. Schubert 1986).</Paragraph>
    <Paragraph position="6"> Symbols cannot be given a meaning independently of a reference language; their meaning can only become autonomous by being used in a language community during a long period.</Paragraph>
    <Paragraph position="7"> This is why a planned language like Esperanto could not rank as a full-fledged human language from the very day the first textbook was published but had to develop slowly from an artificial, reference language-dependent symbol system into an autonomous language by being used in a community (cf. Schubert forthc.).</Paragraph>
    <Paragraph position="8"> Perhaps this is an unusual argument in a computational context, where people are used to defining symbol systems which they call &quot;languages&quot;.</Paragraph>
    <Paragraph position="9"> It should be borne in mind, however, that such defined symbol systems are subsets of an existing human language (or of several).</Paragraph>
    <Paragraph position="10"> Machine translation, by contrast, is concerned with translating texts between human languages, which from a semantic point of view -- even if the language may be simplified or the text pre-edited -- are inherently more complicated than artificial symbol systems.</Paragraph>
    <Paragraph position="11"> Not only are defined semantic units in such systems reference language-dependent, but the road to the basic semantic units needed is via semantic decomposition - with all its well-known problems. Scholars have for centuries been trying to find universally valid semantic atoms (or primitives), but none of the many systems suggested has met with acknowledgement or proved applicable on any wider scale. Individual languages cut up and label reality in different ways; no underlying &quot;smallest semantic units&quot; have been found as yet and possibly they will never be found. In my opinion the conclusion is that meaning is not portioned, so that no smallest portions can be found.</Paragraph>
    <Paragraph position="12"> Semantic atoms would be needed for totally spelling out the content of a text in a language-independent way, that is, in such a way that it would be suited for translation into any arbitrary target language. In many machine translation systems, ambitions are not that high. Most often, intermediate representations use words or other language-bound symbols, decorated with semantic features which are held to be cross-linguistically valid. Yet, what is true for semantic atoms applies to semantic features as well, albeit in a less obvious way: They contain portions of meaning which do not function in all languages in the same way. That semantic atoms and features are not as cross-linguistic as they seem to be, is also suggested by the experience that they are very hard to define and delimit in a way that fulfils exactly the required function, or denotes precisely the intended distinction for a large number of languages simultaneously. It is because of this that intermediate representations often have to be adapted, attuned or even redesigned when a new source or target language is added to the system. Such representations fail to provide for multilingual extensibility.</Paragraph>
  </Section>
  <Section position="3" start_page="599" end_page="599" type="metho">
    <SectionTitle>
3. Case frames
</SectionTitle>
    <Paragraph position="0"> The second cluster of reasons for the impossibility of an ideal, purely semantic, intermediate representation concerns semantic relations. One of the best-known approaches to making semantic relations explicit is Fillmore's case grammar (1968). Deep cases are often believed to be cross-linguistically valid. Although there are many substantial difficulties in delimiting and labelling deep cases (cf. Fillmore 1987), many machine translation systems perform transfer with case frames. This works quite well to a certain degree, but slowly the insight is gaining ground that deep cases nevertheless are language-specific. If case frames really were an autonomous tertium comparationis, translating on the basis of case frames would mean just filling in target language forms in a language-independent case frame obtained from the source language analysis. But in reality case frame-based translation often entails a transfer from a source language-specific case frame to a target language one. Evidence for this need comes first from general linguistics (e.g. Pleines 1978: 372; Engel 1980: 11), but recently turns up in computational linguistics as well (Tsujii 1986: 656; cf. Schubert 1987a). This is in concord with Harold Somers' (1987: viii) observation about the popularity of case grammar, already declining in theoretical linguistics, but still in vogue in computational applications.</Paragraph>
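    <Paragraph> The difference between filling in a case frame and transferring one can be sketched as follows. This is a toy illustration only: the frame layout, the role labels (ROLE_A, ROLE_B, ROLE_C), the placeholder words and the per-predicate role map are all hypothetical and are not taken from any language, system or cited work.

```python
def fill_in(frame, lexicon):
    """Ideal case: keep the role labels, replace only the lexical items."""
    return {
        "pred": lexicon[frame["pred"]],
        "roles": {role: lexicon[filler] for role, filler in frame["roles"].items()},
    }


def transfer(frame, lexicon, role_map):
    """Case described in the text: roles are remapped per predicate before filling in."""
    mapping = role_map.get(frame["pred"], {})
    remapped = {mapping.get(role, role): filler for role, filler in frame["roles"].items()}
    return fill_in({"pred": frame["pred"], "roles": remapped}, lexicon)


# Placeholder data: every label and word here is invented for illustration.
source_frame = {"pred": "src_verb", "roles": {"ROLE_A": "src_noun_1", "ROLE_B": "src_noun_2"}}
lexicon = {"src_verb": "tgt_verb", "src_noun_1": "tgt_noun_1", "src_noun_2": "tgt_noun_2"}
role_map = {"src_verb": {"ROLE_B": "ROLE_C"}}

ideal = fill_in(source_frame, lexicon)
actual = transfer(source_frame, lexicon, role_map)
```

    If case frames were a true tertium comparationis, the role map would always be empty and the two functions would give the same result; the need for a non-empty, language-pair-specific map is what makes the frames language-specific.</Paragraph>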
    <Paragraph position="1"> Returning to the argument about a purely semantic system, it can be concluded that neither the elements nor the relations, which together should constitute the theoretically desirable language-independent intermediate representation, actually exist. This insight, among others, is the origin of the idea of implicitness in machine translation.</Paragraph>
  </Section>
</Paper>