<?xml version="1.0" standalone="yes"?> <Paper uid="C88-2155"> <Title>Machine Translation for Monolinguals</Title> <Section position="4" start_page="0" end_page="760" type="metho"> <SectionTitle> 2. Aidtrans: the Sheffield Japanese-to-English system </SectionTitle> <Paragraph position="0"> The Aidtrans Japanese-to-English prototype (implemented in C, and running on a Sharp Unix-based microcomputer) is an implementation of a comprehensive, highly detailed and sophisticated algorithmic grammar of Japanese developed by Dr. Jiri Jelinek as a teaching tool for rapid intensive instruction in technical Japanese (Jelinek 1978). The core of this grammar is its Integrated Dictionary System (IDS). The philosophy of IDS is to incorporate as much as possible of the grammar and the analysis heuristics in the dictionary. This is done in an explicitly language-specific and, as applied to translation, language-pair-specific form, allowing great accuracy and precision (at some inevitable cost in adaptability). The dictionary of the finished prototype contains entries for some 6,000 words.</Paragraph> <Paragraph position="1"> While committed to the maximum use of lexical resources, Aidtrans also sees translation as a relation over whole texts rather than individual words or even sentences. The purpose of each act of translation is to retain the global sense of a text, rather than the concatenation of its word-meanings, as it is reformulated in a different language. To achieve this, it is clearly not enough to produce one acceptable translation for each separate sentence of a text and adjoin them.
Just as a syntactic parser will produce alternative analyses of an ambiguous sentence from which the intended one must be selected, so Aidtrans produces alternative translations of each part of the input text, from which the translation most appropriate to the context must be selected.</Paragraph> <Paragraph position="2"> Such selection from among possible translation equivalents is familiar from human translation or post-editing. Here, however, much of the selection, or rejection, is carried out by the system itself. A text-type-specific linear predictive model is the basis for determining priorities or preferences among the possibilities. Patterns can be recognized at the general level of syntactic configuration and at the more specific level of individual lexical items and collocations; at present the system recognizes well over 200 different types of juxtapositional linkage. In other words, the selectional function in Aidtrans is driven by a generalization of valency, augmented with priority weightings for the possible valency values.</Paragraph> </Section> <Section position="5" start_page="760" end_page="762" type="metho"> <SectionTitle> 3. Ntran: the UMIST English-to-Japanese system </SectionTitle> <Paragraph position="0"> Ntran, whose design was inspired by Rod Johnson and which was developed and first implemented largely by Peter Whitelock, is less target-specific than Aidtrans. The prototype is implemented in Prolog for the sake of rapid and perspicuous development; versions now exist in Cprolog, New Improved (Edinburgh) Prolog, and Quintus. During the course of development, versions have been run on a DEC MicroVax II, an ICL PERQ, and most recently a Sun 3/50 workstation.</Paragraph> <Paragraph position="1"> Through a system of nested menus, Ntran functions on three levels: as a system development system, a grammar development system, and a translation system proper.
Each level offers specific facilities for the writing, testing and debugging of the appropriate areas of program code (for details, see Whitelock et al 1986).</Paragraph> <Paragraph position="2"> Although both prototypes give the maximum weight and information content to the lexicon, another point of difference between them is that Ntran is committed to the principle of translation as linguistics (cf. Johnson 1987), and is designed and implemented as an explicit embodiment of contemporary lexicalist linguistic theory. The English analysis grammar is based on Lexical-Functional Grammar (Bresnan, ed. 1982) and Generalized Phrase Structure Grammar (Gazdar et al 1985), the Japanese generation grammar on Categorial Grammar (Steedman 1985, Whitelock 1987).</Paragraph> <Paragraph position="3"> In analysis, words are first looked up in an English morpho-syntactic dictionary which specifies grammatical category and morphologically determined features such as tense and number. The entries in this, as in all dictionaries, are compacted by &quot;feature co-occurrence restrictions&quot; (FCRs), which factor out any feature-values which are predictable on the basis of others. These derive largely from the FCRs of Generalized Phrase Structure Grammar (Gazdar et al 1985). In English, for example, any lexical item which has tense must be finite and a verb. In a lexical entry assigning any value to &quot;tense&quot;, the specification of finiteness and verb-hood would therefore be redundant, and can be supplied by a generalized rule of the form fcr(tense=_,[inf=finite,stemtyp=verb]).</Paragraph> <Paragraph position="4"> Similarly, as any verb has no noun features, but sets (possibly empty) of prepositional complements and adjuncts, and as any -ing form is a progressive finite verb, we have the rules fcr(cat=verb,[nounfeats=[],pcomp=set(_),adjunct=set(_)]).
fcr(infform=ing,[stemtyp=verb,aspect=progress,inf=no]).</Paragraph> <Paragraph position="5"> Using this limited information, the parser builds all possible &quot;functional structures&quot; (the &quot;f-structures&quot; of LFG), which serve as an intermediate representation abstracting away from surface constituent structure, a particularly valuable level when translating between a configurational language such as English and a non-configurational one such as Japanese.</Paragraph> <Paragraph position="6"> A second stage of lookup, in the English &quot;subcat&quot; dictionary, which holds possible subcategorization patterns, eliminates spurious f-structures and provides a semantic interpretation (&quot;s-structure&quot;) for those which remain (cf. Wood et al 1987). S-structure forms the basis for transfer, which is driven by bilingual dictionaries, the only component to hold contrastive information. The resulting Japanese s-structure is the basis for generation of a Japanese f-structure, using syntactic information held in the bilingual and Japanese dictionaries in the form of the complex categories and combination rules of a unification categorial grammar (see Whitelock 1988 for details). Surface ordering of the Japanese output is finally carried out by linear precedence rules. The role and form of user interaction will be discussed below.</Paragraph> <Paragraph position="7"> 4. Techniques for interactive translation As mentioned earlier, both Aidtrans and Ntran are designed for an English monolingual end-user.
This approach, reflected in the joint project's Alvey title, &quot;Read and write Japanese without knowing it&quot;, distinguishes them from currently available commercial machine(-aided) translation systems, and has led to a number of distinctive design decisions.</Paragraph> <Section position="1" start_page="760" end_page="760" type="sub_section"> <SectionTitle> 4.1 Aidtrans </SectionTitle> <Paragraph position="0"> In the case of Aidtrans, the intention was, while leaving the final selection of the exact translation to the end-user, to produce output of greater accuracy and coherence than is generally found in current post-editing systems. The strategy of multiple generation produces a set of complete alternative translations, rather than one which must be amended piecemeal by a post-editor, while the text-type-based predictive model and preference-weighted linkages greatly reduce the range actually offered to the end-user, and group those which survive into semantically and stylistically coherent wholes. Thus, while a conventional post-editor needs access to the source text to check the accuracy of raw output and as a guide to its revision, here enough information is available in the output to form the basis of the end-user's final selection.</Paragraph> </Section> <Section position="2" start_page="760" end_page="762" type="sub_section"> <SectionTitle> 4.2 Ntran </SectionTitle> <Paragraph position="0"> The facilities for, or demands on, the end-user of Ntran are somewhat more complex: both the complexity of the task and the inner articulation of the system are greater, giving both the need and the opportunity for a variety of interactions (cf. Johnson &amp; Whitelock 1987). To ensure the output of accurate and acceptable Japanese for an English monolingual technical writer, the conventional strategy would be pre-editing, passing to the machine only text in a restricted sublanguage known to be within its translation capacity.
Our system could perhaps be said to offer interactive pre-editing interleaved with translation, rather than interactive translation proper, as no contrastive or bilingual information is presented to the end-user in the interaction. The restricted input sublanguage, however, is simply grammatical English, which, if ambiguous, must be disambiguated. This should be seen not as a constraint on a technical writer but as a desideratum.</Paragraph> <Paragraph position="1"> The Ntran prototype is designed to offer three forms of interactive query: on-line dictionary creation, syntactic disambiguation of English input, and Japanese lexical selection in transfer. When a word is found in an input text for which no dictionary entry yet exists, the user is offered the option of creating an entry for it immediately. This is done using a tree-structured question procedure, eliciting the category of the English word and its values for the features associated with that category, such as mass/count and animacy for nouns, valency and aspectual type for verbs, gradability for adjectives, and so on. The on-line dictionary building routine, although it incorporates a reasonable range of information about an English word, does not ask for Japanese translation equivalents. Instead, entries created in this way are held in a separate dictionary file, where they are accessible to the analysis component, but are also set aside for later completion by a bilingual linguist.</Paragraph> <Paragraph position="2"> Until this is done, the English word is at present simply passed into the Japanese output in its original form, marked off by a special delimiting character. We intend to implement, in a further developed version of the system, a facility for passing such words through in katakana transcription.
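The on-line dictionary creation step and the pass-through of still-untranslated words described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Ntran's Prolog implementation: the question tree, the feature names and values (including the aspect classes), the "#" delimiter and the helpers elicit_entry, render_untranslated and scripted are all hypothetical. The text specifies only the kinds of features elicited and that no Japanese equivalent is requested.

```python
# Hypothetical sketch of tree-structured lexical elicitation: the first
# question fixes the category, which selects the branch of feature questions.
# Tree contents, feature names/values and the delimiter are assumptions.

QUESTION_TREE = {
    "noun": {
        "countability": ("mass", "count"),
        "animacy": ("animate", "inanimate"),
    },
    "verb": {
        "valency": ("intransitive", "transitive", "ditransitive"),
        "aspect": ("state", "activity", "accomplishment", "achievement"),
    },
    "adjective": {
        "gradability": ("gradable", "non-gradable"),
    },
}

DELIMITER = "#"  # hypothetical marker; the text does not name the character


def elicit_entry(word, ask):
    """Walk the question tree for word; ask(question, options) returns a choice."""
    category = ask(f"Category of '{word}'?", tuple(QUESTION_TREE))
    entry = {"word": word, "cat": category}
    # The branch of feature questions depends on the category answer.
    for feature, options in QUESTION_TREE[category].items():
        entry[feature] = ask(f"{feature} of '{word}'?", options)
    entry["japanese"] = None  # left for later completion by a bilingual linguist
    return entry


def render_untranslated(word):
    """Pass an English word into the output, marked off by DELIMITER."""
    return f"{DELIMITER}{word}{DELIMITER}"


def scripted(answers):
    """Return an ask() that replays fixed answers instead of prompting a user."""
    replies = iter(answers)

    def ask(question, options):
        answer = next(replies)
        if answer not in options:
            raise ValueError(f"{answer!r} is not one of {options}")
        return answer

    return ask


# Entries created on-line are held apart from the main bilingual dictionary.
pending_entries = {}
entry = elicit_entry("puck", scripted(["noun", "count", "inanimate"]))
pending_entries[entry["word"]] = entry
print(entry)
print(render_untranslated("puck"))
```

In an interactive session, ask would display the question and read the user's reply; the scripted version only makes the sketch testable. Keeping pending entries in their own store mirrors the separate dictionary file, which analysis can consult while transfer still lacks a Japanese equivalent.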
Given a reasonable core dictionary, most new words will be specialized technical terms, for which this will in fact be the correct rendering.</Paragraph> <Paragraph position="3"> Syntactic ambiguities in the English input are also referred to the user for disambiguation. The parser first builds a surface syntactic dependency structure, or functional structure, which is then mapped to a deep or semantic structure, and a record is kept of the mappings ('obj', for example, is mapped to 'arg0'). During this mapping stage, a record is also kept, for each well-formed s-structure produced, of the set of mappings entailed by the subcategorization requirements of the lexical items involved. Each mapping records the derived semantic relation which is assigned between a constituent and its parent.</Paragraph> <Paragraph position="4"> Examples of these &quot;maptrace&quot; records are given in the sample session below. The disambiguation module then computes the differences among all the recorded mapping sets, building the set of all those relations which are true for only a subset of the parses. These are then presented to the user, after some of the internal semantic relation names have been converted to external names intended to be more immediately understandable.</Paragraph> <Paragraph position="5"> The generator for the user-form representation of mappings is: describeas(map(X,Y,Z),[X,' is ',C,' of ',Z]) :- logtocase(Y,C),!.
logtocase(arg0,object).</Paragraph> <Paragraph position="6"> logtocase(arg1,agent).</Paragraph> <Paragraph position="7"> logtocase(ben,beneficiary).</Paragraph> <Paragraph position="8"> logtocase(loc,location).</Paragraph> <Paragraph position="9"> logtocase(rep,representation).</Paragraph> <Paragraph position="10"> logtocase(instr,instrument).</Paragraph> <Paragraph position="11"> logtocase(adjunct,modifier).</Paragraph> <Paragraph position="12"> logtocase(X,X).</Paragraph> <Paragraph position="13"> It should be noted that this mechanism successfully represents both purely structural ambiguities, such as prepositional phrase attachment, and subcategorization ambiguities, as in &quot;write on the deck of the ship&quot;, where &quot;deck&quot; could be either the location or the object of &quot;write&quot;.</Paragraph> <Paragraph position="14"> The alternatives are presented as a set of statements distinctively characterizing the possible semantic interpretations, as can be seen in the examples below. The user responds with the number of any statement which is true, or 'f' followed by the number of any statement which is false.</Paragraph> <Paragraph position="15"> Because of a technical implementation detail, constituents are at present referred to only by their heads: thus, in this example set of queries, &quot;and is object of active&quot; means &quot;(workstations and terminals) is object of active&quot;. Obviously this aspect of the presentation could be improved in a more fully developed system. One could also present the alternatives in quite different ways, for example by paraphrases of alternative readings, or by dependency trees or some other graphical interface, generated by the same underlying mechanism.</Paragraph> <Paragraph position="16"> 4.2.3 Japanese lexical selection Finally, ambiguities, or alternatives, may arise in the selection of a Japanese translation equivalent for an English word or expression.
Interactive systems standardly offer such alternatives directly to the user, who must have some competence in the target language to be able to make the choice. Ntran's Japanese dictionary entries include English glosses, and the user will be offered these to choose between, rather than the Japanese head-words. This facility is not yet fully implemented.</Paragraph> <Paragraph position="17"> 5. The system as translator and the monolingual user Clearly, ensuring reliable translation for a monolingual user in either direction requires a system design carefully tuned to the task. In the case of &quot;import translation&quot;, translating into the user's language, the information content of the output text must be sufficiently rich that, in cases of uncertainty, reference to the source text (the traditional recourse of the post-editor) is adequately replaced by reference to the set of coherent possibilities offered in that output. This is exactly the strategy implemented in Aidtrans.</Paragraph> <Paragraph position="18"> In the case of &quot;export translation&quot;, when the user is a speaker of the source language, the system can request additional information at a number of stages in the translation chain, to supplement the information inherent in the surface form of the input text if that proves insufficient for syntactic analysis, semantic interpretation, and/or target language lexical selection.
(Although the obvious, and ultimate, source of such supplementary information is the human end-user, we envisage the long-term possibility of referring queries first to intelligent, world-knowledge-based modules within the system, leaving the human user as a progressively less often needed safety net.) Ntran's modularity of design isolates the stages of the process clearly from each other, while our commitment to the implementation of linguistic theory offers formats for the presentation of choices by the system and the input of information by the user which are transparent to both.</Paragraph> <Paragraph position="19">
*** CCL Grammar Development System ***
Version 0.65 level 31a
*** type the number of any true statement or fnumber of any false statement
1 on is location of position true for parses [2-1]
2 on is location of correspond true for parses [1-1]
please choose:
The cursor corresponds to the puck position on the tablet
maptrace(1,1,[map(correspond,arg0,pres),map(cursor,arg0,correspond),map(position,arg1,correspond),map(on,loc,correspond)|A])
maptrace(2,1,[map(correspond,arg0,pres),map(cursor,arg0,correspond),map(position,arg1,correspond),map(on,loc,position)|A])
The cursor corresponds to the puck position on the tablet.</Paragraph> <Paragraph position="20">
ka-soru ga taburetto de no pakku iti ni soutou suru
cursor NOM tablet ATTR ADN puck position DAT correspond pres
parsing: 36sec parses: 4 deep: 1 transfer: 49sec xltns: 2
translation 1
ka-soru ga taburetto no ue no pakku iti ni soutou suru
cursor NOM tablet ADN above place ADN puck position DAT correspond pres
translation 2</Paragraph> </Section> </Section> </Paper>