<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1010"> <Title>American Journal of Computational Linguistics Microfiche 20 AN APPROACH TO VERBALIZATION AND TRANSLATION BY MACHINE</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> AN APPROACH TO VERBALIZATION AND TRANSLATION BY MACHINE </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Association for Computational Linguistics </SectionTitle> <Paragraph position="0"> This report describes a model for machine translation developed at Berkeley during 1972-74. The model is built around a set of procedures called verbalization, intended to simulate the processes employed by a speaker or writer in turning stored knowledge into words. Verbalization is seen to consist of subconceptualization and lexicalization processes which involve creative choices on the part of the verbalizer, together with algorithmic syntactic processes determined by the language. Translation is viewed as 1) the reconstruction of the verbalization processes which went into the original source language text and 2) the application of parallel verbalization processes in the target language. The target language verbalization looks for creative choices in the source language verbalization and tries to apply corresponding choices, at the same time that it applies syntactic processes dictated by the grammar of the target language. Verbalization and translation processes are illustrated, with examples taken from English and Japanese. A few of these processes have been implemented in an interactive program at the facilities of the Lawrence Berkeley Laboratory, but the intent of the
report is to demonstrate the kinds of processes that need to be incorporated in a system.</Paragraph> </Section> </Section> <Section position="2" start_page="0" end_page="11" type="metho"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> IV. Lexicalization of a JJ V. Lexicalization of a PI VI. The Lexicon VII. Discourse Information and Readjustments VIII. Translation IX. Miscellaneous Problems in Translation This report deals with work performed by the Contrastive Semantics Project in the Department of Linguistics of the University of California at Berkeley. The project was supported by Air Force Contract No. F30602-72-C-0/cc)6. Associated with the project during its entire life, in addition to the author, were Patricia M. Clancy, Leonard M. Faltz, Christopher Murcano, and Hasmig Seropian. Also active during more than half of this period were Masayoshi Shibatani and Linda oobek. Associated during shorter periods of time were Teresa M. Chen, Charles J. Fillmore, Robert E. Gaskins, and Marie-Claude Jorlatid. Masayoshi Hirose served as a consultant on Japanese during the final two months.</Paragraph> <Paragraph position="1"> This same report, in slightly different form, was published by Rome Air Development Center, Griffiss Air Force Base, New York, as RADC-TR-74-271 (October 1974).</Paragraph> <Paragraph position="2"> Central to the view of translation that will be presented here is the notion of verbalization. Verbalization is the application of processes by which some holistic conceptual chunk, recalled from memory, is converted into sentences and words--into a phonetically or graphically communicable linguistic representation. Such a notion assumes that the underlying content of what is being communicated is not, or need not be, in verbal form to begin with. At the very least it may be a complex system of discrete elements and relations, representable perhaps as a network of nodes and arcs. 
It may also involve an important nondiscrete or analog component, representable only in some other terms. (For excellent summaries of both sides of this particular issue see Pylyshyn 1973 and Paivio 1977.) Whatever may turn out to be the case here, it seems clear that some sorts of processes must be applied in order to transform the original form of storage into a verbal output: that the stored material must be verbalized.</Paragraph> <Paragraph position="3"> In any particular instance of translation there are two instances of verbalization. One is the original verbalization performed by the creator of the source language text. The other is the verbalization produced in the target language by the translator. Besides being in different languages, these two verbalizations are fundamentally different in one other respect. The source language verbalization is, we might say, autonomous. It is freely produced by the speaker or writer in any way he decides is appropriate to the content and the occasion, provided he adheres to the rules of his culture and the language he is using. The target language verbalization, on the other hand, is parasitic on the source language one. Not only must the translator adhere to the rules of his own language, he must also produce a verbalization that communicates, so far as possible, the same underlying content or knowledge that was communicated by the source language verbalization. The verbalization in the target language is thus subject to this special kind of constraint. Its producer is not free to &quot;say what he wants,&quot; but must insofar as possible say the same thing as the producer of the source language text. We suggested in an earlier report that there are two dimensions of high quality translation, which we termed naturalness and fidelity. Naturalness is achieved when the target language verbalization adheres to all the constraints of that language; the output will then sound &quot;natural&quot;. 
Fidelity is achieved to the extent that the target language verbalization communicates the same content as the source language one.</Paragraph> <Paragraph position="4"> Verbalization in general, as we see it, consists of a mixture of two kinds of processes: those which necessitate creative decisions on the part of the verbalizer and those which do not, being governed by the constraints imposed by the language. We might speak of creative processes and algorithmic processes. Creative processes are ultimately governed by the content which underlies the verbalization; the verbalizer has to decide how best to verbalize that content. Normally a range of choices will be open to him, and he must decide what will most effectively convey what he has in mind. After he has made such choices, there are often automatic consequences which follow from them because of the particular rules of the language (but which are themselves likely to lead to the necessity of further creative choices). We can say, then, with respect to the two verbalizations involved in a translation, that the producer of the source language verbalization has applied both creative and algorithmic processes, whereas in the target language verbalization only algorithmic processes are autonomously applied, the necessary creative choices being determined by the choices that were made in the source language verbalization. Thus the naturalness of the final translation depends largely on adherence to the algorithmic processes of the target language, while its fidelity depends on the extent to which the translation has been able to incorporate creative choices that correspond to those originally applied in the source language. In all probability there are cases where exact correspondence in these choices is not possible, and where a certain amount of autonomous creativity has to be introduced into the target verbalization as well. These are the cases where automatic translation becomes most problematic. 
One useful goal of machine translation research can be to determine precisely the nature and extent of such cases.</Paragraph> <Paragraph position="5"> We are led, then, to the general picture of translation which is shown in Figure 1.</Paragraph> <Paragraph position="6"> The two vertical columns represent the two verbalizations which are involved: on the left the source language verbalization and on the right the target verbalization. The input to a translation procedure, of course, is an already produced verbal output or text in the source language. The first major component of the translation procedure will have to be the reconstruction from that text of the verbalization processes by which it was produced, a kind of &quot;deverbalization&quot;. We refer to this as the parsing component, although it is clearly different from conventional parsing. It aims to reconstruct, not a single deep structure underlying the surface text, but rather a set of processes by which that text was created from the knowledge--not only nonverbal but possibly even nondiscrete--which the speaker or writer had in mind. The output of the parsing component is ideally a complete reconstruction of both the creative and the algorithmic processes which the source language verbalizer applied.</Paragraph> <Paragraph position="7"> The other major component of the translation procedure is the translation component. It is equivalent to a verbalization in the target language. The processes which make up this verbalization are, to the extent that they are algorithmic, those which express target language constraints and, to the extent that they are creative, those which correspond to choices already made in the reconstructed source language verbalization. 
The necessity of reference to the source language verbalization for creative choices at many points is suggested in Figure 1 by the zigzag arrows. We believe that this picture provides a plausible basis for translation research, but needless to say it presents many problems whose solutions are only dimly foreseen at the present time. Our project concentrated more of its attention on verbalization itself than on parsing or translation, since both of the latter depend on a prior understanding of verbalization. Any other ordering of priorities would be putting the cart before the horse. Any detailed investigation of the parsing component would be futile if we did not know what sort of output we would expect that component to produce: the processes that went into a particular verbalization. [Figure 1] The translation component is a verbalization, though one of a special sort, and there again a detailed understanding of verbalization processes is necessary. This report, then, will be most concerned with the nature of verbalization. We will also devote considerable space to the nature of that special sort of verbalization which is translation. We will have the least to say about parsing. Examples will be cited from English and Japanese.</Paragraph> <Paragraph position="8"> For about the last nine months of the project we were concerned with the development of an interactive computer program that would implement the verbalization processes we hypothesized. Although the program remained primitive, the intention was that it would gradually achieve increased sophistication in its ability to simulate verbalization, translation, and parsing. As it presently simulates the processes of verbalization, it begins with an item that represents the initial holistic idea which the speaker or writer of a text wishes to communicate. 
It then asks the user, seated at a teletype, to make the series of creative choices that are necessary in the production of the final text. At the same time it attempts to apply on its own the algorithmic processes which are called for. It knows when creative choices are necessary, but must ask the user what choices to make. Ideally it should be able to apply the algorithmic processes without help. As it simulates translation it should likewise be able to apply the algorithmic processes of the target language automatically, and also to apply certain creative processes on its own by looking at the source language verbalization to see what creative choices were made there. Whenever it is not able to make a creative choice, the program asks the user to do so. We find that this kind of machine-user interaction provides a valuable research technique.</Paragraph> <Paragraph position="9"> Taking as our ultimate goal the eventual elimination of the user from the translation program altogether, we start with a situation in which the user intervenes at many points. As we learn more we can gradually give the machine more to do and the user less. This technique can be followed not only in verbalization, but also in parsing. Whether the user will eventually disappear from the picture altogether is uncertain.</Paragraph> <Paragraph position="10"> However that may be, the goal of a program in which the contribution of the user is significantly diminished in relation to that of the machine seems workable. Short of the final goal of eliminating the user altogether, an intermediate goal identifiable as &quot;human-aided&quot; machine translation can more easily be foreseen.</Paragraph> <Paragraph position="11"> Here the machine will do the many things for which it is suited, but a human brain will be introduced at those points where the machine has reached its limits. 
This intermediate goal has, we believe, significant practical as well as theoretical value.</Paragraph> <Paragraph position="12"> Funding for this project ceased in June 1974. The report must be read, therefore, as a summary of work that was interrupted in mid-course, and as a partial blueprint for further work should the necessary funding ever materialize. At this point, six months after the termination of the project, the need for various modifications is already evident. It seems best, however, to document consistently how things stood at the time of interruption, without trying to introduce new and untested material.</Paragraph> <Paragraph position="13"> II. Subconceptualization We assume that a speaker or writer begins with a single, unitary, holistic conceptual chunk that he has recalled from memory and has decided, for some reason, to communicate. Thus he may have in mind some incident in which he was involved, something of interest he was previously told about or read about, some experiment he wishes to report on, or whatever. We label such a chunk, as well as the smaller chunks into which it will be analyzed, with the prefix CC (for &quot;conceptual chunk&quot;) followed by a four-digit number. The first digit indicates the language in which verbalization is to take place (&quot;1&quot; for English and &quot;2&quot; for Japanese), and the remaining three digits constitute an arbitrary index for the particular chunk. Thus CC-1001 might be the name given to some particular chunk of this sort that is about to be verbalized in English.</Paragraph> <Paragraph position="14"> We assume, furthermore, that while this chunk is from one point of view a unit, from another point of view it has a more or less rich content, and that it is this content which the speaker wishes to convey to his audience. Sometimes, though not in most cases, the initial chunk itself may have a linguistic label. 
If it is a folktale, for example, it may have a name like &quot;Cinderella&quot; or &quot;The Three Bears&quot;. But someone who has decided to tell a story is not likely to say just &quot;Cinderella&quot; and let it go at that. (One is reminded of the old story about a convention of comedians at which people said things like &quot;49&quot; or &quot;178&quot; and elicited laughter each time because everyone knew the jokes these numbers stood for.) Normally it is necessary instead for the speaker to get inside the content of this initial unit--to analyze it into smaller chunks. This kind of process can be pictured as shown in Figure 2, where the initial chunk CC-1001 has been, as we say, subconceptualized into chunks CC-1002 and CC-1003. In a text of any size each of these smaller chunks will be further broken down into still smaller ones, and so on, so that a hierarchical structure of successively smaller subconceptualizations emerges.</Paragraph> <Paragraph position="15"> Subconceptualization belongs to the class of verbalization processes which are creative. Normally a chunk does not automatically determine a particular subconceptual breakdown, but the speaker must creatively choose how to subconceptualize each one.</Paragraph> <Paragraph position="16"> It is useful to think of the content of each chunk--each circle in Figure 2--as if it were a mountainous landscape, with the most salient aspects standing out in bold relief and the less salient appearing as only minor hills. All other things being equal, the more salient some aspect of the total content is, the more likely the speaker is to express it when he subconceptualizes. 
He is not likely to make exactly the same subconceptual breakdown each time he communicates the same initial chunk, partly because he may judge different things to be salient in different contexts and partly because the landscape itself may change over time, the relative salience of its different aspects being modified in long-term memory. We assume that any particular subconceptualization necessarily leaves out part of the content of what is being subconceptualized, as suggested by the area that lies within the larger circle but outside the two smaller circles in Figure 2.</Paragraph> <Paragraph position="17"> Subconceptualization, that is, is necessarily a selective process. No one ever says everything he could say about what he has in mind.</Paragraph> <Paragraph position="18"> Subconceptualization of a particular chunk, say CC-1001, produces two or more new chunks, say CC-1002 and CC-1003. These new chunks, furthermore, are conceived of as related to each other in some way. For example, CC-1002 might be the &quot;reason&quot; for CC-1003. Suppose the entire text consisted of the sentences, &quot;I bought a bike yesterday.</Paragraph> <Paragraph position="19"> I decided I need more exercise.&quot; Let us say that the first sentence is a verbalization of CC-1003 and the second sentence of CC-1002. We can say that CC-1002 is the reason for CC-1003. We write a subconceptualization process of this kind in the following way: 1) CC-1001 S> CJ-REASON (CC-1002, CC-1003) This statement says that the initial chunk, CC-1001, is subconceptualized (S>) into the chunks CC-1002 and CC-1003, and that these two new chunks are related by the predicate labeled CJ-REASON. The prefix CJ stands for &quot;conjunction&quot; (derived from the grammatical, not the logical use of this term). 
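As a hedged illustration only, the rewrite rule in 1) can be sketched in Python. This is not the project's program: the names Relation, subconceptualize, and render are hypothetical, and the nested layout with arguments indented under their predicate is an assumed way of displaying a stage.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A CJ predicate with its CC arguments, e.g. CJ-REASON (CC-1002, CC-1003)."""
    name: str
    args: list = field(default_factory=list)


def subconceptualize(rep, chunk, relation):
    """Apply a rule `chunk S> relation` to a representation.

    `rep` is either a chunk label (a string) or a Relation; every
    occurrence of `chunk` inside it is replaced by `relation`.
    """
    if rep == chunk:
        return relation
    if isinstance(rep, Relation):
        new_args = [subconceptualize(arg, chunk, relation) for arg in rep.args]
        return Relation(rep.name, new_args)
    return rep


def render(rep, depth=0):
    """Return display lines: a predicate, then its arguments indented below it."""
    pad = "    " * depth
    if isinstance(rep, Relation):
        lines = [pad + rep.name]
        for arg in rep.args:
            lines.extend(render(arg, depth + 1))
        return lines
    return [pad + rep]


# Initial stage: only the holistic chunk is present.
initial_stage = "CC-1001"
# Stage after applying rule 1): CC-1001 S> CJ-REASON (CC-1002, CC-1003)
next_stage = subconceptualize(
    initial_stage, "CC-1001", Relation("CJ-REASON", ["CC-1002", "CC-1003"])
)
print("\n".join(render(next_stage)))
```

Because the function rewrites a chunk wherever it occurs, the same call can be repeated on CC-1002 or CC-1003 to build a deeper hierarchy.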
Any relation between CCs is labeled with this prefix.</Paragraph> <Paragraph position="20"> We use a different notation to represent each of the various stages in the verbalization process. At the outset, in this example, the initial chunk CC-1001 was all that was present. This initial representation, before any verbalization processes had been applied, was simply: 2) CC-1001 After the subconceptualization specified in 1) was applied, the representation became 3). Subconceptualization processes are thus rewrite rules, which replace one stage in a verbalization with a subsequent stage. The format we use to represent such stages, as in 3), shows predicates with their arguments written indented below them. In simulating verbalization our program presently asks the user to specify all the creative choices, restricting its own contribution to the application of algorithmic processes determined by the grammar of discourse, sentences, and words in the language involved. The program is labeled VAT (for &quot;verbalizer and translator&quot;), and we can illustrate conversations between VAT and the user, identifying them as V and U respectively. The program begins by asking:</Paragraph> <Paragraph position="22"> Skipping several steps to illustrate only the crudest outlines of subconceptualization, we are interested just now in the question:</Paragraph> <Paragraph position="24"> At this point VAT will construct the representation shown in 3). In giving an answer like that in 7) the user of the program is assumed to be making explicit a decision which a normal speaker would make unconsciously on the basis of a variety of complex criteria. 
We do not pretend to understand how such a decision is reached; we can at least introduce the decision itself into the verbalization model at this stage.</Paragraph> <Paragraph position="25"> VAT will now apply an algorithmic or, as we say, &quot;syntactic&quot; process triggered by the presence of CJ-REASON in 3).</Paragraph> <Paragraph position="26"> The process applied is of a type that is also not clearly understood, but we may view what we do at present as a first approximation. At the moment VAT simply takes the two CCs related by CJ-REASON and orders them so that the second will be expressed before the first. That is, for example, if CC-1002 is eventually going to be verbalized as &quot;I decided I need more exercise&quot; and CC-1003 as &quot;I bought a bike yesterday&quot;, we want the two sentences to be expressed with CC-1003 preceding CC-1002. Thus VAT will automatically change the representation in 3) to the following: This kind of representation, in which no predicate is shown above the two CCs, indicates that they (or their eventual verbalizations) are to occur in the final text in the order shown, with CC-1003 preceding CC-1002.</Paragraph> <Paragraph position="27"> In Japanese the corresponding syntactic process will typically lead to the attachment of CJ-&quot;KARA&quot; at the end of the second sentence. Thus if a representation like that in 3) were produced in a Japanese verbalization VAT would automatically change it to: The quotation marks around KARA indicate that this is an item which will actually appear as a word in the text. Quotation marks are used for items that have a surface lexical representation. The representation in 9) is deficient in that it fails to show that CJ-&quot;KARA&quot; will be part of the same sentence as CC-1002, whereas CC-1003 will (or is likely to) form a different sentence.</Paragraph> <Paragraph position="28"> We indicate sentence boundaries with the notation CJ-&quot;.&quot;, since the period will appear in the final text. Thus fuller versions of 8) and 9) are respectively: The creation of these periods is a housekeeping task that need not be described in detail here.</Paragraph> <Paragraph position="29"> Given a representation like that in 10), VAT will go on to ask about the subconceptualization of the first CC in the ordering. The general principle followed here is one of &quot;depth first&quot;, in the sense that earlier items in the text are completely verbalized before the verbalization of later items is begun. This procedure probably has some psychological validity; that is, a speaker is likely to think of later parts of what he is going to say only in terms of the most general chunks, while he is elaborating the earlier parts in detail. Only after he has finished the verbalization of these earlier parts will he turn his attention to a full verbalization of the later ones.</Paragraph> <Paragraph position="30"> Thus, omitting various considerations not yet discussed, subconceptualization proceeds interactively in the following fashion: etc.</Paragraph> <Paragraph position="31"> In this fashion a subconceptual hierarchy of any degree of complexity can be constructed and expressed.</Paragraph> <Paragraph position="32"> The organization of a text may not be entirely hierarchical,</Paragraph> <Paragraph position="33"> however. Not only does a speaker break down larger chunks into smaller chunks--larger &quot;concepts&quot; into subconcepts; one chunk may also remind him of another, so that the organization which results may be in part concatenative. We have been viewing concatenation in terms of excursions away from the main hierarchy, and have been calling such excursions digressions. 
In some discourse, however, there is no necessary constraint that the main hierarchy be returned to, and the result may be a rambling text in which digression is added to digression.</Paragraph> <Paragraph position="34"> In a more tightly organized text digressions are more likely to appear as parenthetical remarks: brief sidepaths which quickly return to the main hierarchy. We use the term parenthesis for this brief and transient kind of digression.</Paragraph> <Paragraph position="35"> If subconceptualization be represented in terms of a tree diagram (which does not, however, provide a convenient means of showing the relations between subconcepts, like CJ-REASON), then digressions can be pictured as subtrees attached to the main tree at one point or another, as suggested in Figure 3.</Paragraph> <Paragraph position="36"> One other important modification of the strictly hierarchical model of subconceptualization results from the common occurrence of summarization. It is frequently the case in verbalization that an initial chunk will be subject to two separate hierarchies of subconceptualization, one of which can be identified as a summary of the other. It is characteristic of a summary that its subconceptualization processes never proceed beyond some relatively large chunks--chunks which package a relatively large content. We can contrast a subconceptualization hierarchy which is a summary with a hierarchy which constitutes the body of the text and consists of subconceptualization processes that produce a larger number of chunks of smaller size.</Paragraph> <Paragraph position="37"> A summary is typically expressed at the beginning or end of a text; that is, preceding or following the body. Various conventions for summaries are associated with different genres of writing. 
For example, a scientific article may begin with the self-conscious kind of summary that is called an abstract; a news report typically contains an opening paragraph telling who, what, where, and when; a fable is likely to end with a moral, and so on. Our program at present simply asks, for the initial CC, whether it has an initial summary (one expressed at the beginning of the text). If the answer is yes it asks first for subconceptualization of the summary, and moves on to ask about the body of the text only after the summary has been completely verbalized. At the end of the text it asks whether there is a final summary.</Paragraph> <Paragraph position="38"> Creativity within a discourse is likely to be limited by the genre to which the discourse belongs. It would appear that there is a continuum ranging from maximally stereotyped to maximally creative discourse. Most stereotyped are those forms of discourse, such as rituals, in which the speaker has very little choice as to what he is going to say or how he is going to say it. With such discourse the &quot;grammar&quot; of the genre provides many of the answers to the questions VAT would otherwise have to ask the user. In other words, VAT should be able to produce ritual texts with minimum recourse to creative decisions. At the other extreme are forms of discourse such as descriptions of unique personal experiences which have never been described before, where the speaker is relatively free to make a great variety of creative decisions.</Paragraph> <Paragraph position="39"> We believe it would be of considerable interest to incorporate into the verbalization process the constraints imposed by several different genres, but we have not as yet done this. As it now stands our program does ask WHAT IS THE GENRE? as soon as it has established that a verbalization is to be performed. 
Possible answers that we would like to implement in the future are, for example, S 0 'rSY 2:IOLOGY ARTICLE, FABLE, and the like.</Paragraph> <Paragraph position="40"> An example of these procedures as applied to a real text can be based on the following United Press report taken, slightly condensed, from the San Francisco Chronicle of May 16, 1974: 13) 1. An 11-year-old boy using a new &quot;super-glue&quot; 2. accidentally glued his eye shut 3. while building a model airplane, 4. and a doctor had to reopen the eye surgically.</Paragraph> <Paragraph position="41"> 5. Mike Harris said 6. he rubbed his left eye 7. after several drops of the glue squirted into it last Sunday 8. and found his eyelid would not move.</Paragraph> <Paragraph position="42"> 9. An eye surgeon debated briefly about 10. using a super glue solvent 11. but decided against it 12. for fear it might damage the boy's eye.</Paragraph> <Paragraph position="43"> 13. The surgeon, who asked not to be identified, 14. finally put Mike in the operating room, 15. trimmed Mike's eyelashes, 16. then opened the eyelid surgically.</Paragraph> <Paragraph position="44"> 17. Mike was released from the hospital Tuesday.</Paragraph> <Paragraph position="45"> It is approximately the case that each of the numbered lines in this text expresses a terminal subconcept (see below). We assume that the text contains a number of intermediate subconcepts as well, which need to be elucidated in a subconceptual hierarchy.</Paragraph> <Paragraph position="46"> Let us suppose that the combination of VAT and the user are attempting to simulate the verbalization processes that went into the production of this text. For the moment we are concerned only with subconceptualization processes (and associated syntactic algorithms). Many of the user's answers in the following conversation with VAT are intuitively based. 
The success of our eventual parsing component will depend on the extent to which these intuitive answers can be predicted from the text together with whatever items of background knowledge are relevant. The example will be carried only far enough to suggest the nature of the procedure.</Paragraph> <Paragraph position="47"> The exchange begins in the usual way: VAT creates the following representation, including a text-final period: VAT's next question seeks to establish what genre constraints apply in this text:</Paragraph> <Paragraph position="49"> VAT will now assume that the text is a typical news report which begins with a summary. Its first questions will deal with the subconceptualization of the summary (expressed in the text in</Paragraph> <Paragraph position="51"> The user has answered that the first breakdown of the summary is into two subconcepts: CC-1002 (to be expressed as &quot;An 11-year-old boy using a new &quot;super-glue&quot; accidentally glued his eye shut while building a model airplane&quot;) and CC-1003 (to be expressed as &quot;a doctor had to reopen the eye surgically&quot;). Furthermore the relation between these two CCs has been identified as one labeled YIELD, in which the first CC &quot;leads to&quot; or &quot;results in&quot; the second. YIELD differs from another, similar relation which is labeled CAUSE in that the event conceptualized by the second CC is not a necessary consequence of the first. It is, however, something that presumably</Paragraph> <Paragraph position="53"> would not have happened if the event conceptualized by the first CC had not taken place. (Evidently YIELD can be equated with INITIATE as this term is used by Rumelhart 1974, the relationship between an external event and the willful reaction of an anthropomorphized being to that event. Schank 1974 uses INITIATE differently.) 
As a result of the user's answer in 17) VAT first creates the representation:</Paragraph> </Section> <Section position="3" start_page="11" end_page="11" type="metho"> <SectionTitle> CJ-YIELD </SectionTitle> <Paragraph position="0"> and immediately applies syntactic processes which change it to: That is, the two CCs are to be expressed with the &quot;yielder&quot; preceding the &quot;yielded&quot;, and they are to be connected with a comma followed by the word &quot;AND&quot;.</Paragraph> <Paragraph position="1"> This is not the only way in which YIELD can be realized, but for the sake of the example we may regard it as such. VAT will now proceed to ask about the subconceptualization of the earliest CC in 19): 20) V: HOW IS CC-1002 SUBCONCEPTUALIZED IN THE SUMMARY? The user has answered that CC-1002 is broken down into two CCs, CC-1004 (&quot;building a model airplane&quot;) and CC-1005 (&quot;An 11-year-old boy using a new &quot;super-glue&quot; accidentally glued his eye shut&quot;). They are related by FRAME, a temporal relation in which the first CC occupies a time period larger than and including the time period of the second. In other words the time period of CC-1004 includes that of CC-1005. VAT creates, sequentially, the following two representations: Although there may be several possibilities for the expression of FRAME, VAT has assumed here that two factors are involved: an ordering of the two CCs so that the &quot;frame&quot; precedes the &quot;framed&quot;, and a prefixing of the word &quot;WHILE&quot; to the first CC. (In this example the ordering of these two CCs will be reversed in a subsequent operation.) If FRAME may be expressed in other ways, we assume (gratuitously, for the moment) that subtle conceptual differences are involved; that there is not, in other words, free variation among possible syntactic algorithms. 
This remains for now an article of faith.</Paragraph> <Paragraph position="2"> We would expect VAT to ask next about the subconceptualization of CC-1004, but by a means not yet discussed VAT will discover that it is a terminal CC (one not further subconceptualized). If</Paragraph> </Section> <Section position="4" start_page="11" end_page="11" type="metho"> <SectionTitle> I1 It </SectionTitle> <Paragraph position="0"> CC-1004 were followed by &quot;.&quot; or by &quot;, AND&quot;, VAT would proceed to ask questions directed at the complete verbalization of this CC. CC-1006 (&quot;using a new &quot;super-glue&quot;&quot;) occupies a time period which includes CC-1007 (&quot;accidentally glued his eye shut&quot;). So far we would expect this second instance of FRAME to be expressed by prefixing the word &quot;WHILE&quot; to CC-1006, as was done in 22). Let us suppose, however, that FRAME actually triggers a more complex algorithm which says in effect that one &quot;WHILE&quot; in a sentence is enough, and that a second instance of FRAME will lead to a different expression. Here the second instance leads to the creation of a relative clause which will modify one of the constituents of CC-1007. Furthermore, the already created &quot;WHILE&quot; clause will be moved to a position after CC-1007. (This ordering of the CCs does appear to be maximally natural. It would be slightly less desirable, for example, to produce &quot;While he was building a model airplane an 11-year-old boy, using a new &quot;super-glue&quot;, accidentally glued his eye shut.&quot;</Paragraph> <Paragraph position="1"> Certainly, the differences in this area are very subtle.) We will indicate the relative clause status of CC-1006, to be embedded within the expression, with slash notation: The representation in 25) will be discovered to be the final one in the subconceptualization of the summary, which has been found to consist of four CCs (ultimately four clauses) joined together in the manner indicated.
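The ordering decisions just described for YIELD and FRAME can be sketched as a small recursive procedure. This is an illustration under stated assumptions, not VAT's code: the tuple encoding of relations, the function name realize, and the use of a bare slash token to mark the embedded relative clause are all hypothetical, and each conjunct of a YIELD is treated as a fresh clause for simplicity.

```python
def realize(node, while_used=False):
    """Linearize a relation tree into a flat list of CC identifiers and words."""
    if isinstance(node, str):              # a terminal CC such as 'CC-1004'
        return [node]
    label, first, second = node
    if label == 'YIELD':
        # The yielder precedes the yielded, joined by a comma and AND.
        return realize(first) + [',', 'AND'] + realize(second)
    if label == 'FRAME':
        if while_used:
            # A second FRAME: the framer becomes an embedded relative clause.
            return realize(second, True) + ['/'] + realize(first, True)
        clause = ['WHILE'] + realize(first, True)
        framed = realize(second, True)
        nested = not isinstance(second, str) and second[0] == 'FRAME'
        # With a nested FRAME the WHILE clause moves after the framed material;
        # otherwise the framer, prefixed with WHILE, comes first.
        return framed + clause if nested else clause + framed
    raise ValueError('unknown relation: ' + label)


summary = ('YIELD',
           ('FRAME', 'CC-1004', ('FRAME', 'CC-1006', 'CC-1007')),
           'CC-1003')
tokens = realize(summary)
```

With a single FRAME the sketch yields WHILE followed by the framer and then the framed CC; with the nested FRAME of the summary it yields the four CCs in the order CC-1007, CC-1006, CC-1004, CC-1003.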
VAT will now proceed to verbalize the summary completely, making use of other kinds of processes. When that has been done, it will say: SUBCONCEPTUALIZED? U: YIELD (CC-1002, CC-1003) This is, of course, the same answer that was given to the corresponding question in 17) above. As CC-1002 and CC-1003 are further elaborated, however, many differences will emerge. Ultimately CC-1002, which was expressed in sentences 1-3 of the summary, will be expressed in the body of the text in sentences 5-8. CC-1003, expressed in the summary as sentence 4, will be expressed in the body in sentences 9-14.</Paragraph> <Paragraph position="2"> We will not repeat here the operations involved in the subconceptualization of the body of the text. They are for the most part similar to those illustrated above.</Paragraph> <Paragraph position="3"> Various other relations between CCs are introduced: for example, that between CC-1015 (&quot;An eye surgeon debated briefly about using a super glue solvent but decided against it for fear it might damage the boy's eye.&quot;) and CC-1016 (&quot;The surgeon, who asked not to be identified, finally put Mike in the operating room, trimmed Mike's eyelashes, then opened the eyelid surgically.&quot;) The first of these involves an alternative that is rejected in favor of the alternative conceptualized in the second; thus, the relation may be labeled REJECTED-IN-FAVOR-OF. Within CC-1015 there is a relation of denial of expectation between CC-1017 (&quot;An eye surgeon debated briefly about using a super glue solvent&quot;) and CC-1018 (&quot;decided against it for fear it might damage the boy's eye&quot;). It will be of considerable interest to isolate relations of this sort in a variety of texts, and
to determine the ways in which they may be expressed under varying circumstances in different languages.</Paragraph> <Paragraph position="4"> The text does contain one example of a parenthesis, expressed in the nonrestrictive relative clause in line 13 (&quot;The surgeon, who asked not to be identified, &quot;). The fact that the surgeon asked not to be identified is a minor digression from the mainstream of the account.</Paragraph> <Paragraph position="5"> It is attached to the node representing the surgeon, which will become a constituent of CC-1022 (&quot;finally put Mike in the operating room, trimmed Mike's eyelashes, then opened the eyelid</Paragraph> <Paragraph position="7"> surgically&quot;). IV. Lexicalization of a CC We use the term lexicalization to refer to another major component of verbalization: specifically, to a cluster of processes that are involved in the choice of a particular linguistic expression</Paragraph> </Section> <Section position="5" start_page="11" end_page="17" type="metho"> <SectionTitle> 3 * </SectionTitle> <Paragraph position="0"> for a CC. Subconceptualization breaks down an initial chunk into smaller chunks. These smaller chunks, however, remain conceptual in nature, and other operations are necessary to convert them into surface linguistic representations. Roughly speaking, lexicalization involves the choice of &quot;words&quot; that will appropriately communicate the content of CCs.</Paragraph> <Paragraph position="1"> Lexicalization of a CC takes place at the point where the speaker decides that he has subconceptualized enough. The aim of subconceptualization is to produce chunks of a size appropriate to linguistic expression, and particularly to linguistic expression that will convey neither too little nor too much information to the addressee.
Too little information is, for example, provided by a summary, where subconceptualization has proceeded only to a point where lexicalization will give the addressee a &quot;general idea&quot; of the content of the whole. At the other end of the scale, we are all familiar with expositions in which too much information is conveyed, where we are told more than we want to know. One aspect of a speaker's creativity, then, is to decide exactly where in the process of subconceptualization he should stop, taking into account the needs and interests of the addressee. It is at this point that he turns to lexicalization.</Paragraph> <Paragraph position="2"> The speaker may also be influenced in such decisions by the resources his language makes available for packaging chunks of different sizes. Consider, for example, the amount of content that is packaged in an English sentence like &quot;He hit into a double play.&quot; If our language did not provide this particular expression, we would have to subconceptualize this chunk considerably further and come up with chunks that would have to be expressed in some such way as &quot;He hit the ball to the shortstop, who threw it to the second baseman before the runner previously on first base could reach second. The second baseman then threw the ball to the first baseman before the batter could reach first. Thus his hit caused two outs to be made.&quot; Presumably a language makes available packaging at various levels of subconceptualization according to predominant communicative needs within the culture of its speakers.</Paragraph> <Paragraph position="3"> How are conceptual chunks communicated? One way to approach this question is by looking at the spatial and temporal properties of such chunks.
A chunk is typically either an event (&quot;He rubbed his left eye&quot;) or a situation (&quot;The glue was next to the lamp&quot;). Both events and situations have a particular locus in space and time (the difference being that an event involves some spatial change through time, whereas a situation does not). Such chunks,</Paragraph> <Paragraph position="4"> then, can be regarded as assignable to particular coordinates in both a spatial and a temporal continuum. (We omit consideration here of generic chunks, expressed in sentences like &quot;Dogs chase cats&quot; or &quot;The house had two chimneys&quot;, where particularity is absent. Genericness calls for extended discussion that would take us too far afield at this point.) If we assume that most of the chunks a speaker wants to find linguistic expression for are events or situations, and thus have both spatial and temporal particularity, it is not surprising that language fails to provide direct labels for them.</Paragraph> <Paragraph position="5"> We cannot, in the course of subconceptualization, arrive at something like CC-1011, then remember that the name for this chunk is &quot;BLURGH&quot;, and communicate it by uttering that word. Particular events and situations are too numerous, and our experience of them too idiosyncratic, for each to have its own name. The way this problem is solved is through the interpretation of many different CCs as instances of the same category.</Paragraph> <Paragraph position="6"> Thus the time last December when I gave my mother a Christmas present, the time when the mailman gave me a registered letter this morning, the time yesterday when the teacher gave my son a note to take home, etc., are all categorizable as instances of &quot;giving&quot;.
We label the category itself UC-&quot;GIVE&quot; (UC standing for &quot;universal category&quot;) and specify the choice of this category by the speaker with the notation:</Paragraph> <Paragraph position="8"> instance of the category UC-&quot;GIVE&quot;. It should be noted that the English word &quot;GIVE&quot; is not the name of this category; rather, any particular CC which is so categorized can be communicated with the word &quot;GIVE&quot;. In other words, the decision described in 27) allows us to use &quot;GIVE&quot; as a name for CC-1053.</Paragraph> <Paragraph position="9"> The way in which a speaker decides that a particular CC can be categorized as an instance of some UC is of course a fundamental psychological question. One thing that seems clear is that some CCs are more easily categorized than others; ease of categorizability has been called &quot;codability&quot; (Brown and Lenneberg 1954). In a closer approximation to human mental processes, therefore, a statement like 27) ought to be qualified as valid to a certain degree, and not as an all-or-nothing decision.</Paragraph> <Paragraph position="10"> If the degree to which a particular CC is an instance of some UC is very high, that is, if the CC is highly codable, then the use of the word provided by the UC will succeed quite well in conveying the content which the speaker has in mind. If, on the other hand, the content of the CC is not very well captured by assigning it to the UC, then the speaker is likely to want to add one or more modifiers to mold the content more closely to the content of the CC he has in mind. Adverbs are</Paragraph> </Section> <Section position="6" start_page="17" end_page="19" type="metho"> <SectionTitle> 0 n </SectionTitle> <Paragraph position="0"> an obvious device by which such molding is accomplished.
Thus, the speaker might decide that the content of CC-1053 is better captured in an intersection of UC-&quot;GIVE&quot; and UC-&quot;GRUDGING&quot;: 28) CC-1053 C> UC-&quot;GIVE&quot; &amp; UC-&quot;GRUDGING&quot; in which case the eventual lexicalization will be &quot;give grudgingly&quot;, and not simply &quot;give&quot;.</Paragraph> <Paragraph position="1"> Suppose CC-1053 is a conceptual chunk that will eventually be verbalized with the sentence: Mrs. Brown gave Tommy a cookie.</Paragraph> <Paragraph position="2"> We have said that the word &quot;GIVE&quot; is available as a label for this CC. Up to a point that is correct; there was a giving which took place. But sentence 28) contains more than the word &quot;GIVE&quot;. What kind of conceptual information is conveyed by &quot;MRS. BROWN&quot;, &quot;TOMMY&quot;, and &quot;A COOKIE&quot;? Each of these items evidently communicates a concept that is different in nature from a CC. This other kind of concept we label a PI (for &quot;particular individual&quot;). The chief difference between a PI and a CC seems to have to do with temporal particularity. A CC is conceived of as occupying a specific and usually fairly limited period of time. The time period occupied by, say, Mrs. Brown is much less specific, and is not likely to be something we are very interested in when we utter a sentence like 28). In other words, although a PI may have temporal particularity in the sense of a lifespan or total time of existence, such a time period tends to be of a different order of magnitude from that occupied by a CC, and more often than not is of little relevance when the PI is communicated. Furthermore, any one PI may participate in an indeterminate number of different CCs. (Mrs. Brown has done many other things besides that which was reported in 28).) Why do PIs play a necessary role in the communication of a CC? The answer may have something to do with the necessity for providing anchor points in the addressee's mind.
Because of its lack of temporal particularity, the concept of a PI is a relatively stable concept, and one which is liable to enter consciousness again and again with respect to a wide variety of CCs. Thus, the only way a speaker can effectively install the content of a CC in the addressee's mind is to tie it to one or more PIs already known to the addressee. That is, the usual way of communicating information is by bringing one or more PI nodes into the addressee's consciousness, and by predicating something of these nodes.</Paragraph> <Paragraph position="3"> Language usually involves taking one PI (the &quot;topic&quot;) as a starting point and either predicating something of it alone, or tying it to other PIs through a relational predicate.</Paragraph> <Paragraph position="4"> It should be noted in passing that not everything which is expressed syntactically as a noun is conceptually a PI. A word like &quot;Tuesday&quot;, for example, may be used as the name for what we call a PT: a &quot;particular time&quot; which might be used to provide temporal orientation in a sentence like &quot;On Tuesday Mrs. Brown gave Tommy a cookie.&quot;</Paragraph> <Paragraph position="6"> In deciding to categorize a CC in a certain way, say as an instance of UC-&quot;GIVE&quot;, a speaker simultaneously establishes a framework of PIs which are separated out from the content of the CC. In the case of UC-&quot;GIVE&quot; these PIs will function as agent, beneficiary, and patient (the giver, the givee, and the given). The fact that these three PIs are entailed by the choice of UC-&quot;GIVE&quot; is expressed as follows: The letters A, B, C, and D in this statement are variables ranging over particular four-digit numbers. For example, CC-A might be CC-1053, PI-B might be PI-1687, etc. The symbol > is to be read &quot;entails&quot;, and F> is to be read &quot;is framed as&quot;.
(The notation to the right of F> can be regarded as a case frame; hence the appropriateness of the term &quot;framing&quot;.) The statement in 29), then, says that when one has chosen UC-&quot;GIVE&quot;, this decision entails that the CC will be framed as, or expressed by, the verb (VB) &quot;GIVE&quot; accompanied by three PIs, functioning as agent, beneficiary, and patient. Statements like that in 29) are stored in our English lexicon. This statement actually forms only part of the lexical entry for UC-&quot;GIVE&quot;.</Paragraph> <Paragraph position="7"> The complete entry for this category contains a number of additional lines which state various other entailments, for example that giving involves transfer of ownership. These other aspects of lexical entries will be discussed below.</Paragraph> <Paragraph position="8"> To summarize, a CC of the appropriate size, arrived at through subconceptualization, will be subject to categorization in terms of some UC, the effect of which will be to create, by way of the lexicon, a verbal label for the CC together with a framework of associated nouns. The framing operation, in effect, will have factored out those elements (PIs) having no significant temporal particularity, leaving a word (the VB) to which alone that temporal particularity will be assigned.</Paragraph> <Paragraph position="9"> It is probably a consequence of its being left with this temporal role that the VB is likely to end up carrying a temporal marker of some kind, such as a tense and/or aspect suffix. If, for example, the CC occupies a temporal locus that precedes the locus of the speech act, the VB is likely to end up with a past tense suffix attached. This part of lexicalization we call inflection. Its implementation will be illustrated immediately below.</Paragraph> <Paragraph position="10"> Our program tries to establish at the outset for each CC whether it can be categorized, on the assumption that the speaker is aiming at such categorization as a goal, and that subconceptualization takes place only when the content of the CC is such that categorization is not appropriate. Thus the first question asked of any CC is of the sort: 30) V: CAN CC-1053 BE CATEGORIZED? If the user's answer is no, VAT goes on to ask how this CC is to be subconceptualized, as in the example given in section III. If, on the other hand, the user's answer is yes, VAT will go on to ask questions relevant to the tense/aspect properties of the CC. At present it asks first:</Paragraph> <Paragraph position="12"> since special considerations have to be given to CCs that do not have temporal particularity. If the answer to 31) is no, VAT presently assumes by default that CC-1053 has a temporal locus preceding that of the speech act. This is certainly the most probable state of affairs for most kinds of discourse. We would like eventually to elaborate other possibilities, which are likely to depend on adverbial and other means of establishing temporal particularity. Our program at present will, under these circumstances, add the inflectional notation &quot;PAST&quot; after a slash, as in: 32) CC-1053 / &quot;PAST&quot; It is now time for the following exchange: The user says that the decision has been to categorize this CC as an instance of the category UC-&quot;GIVE&quot;. VAT then looks into the lexicon and, on the basis of the last line in 29), replaces 32) with: Two other considerations are relevant at this point.</Paragraph> <Paragraph position="13"> For one thing, VAT will want to replace the PI variables in 34) with particular four-digit numbers.</Paragraph> <Paragraph position="14"> Our easiest recourse at present is to have VAT ask the user about each PI: whereupon VAT will
replace 34) with: At least some of the answers to the questions in 35) ought, under some circumstances, to be derivable from the context. We hope gradually to teach VAT to discover such answers for itself whenever possible.</Paragraph> <Paragraph position="15"> A second consideration at this point is to establish which PI is the subject or topic, the PI on which the speaker intends the addressee's attention to be focused and concerning which something will be asserted. Again the easy way out is for VAT to ask the user: 37) V: WHICH PI IS THE SUBJECT? U: PI-1234 The question in 37) is appropriate for a subject-prominent language like English. If the verbalization is in a topic-prominent language VAT will ask instead about the topic (Li 1974). In English this may be the point at which functional relations such as agent, beneficiary, and patient should be replaced by surface syntactic roles like subject, indirect object, and direct object. (In Japanese the introduction of particles like ga, o, and ni would be appropriate here.) Thus, after 37) VAT may change the representation in 36) to:</Paragraph> <Paragraph position="17"> where IO and DO stand for &quot;indirect object&quot; and &quot;direct object&quot;.</Paragraph> <Paragraph position="18"> Again, the identity of the topic will often be derivable from the context. For example, all other things being equal, topics have a tendency to remain constant from one clause to the next, agents are more likely to be topics than patients, and so on. Considerable empirical work will be necessary before all such factors have been sorted out.</Paragraph> <Paragraph position="19"> If the codability of CC-1053 had been somewhat lower and the modified categorization exemplified in 28) had been chosen, the representation at this stage would include an adverb (AV): The lexicalization of CC-1053, then, has involved categorization, possibly modification, inflection, and framing.
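The sequence of steps just summarized (categorization, inflection, framing, and the assignment of surface roles) can be sketched in a few lines. The sketch is illustrative and rests on assumptions: the dictionary layout of LEXICON, the function name lexicalize_cc, the generic flag standing in for the answer to 31), and the numbers PI-1235 and PI-1236 for Tommy and the cookie are all hypothetical; only CC-1053, PI-1234, the default of PAST, and the three roles come from the text. Modification by an adverb is left out.

```python
LEXICON = {
    # UC-GIVE is framed as the verb GIVE accompanied by three PIs.
    'GIVE': {'verb': 'GIVE', 'roles': ('agent', 'beneficiary', 'patient')},
}


def lexicalize_cc(cc, category, fillers, subject, generic=False):
    """Categorize, inflect, and frame one CC (illustrative only)."""
    entry = LEXICON[category]
    roles = dict(zip(entry['roles'], fillers))
    if roles['agent'] != subject:
        # Subjects other than the agent would need rules not sketched here.
        raise NotImplementedError('only agent subjects are covered in this sketch')
    return {
        'cc': cc,
        'verb': entry['verb'],
        # Default assumed in the text: the CC precedes the speech act.
        'inflection': None if generic else 'PAST',
        'subject': roles['agent'],
        'IO': roles['beneficiary'],
        'DO': roles['patient'],
    }


result = lexicalize_cc('CC-1053', 'GIVE',
                       ('PI-1234', 'PI-1235', 'PI-1236'),
                       subject='PI-1234')
```

Each PI in the result still has to be lexicalized in its own right before a sentence can be produced.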
The next step in verbalization is to lexicalize the several PIs which are contained in a representation like 38) or 39). We will see that the lexicalization of a PI involves categorization, possibly modification, and inflection.</Paragraph> <Paragraph position="20"> Framing is for the most part restricted to the lexicalization of a CC.</Paragraph> <Paragraph position="21"> A PI is the concept of a concrete object, be it animate or inanimate, or of an abstraction which has been reified and is being treated linguistically in ways analogous to the treatment of physical objects. The surface linguistic representation of a PI may be a proper noun, a common noun, a pronoun, or nothing at all. Furthermore, by agreement processes certain features of the PI may be incorporated into the verb with which it is associated. Each language has its own idiosyncrasies in the treatment of PIs. Some, like Japanese, are especially fond of deleting the PI altogether whenever it is predictable from context. Some, of the polysynthetic type, seem to go overboard in the extent to which they incorporate features of the noun within the verb. Some make a point of adding inflectional features expressing &quot;definiteness&quot;, plurality, and the like to the surface noun, while others seem to get along well without such expression. For illustrative purposes we will confine ourselves in this section to the main outlines of how a PI is lexicalized in English.</Paragraph> <Paragraph position="22"> Much depends on whether or not the PI in question is &quot;given&quot;, that is, whether it has</Paragraph> <Paragraph position="24"> already been brought into the addressee's consciousness in some way prior to the uttering of the present sentence (Chafe 1974).</Paragraph> <Paragraph position="25"> Here again we have a case where the easiest course for VAT at this preliminary stage of its development is to ask the user:</Paragraph> <Paragraph position="27"> Certainly in many cases, however, VAT
can be taught to decide this for itself. If, for example, PI-1234 was mentioned in the preceding sentence the answer to 40) must be yes.</Paragraph> <Paragraph position="28"> If the preceding sentence was &quot;Mrs. Brown came over from next door&quot; and we are concerned with the lexicalization of PI-1234 within the sentence &quot;PI-1234 gave Tommy a cookie&quot;, the givenness of PI-1234 will result in its lexicalization as &quot;SHE&quot;. We can actually go a fair distance in establishing the givenness of a PI on this basis alone, but the question of how else givenness is established, including its introduction from knowledge external to the linguistic text altogether, calls for extensive further work.</Paragraph> <Paragraph position="29"> Let us assume first that the answer to 40) has been yes, in which case English is likely to lexicalize PI-1234 with a pronoun. This is not always the case; sometimes a PI that is given will not be pronominalized. The principal criterion here seems to be whether pronominalization will produce ambiguity, and ultimately VAT will need to decide whether ambiguity will result. For now, however, we proceed on the assumption that a PI which is given will automatically be pronominalized.</Paragraph> <Paragraph position="30"> The procedure we are currently using for pronominalization in English asks first:</Paragraph> <Paragraph position="32"> This question is asked first because the pronoun &quot;YOU&quot; does not distinguish number, and if the answer to 41) is yes it will not be necessary for VAT to do anything beyond lexicalizing PI-1234 as NN-&quot;YOU&quot; (NN, of course, for &quot;noun&quot;).</Paragraph> <Paragraph position="33"> If, on the other hand, the answer to 41) is no, then VAT must ask: 42) V: WHAT IS THE CARDINALITY OF PI-1234? We assume that
a PI is from one point of view the concept of a set of objects, and that the cardinality of the set is relevant in establishing expressions of singularity and plurality, among other things. Actually the distinction between one and more than one as possible answers to 42) is all that is relevant at the moment. More interesting questions do arise in this area. For example, with cardinalities up to about five there is likely to be a need for distinguishing each member of the set with a specific PI number, whereas with larger cardinalities the set is likely to be conceived of simply as containing &quot;a number of&quot; or &quot;many&quot; members. If we assume first that the answer to 42) is one, then VAT will</Paragraph> <Paragraph position="35"> This classification includes human beings, but also named animals such as pets. If the answer to 44) is no, VAT will lexicalize PI-1234 as NN-&quot;IT&quot;. Otherwise it must find the sex of this referent:</Paragraph> <Paragraph position="37"> and lexicalize it as NN-&quot;HE&quot; or NN-&quot;SHE&quot; accordingly.</Paragraph> <Paragraph position="38"> If the answer to 42) was a number greater than one, VAT must decide between &quot;WE&quot; and &quot;THEY&quot;, the pronouns which are explicitly plural. Essentially it must ask: 46) V: IS THE SPEAKER A MEMBER OF PI-1234? If yes, it will produce the lexicalization NN-&quot;WE&quot; and if no, NN-&quot;THEY&quot;. There are again a variety of ways in which VAT might be able to answer questions like 41) through 46) without asking the user. The identity of speaker and addressee will have been established by providing such discourse parameters at the very beginning of the discourse; at present we use the arbitrary convention that PI-1001 is the speaker and PI-1002 the addressee.</Paragraph> <Paragraph position="39"> In questions 41) and 43) VAT is asking whether PI-1234 is identical to PI-1002 or PI-1001.
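The pronominalization questions 41) through 46) amount to a short decision procedure, which may be sketched as follows. This is an illustration, not VAT's code: the function name pronoun and its parameters are hypothetical, the wording of questions 43) through 45) is not reproduced here and their roles are inferred from the surrounding discussion, and the output I for the speaker is an assumption.

```python
SPEAKER = 'PI-1001'      # arbitrary convention stated in the text
ADDRESSEE = 'PI-1002'


def pronoun(pi, cardinality=1, personal=True, sex=None, includes_speaker=False):
    """Choose an English pronoun for a given PI (illustrative only)."""
    if pi == ADDRESSEE:                 # question 41): YOU does not distinguish number
        return 'YOU'
    if cardinality == 1:                # question 42)
        if pi == SPEAKER:               # question 43); the output 'I' is assumed
            return 'I'
        if not personal:                # question 44): neither a human nor a named animal
            return 'IT'
        # question 45): an unknown sex raises KeyError rather than guessing
        return {'male': 'HE', 'female': 'SHE'}[sex]
    # question 46): is the speaker a member of the set?
    return 'WE' if includes_speaker else 'THEY'
```

The comparisons with SPEAKER and ADDRESSEE stand in for the identity checks of questions 41) and 43).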
But, depending on the context, this identity may already have been established. As for the cardinality of PI-1234, it may have been made explicit through a numeral or in some other way. And the gender of the referent might have been established through the previous use of a sex-specific proper name, or through some other fact that has already been supplied.</Paragraph> <Paragraph position="40"> Let us now turn to the possibility that PI-1234 is not given, that the answer to question 40) was no. In that case, lexicalization must be either in terms of a proper name, or through the use of a categorization and ultimately a common noun. VAT first asks:</Paragraph> <Paragraph position="42"> If yes, the user gives the name and VAT lexicalizes PI-1234 with that name or the like. The real situation is not quite this simple, since a PI is likely to have more than one proper name (John, Mr. Brown, Daddy, etc.), and the choice of which, if any, among them to use will depend on various interpersonal considerations. Eventually our program should include questions relevant to such a choice. If the answer to 47) is no, then VAT follows a procedure roughly analogous to that associated with the categorization of a CC, and will look at the lexical entry for this category for whatever relevant information is stored there.</Paragraph> <Paragraph position="43"> Just as a CC may be given a lexicalization that is inflected for tense and/or aspect, the lexicalization of a PI may be given inflections for such features as number and/or definiteness. If the lexicon shows, for example, that UC-&quot;TEACHER&quot; entails that PI-1234 is countable, VAT will also in this case ask about its cardinality, as in 42) above.
If the answer is a number greater than one, VAT will create a representation like NN-&quot;TEACHER&quot; / &quot;PLURAL&quot;. Independent of this number question, VAT will need to determine whether the use of this category in this context will enable the addressee to know what particular instance of the category is being talked about. We put this in terms of the question: 50) V: DOES UC-&quot;TEACHER&quot; IDENTIFY PI-1234? If yes, VAT will add the definite article as an inflection: NN-&quot;TEACHER&quot; / &quot;THE&quot;. If no, that is, if the addressee is assumed not to be able to identify a previously known PI as the referent, VAT will decide between the indefinite articles &quot;A&quot; and &quot;SOME&quot; depending on whether the cardinality of PI-1234 is one or greater than one. The outcome will thus be either NN-&quot;TEACHER&quot; / &quot;A&quot; or NN-&quot;TEACHER&quot; / &quot;PLURAL&quot; / &quot;SOME&quot;; that is, &quot;a teacher&quot; or &quot;some teachers&quot;. We have attempted to formalize some of the contextual grounds on which VAT will be able to answer a question like 50) without asking the user, and this matter will be discussed in section VII below.</Paragraph> <Paragraph position="44"> In all its operations VAT must at many points make access to a store of more or less permanent lexical knowledge which we have formalized in terms of entailments of categories. The statements in the lexicon specify what we know about a particular CC or PI as a result of its being identified as an instance of a certain category. Or, to look at it from the opposite point of view, these statements say what properties a particular CC or PI must have in order to be categorized in a certain way.
From the first point of view we can say that once we know that a particular CC has been categorized as an instance of UC-&quot;GIVE&quot;, for example, the lexicon tells us a number of other things that we must know about this CC. From the second point of view we can say that the lexical entry for UC-&quot;GIVE&quot; tells us what we must know about a CC in order to assign it to this category. These two ways of viewing lexical entries are not in contradiction, but are different sides of the same coin.</Paragraph> <Paragraph position="45"> From a psychological standpoint the lexicon approximates a description of everything that is involved in a person's interpretation of the world, at least so far as his interpretive grid is dependent on verbal categories. We are unable, of course, to focus on individual differences, but must attempt to deal with a core that is common to the speakers of a particular language. The lexicon is the heart of our program, whether we are engaged in verbalization or translation, and everything else depends on the success with which the lexicon has been elaborated. A separate lexicon has to be developed for each language with which the program tries to deal. In a full-fledged implementation certainly a very high proportion of the total developmental effort will have to be devoted to lexical questions.</Paragraph> <Paragraph position="46"> As a simple illustration of the kind of information a lexical entry might contain, as well as of the formalism we have been using to represent such information, let us consider at least part of what it means for a particular CC to be categorized as an instance of UC-&quot;LIFT&quot;. We will want to say that when X lifts Y, this entails that X does something which causes a change of state from Y being in one location to Y being in another location, and furthermore that the new location is above the old location.
The lexical entry for UC-&quot;LIFT&quot;, insofar as it captures this much information, is written as follows:</Paragraph> <Paragraph position="48"> The first two lines are to be read, &quot;If CC-A is categorized as an instance of UC-&quot;LIFT&quot;, this entails...&quot; The first line under E> then gives the case frame, saying that there will be a clause containing the verb &quot;LIFT&quot; accompanied by an agent (PI-B) and a patient (PI-C).</Paragraph> <Paragraph position="49"> The second line under E> says that it is alternatively possible to subconceptualize CC-A in a certain way, which amounts to a paraphrase. That is, although the speaker has chosen not to subconceptualize CC-A further (presumably because the choice of UC-&quot;LIFT&quot; has been judged to provide the right packaging for CC-A), if he had decided to subconceptualize further he could have done it in the manner specified in this line, where two new CCs, CC-D and CC-E, are joined by CC-CAUSE. In other words CC-D is conceived of as causing CC-E. The third line under E> says something about the content of CC-D, namely that it involves an act by PI-B. (It may be noted that the absence of quotes around ACT in VB-ACT indicates that this is not a conceptual unit that will lead to a direct surface structure representation, as will VB-&quot;LIFT&quot;.) The fourth line under E> says that CC-E, which is caused by this act, can be subconceptualized into two conjoined elements. The first of these is a change from CC-F to CC-G, and the second is CC-H. The fifth and sixth lines under E> specify the nature of the prior and subsequent states, CC-F and CC-G. Both involve PI-C being at some location, first PL-I and then PL-J (PL standing for &quot;particular location&quot;). The last line elucidates CC-H, stating that the new location (PL-J) is above the old location (PL-I). 
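The entailments just described for UC-&quot;LIFT&quot; can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the tuple layout, the names LIFT_ENTRY, instantiate and expand, and the item numbers 3001 to 3003 are hypothetical and are not the notation or code of the original program. The variable letters follow the discursive description above.

```python
# Illustrative encoding only: the tuple layout and the names LIFT_ENTRY,
# instantiate and expand are assumptions, not the original program's notation.
LIFT_ENTRY = {
    'category': 'LIFT',
    'case_frame': {'verb': 'LIFT', 'agent': 'PI-B', 'patient': 'PI-C'},
    'entailments': {
        'CC-A': ('CAUSE', 'CC-D', 'CC-E'),            # the act causes the effect
        'CC-D': ('ACT', 'PI-B'),                      # an act by the agent
        'CC-E': ('AND', ('CHANGE', 'CC-F', 'CC-G'), 'CC-H'),
        'CC-F': ('AT', 'PI-C', 'PL-I'),               # prior state
        'CC-G': ('AT', 'PI-C', 'PL-J'),               # subsequent state
        'CC-H': ('ABOVE', 'PL-J', 'PL-I'),            # new location above old
    },
}


def instantiate(term, binding):
    """Replace entry variables by the numbered items of one verbalization."""
    if isinstance(term, tuple):
        return tuple(instantiate(part, binding) for part in term)
    return binding.get(term, term)


def expand(entry, binding):
    """Return every entailment of the entry with bound variables substituted."""
    return {
        instantiate(cc, binding): instantiate(body, binding)
        for cc, body in entry['entailments'].items()
    }


# Hypothetical item numbers for one particular lifting event.
binding = {'CC-A': 'CC-3001', 'PI-B': 'PI-3002', 'PI-C': 'PI-3003'}
facts = expand(LIFT_ENTRY, binding)
print(facts['CC-3001'])
```

Variables that are not bound (here CC-D through CC-H and the two locations) are left as they are, which mirrors the idea that the speaker need not subconceptualize further.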
Thus 51) has captured formally the several bits of knowledge about CC-A that were summarized discursively at the beginning of this paragraph.</Paragraph> <Paragraph position="50"> Let us now turn to a more complicated example. This example came up initially as a result of the observation that the Japanese verb kasu can be translated into English as either rent (out) or lend. In other words this verb is nonspecific as to whether the agent does or does not receive money for the goods or services he provides. We were interested in how a translation from Japanese into English would decide whether to use rent or lend where the Japanese had used kasu. This problem led us to consider lexical entries for several verbs involving transfers and transactions, and we arrived at a system of cross-referencing and embedding within lexical entries that captures the content of abstract notions (such as transfer and transaction) at the same time that it links entries one to another in a way that is generally useful.</Paragraph> <Paragraph position="51"> We may begin by defining a transfer. We assume a category UC-TRANSFER which, since it does not contain quotation marks, is understood to be abstract and not immediately convertible into a surface structure verb. The lexical entry reads as follows:</Paragraph> <Paragraph position="53"> Discursively, a CC-A which has been categorized as an instance of UC-TRANSFER can alternatively be subconceptualized (or paraphrased) in terms of a change from CC-B to CC-C, where the former involves PI-D &quot;having&quot; PI-E, and the latter involves another party, PI-F, having PI-E. 
In other words, a transfer involves a change in the having of some object (PI-E) from one individual to another. The English word have of course performs a variety of semantic functions; our use of it in this formalism is meant to include at least two varieties of having--ownership, which we will label HAVE-OWN, and having the use of something, which we will call HAVE-USE. Simple HAVE, as in 52), is meant to be nonspecific as to which of these varieties of having is involved, as may be accounted for with the following two statements:</Paragraph> <Paragraph position="55"> That is, a CC which has been categorized as an instance of UC-&quot;GIVE&quot; has the case frame shown in the first line under E>. The question mark before the beneficiary indicates that it is optional; one can say &quot;Roger gave a book&quot; without mentioning a beneficiary. The second line under E> shows that this CC can also be categorized as an instance of UC-TRANSFER. This fact means that the CC also has the entailments listed in 52). Since the variables within each lexical entry are arbitrarily labeled A, B, C, etc., it is necessary now to state equivalences between the variables in the entry for UC-&quot;GIVE&quot; and those in the entry for UC-TRANSFER. These equivalences are listed, indented, in the last three lines of 54). They are to be read, &quot;PI-D of the TRANSFER entry is equivalent to PI-B of the &quot;GIVE&quot; entry (the giver); PI-F of the TRANSFER entry is equivalent to PI-C of the &quot;GIVE&quot; entry (the givee); and PI-E of the TRANSFER entry is equivalent to PI-D of the &quot;GIVE&quot; entry (the given).&quot; 
In this way 54) and 52) are brought into the correct alignment.</Paragraph> <Paragraph position="56"> Another, more complicated kind of transfer is that involved in the category UC-&quot;LEND&quot;:</Paragraph> <Paragraph position="58"> The first seven lines of this entry are entirely parallel to the entry for UC-&quot;GIVE&quot; in 54). It then becomes necessary to refer to the earlier and later states, CC-B and CC-C, of the TRANSFER entry.</Paragraph> <Paragraph position="59"> These are equated with CC-E and CC-F of the &quot;LEND&quot; entry.</Paragraph> <Paragraph position="60"> It is said that both of these states involve HAVE-USE. That is, when X lends an object to Y, in the earlier state X has use of the object and in the later state Y does. The next to last line says that PI-B, the agent of the lending, maintains ownership of PI-D throughout. The last line says that CC-A cannot be categorized as a transaction, as explained below. Evidently the only difference between 55) and the entry for UC-&quot;KAS-&quot; (kasu) in Japanese is that for the latter the last line of 55) is missing. Thus, kasu leaves it undecided whether a transaction was involved or not.</Paragraph> <Paragraph position="61"> What, then, is a transaction? Essentially it is a linking of two transfers, where one of the transfers is for the purpose of the other. In buying, for example, a typical transaction, the buyer gives money to the seller so that the seller will give him some object in return.</Paragraph> <Paragraph position="62"> With buying, a change of ownership is involved in both transfers, but that need not be the case. With renting, for example, there is a change of ownership of the money, but only a change of use of the object. 
We define a transaction as follows:</Paragraph> <Paragraph position="64"> The first line under E> states that CC-A can be paraphrased in terms of CC-B and CC-C, the former being for the purpose of the latter.</Paragraph> <Paragraph position="65"> CC-B is a transfer in which PI-D (e.g. the buyer) transfers PI-E (e.g. money) to PI-F (e.g. the seller). CC-C is a transfer in which the roles of PI-D and PI-F (and hence their relation to the variables in 52)) are reversed. Furthermore, the object transferred (e.g. the thing bought) is a different one--here PI-G.</Paragraph> <Paragraph position="66"> Besides buying and selling, another typical transaction is renting. The English word rent is ambiguous, and we will illustrate here the entry for what we call UC-&quot;RENT-2&quot;, which is renting out:</Paragraph> <Paragraph position="68"> The first line under E> gives the case frame, which includes two obligatory cases, an agent and a patient (&quot;Bill rented (out) his lawnmower&quot;) and an optional beneficiary and measure (MLR) (&quot;Bill rented his lawnmower to Tom for five dollars&quot;). The second line under E> says that CC-A is a transaction; it thus conforms to 56) and it is necessary to state the equivalences between the PIs in 57) and those in 56). Below these PI equivalences it is also stated that the CC-B of the TRANSACTION definition (the transfer of money) is equivalent to CC-F of the &quot;RENT-2&quot; definition, while CC-C of the TRANSACTION definition (the transfer of the object) is equivalent to CC-G of &quot;RENT-2&quot;. The two states of the first TRANSFER are named CC-H and CC-I, while the two states of the second TRANSFER are named CC-J and CC-K. It is then said that the measure, PI-D, must be something categorizable as a MEDIUM-OF-EXCHANGE--normally money, but potentially anything that would perform this function. 
The two states of the first TRANSFER are then both said to be instances of UC-HAVE-OWN, since the money actually changes ownership. The two states of the second transfer, on the other hand, are instances of UC-HAVE-USE, since the object does not change ownership, but only use. The last line, like the next to last line of 55), says that the agent of the renting retains ownership of the object.</Paragraph> <Paragraph position="69"> It was mentioned that the lexical entry for Japanese UC-&quot;KAS-&quot; is the same as that for English UC-&quot;LEND&quot;, as in 55), except that the Japanese entry lacks the last line of 55) in which it is stipulated that lending cannot be a transaction. It can now be seen that UC-&quot;KAS-&quot; is compatible with both 55) and 57). We thus have a formal explanation for the fact that kasu may be translated as either lend or rent. In order to decide between the two translations, it is necessary to search the context in which this CC occurs to discover whether it is or is not a transaction. We will return to this matter in our discussion of translation in section VIII. Lexical entries for categories whose instances are PIs are designed to elucidate the knowledge which is entailed by the assignment of a particular PI to some category. Such entries do not contain a case frame, but are otherwise similar in format to the entries for categories whose instances are CCs, as described above. As a simple example, we may note that when a PI is categorized as an instance of UC-&quot;CAR&quot; there is an entailment that this PI will &quot;have&quot; a trunk. This kind of having is different from those discussed in connection with transfers and transactions in the last section; we represent it with HAVE-AS-PART:</Paragraph> <Paragraph position="71"> It is useful here (and elsewhere in the lexicon) to distinguish between necessary entailments and expected entailments or default options. 
The latter constitute knowledge that is normally entailed by the category, but not necessarily so. We indicate entailments of this sort with a prefixed &quot;E:&quot;. As an example we may note that something which has been categorized as a MEDIUM-OF-EXCHANGE (cf.</Paragraph> <Paragraph position="72"> 57)) is normally expected to be money, although in some circumstances it might be cowry shells or wampum:</Paragraph> <Paragraph position="74"> A more complex example involves the categorization of a PI as an instance of UC-&quot;BEAGLE&quot;. In this case we know that the PI is also categorizable as an instance of UC-&quot;DOG&quot;, that we may expect that it will have a tail (although some dogs do not), that it will bark, and that it will chase cats:</Paragraph> <Paragraph position="76"> It may be that E: should be expressed as a probability; that is, that there is a continuous range over which we may expect something to be entailed, with necessary entailment being one extreme.</Paragraph> <Paragraph position="77"> At least for practical purposes, however, it proves useful to make a three-way distinction between necessary entailments (unmarked), default expectations (E:), and a third type which we call optional entailments and mark with &quot;O:&quot;. These last represent a lower degree of probability; they are entailments which are neither necessary nor expected, but which are easily possible. For example, a bicycle need not have a basket and is not expected to have a basket, but it may very well have one: The distinction between necessary or expected and optional entailments is of interest when it comes to the assignment of definiteness, as discussed in the following section.</Paragraph> <Paragraph position="78"> VII. Discourse Information and Readjustments A speaker needs access to three major classes of information in order to verbalize successfully. 
First, of course, he must have an idea of what he wants to talk about: the content of the verbalization. Second, he must have access to general knowledge that is relevant, the kind of knowledge that we are attempting to characterize in the lexicon. But there is a third kind also. The speaker must keep track of knowledge having to do with the very fact that he is verbalizing: knowledge about the speech act itself, and its effect on the person his verbalization is addressed to. It is this third kind of knowledge that we are calling discourse information. We are concerned in this area with such factors as the identity and social relationship of the speaker and the addressee, the time and place of the speech act, and factors which relate the content of the discourse to what is assumed to be going on in the mind of the addressee, as well as with the act of verbalization as an event in itself. Discourse information is kept by VAT in temporary storage. Unlike information in the lexicon, it is specific to a particular discourse rather than being potentially applicable to an unlimited number of different discourses. Our treatment of discourse information is at present rudimentary and uneven. So far as speaker and addressee are concerned, we simply enter into discourse information storage statements like the following: (The prefix SP stands for &quot;system predicate&quot;; it is used for a variety of predicates associated with discourse information.) The program makes use of this information in various ways. For example, in deciding how to lexicalize PI-1001 and PI-1002 VAT makes use of information like that in 62) in order to arrive at first and second person pronouns; cf. questions 41) and 43) in section V above. 
Probably in most languages to some degree, but especially in many Asian languages, the social relationship between the speaker and addressee plays a role of some kind in verbalization.</Paragraph> <Paragraph position="79"> We have been interested in introducing such considerations into our verbalization procedure, and have so far concentrated on the question of how VAT should decide to categorize in Japanese a CC which in English would be categorized as an instance of UC-&quot;GIVE&quot;. There are several categories in the Japanese lexicon, all of which conform to the definition of UC-&quot;GIVE&quot; in 54) above, but which differ from each other with respect to the speaker-addressee relationship. How the choice can be made is most easily illustrated in the context of a translation procedure, and we will return to this example in section IX.</Paragraph> <Paragraph position="80"> VAT does little at present with considerations of the time and place of the speech act. Statements like the following can be included with discourse information: (where PL stands for &quot;particular location&quot; and PT for &quot;particular time&quot;). Whether PL-1357 and PT-1579 remain throughout the discourse or are replaced by other places and times depends on the nature of the discourse itself; sometimes there will be significant changes in these parameters and sometimes not. In any case it is possible for VAT to answer questions about tense, for example, by asking whether the time of the CC that is being verbalized is before or after, or whether it includes, the time which has been specified as NOW, such as PT-1579 in 63).</Paragraph> <Paragraph position="82"> The store of discourse information must change as the discourse proceeds. The way in which VAT presently accomplishes such changes is through readjustment processes, applied immediately after each sentence has been completely verbalized. 
These readjustments specify the ways in which the store of discourse information has been affected by the sentence. One of them, for example, creates a CC which is the concept of the event of producing the sentence itself, which subsequently can be treated like any other event.</Paragraph> <Paragraph position="83"> Everything involved in the verbalization of that sentence belongs to the content of this CC. If, for example, the speaker subsequently has reason to repeat what he originally said, he may verbalize in exactly the same way (quote himself directly), or he may &quot;say the same thing in different words&quot; by making different choices in categorization and so on. The relevant information is available within the CC that represents the original verbalization.</Paragraph> <Paragraph position="84"> Another readjustment has to do with the establishment of &quot;givenness&quot; for items communicated in the sentence. For each PI-A, for example, there will be, when the sentence has been completely verbalized, a readjustment process statable as: 64) SP-GIVEN (PI-A) If, for example, the sentence in question was &quot;Mrs. Brown gave Tommy a cookie&quot; and Mrs. Brown, Tommy, and the cookie are PI-1234, PI-2345, and PI-3456 respectively, then readjustments after the production of this sentence will create the statements:</Paragraph> <Paragraph position="86"> If any or all of these PIs occur in the next sentence, they will be pronominalized, and it will not be necessary for VAT to ask the user a question like 40) above (IS PI-1234 GIVEN?). Thus, the next sentence might be &quot;He took them from her gratefully.&quot; It is difficult to decide when statements like those in 65) should be deleted from the store of discourse information--when givenness evaporates. 
After a certain period of time has elapsed in which the PI has not been talked about or otherwise kept in the addressee's consciousness, the speaker will probably no longer pronominalize it. At present we let statements like those in 65) remain only through the following sentence. Thus, if PI-1234, for example, does not occur in the next sentence it will not be treated as given two sentences later, and will not be pronominalized. Not all discourse works in this way, but this device provides a useful temporary approximation. A rather similar kind of readjustment has to do with the establishment of a relation between a UC and a PI which we call IDENTIFIES. The presence of this relation eventually leads to the lexicalization of the PI with the definite article. Suppose the speaker says &quot;I bought a bicycle yesterday.&quot; During the verbalization of this sentence VAT will have created the statement: 66) PI-1987 C> UC-&quot;BICYCLE&quot; That is, PI-1987 has been categorized as an instance of UC-&quot;BICYCLE&quot;. This statement then triggers a readjustment process which creates the discourse information: 67) SP-IDENTIFIES (UC-&quot;BICYCLE&quot;, PI-1987) That is, the use of UC-&quot;BICYCLE&quot; can now be assumed to enable the addressee to know what particular instance it is (in this case PI-1987).</Paragraph> <Paragraph position="87"> When, during a later sentence, VAT comes to the question: 68) V: DOES UC-&quot;BICYCLE&quot; IDENTIFY PI-1987? as in 50) above, it is in a position to provide its own answer without recourse to the user. Thus it will, on its own initiative, lexicalize PI-1987 with the definite article: NN-&quot;BICYCLE&quot; / UI-&quot;THE&quot; It is in ways such as this that we are attempting to increase VAT's ability to answer its own questions.</Paragraph> <Paragraph position="88"> As in the case of givenness, the question arises when a statement concerning identifiability like 67) should be deleted from the store of discourse information. 
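The two readjustments described so far can be sketched in Python as a small store that is updated after each sentence. This is only a sketch under stated assumptions: the class DiscourseStore and its method names are hypothetical, and the expiry rule implemented is the temporary approximation mentioned above (an SP-GIVEN statement survives only through the following sentence, while SP-IDENTIFIES statements are simply kept).

```python
class DiscourseStore:
    """Hypothetical store for SP-GIVEN and SP-IDENTIFIES statements."""

    def __init__(self):
        self.sentence = 0        # number of sentences verbalized so far
        self.given = {}          # PI -> number of the sentence it last occurred in
        self.identifies = set()  # (category, PI) pairs

    def readjust(self, mentioned, categorized):
        """Apply the readjustments once a sentence is completely verbalized.

        mentioned   -- PIs that occurred in the sentence
        categorized -- (PI, category) pairs created while verbalizing it
        """
        self.sentence += 1
        for pi in mentioned:
            self.given[pi] = self.sentence
        for pi, category in categorized:
            self.identifies.add((category, pi))

    def is_given(self, pi):
        # Given only if the PI occurred in the immediately preceding sentence.
        return self.given.get(pi) == self.sentence

    def does_identify(self, category, pi):
        return (category, pi) in self.identifies


store = DiscourseStore()
# After a first sentence mentioning two PIs and categorizing a bicycle.
store.readjust(['PI-1234', 'PI-2345'], [('PI-1987', 'BICYCLE')])
first_check = store.is_given('PI-1234')
# After a second sentence in which PI-1234 does not occur.
store.readjust(['PI-2345'], [])
second_check = store.is_given('PI-1234')
```

With such a store, questions like IS PI-1234 GIVEN? or DOES UC-&quot;BICYCLE&quot; IDENTIFY PI-1987? could be answered internally before falling back on the user.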
All that is clear now is that such statements generally last longer than SP-GIVEN statements, and for the moment we do not delete SP-IDENTIFIES statements before the end of the discourse.</Paragraph> <Paragraph position="89"> It is undoubtedly the case, however, that some of them should be deleted sometimes, and it will be necessary also to deal eventually with discourses in which there are multiple instances of the same category: &quot;the first bicycle, the second bicycle, etc.&quot; The presence of lexical information of the type that was described at the end of section VI has an interesting and desirable effect on readjustments, specifically with respect to statements like 67). As an example, we might have a lexical entry for UC-&quot;BICYCLE&quot; which includes: That is, something categorized as an instance of UC-&quot;BICYCLE&quot; has as a necessary part something categorizable as an instance of UC-&quot;FRAME&quot;, and also has as an optional part something categorizable as an instance of UC-&quot;BASKET&quot;.</Paragraph> <Paragraph position="90"> Now, it may be noted that the second line under E>, which deals with the categorization of PI-B, is a statement like that in 66) above.</Paragraph> <Paragraph position="91"> After a sentence like &quot;I bought a bicycle yesterday&quot; has been produced, this line will therefore trigger a readjustment process which creates the statement: 70) SP-IDENTIFIES (UC-&quot;FRAME&quot;, PI-2468) (with whatever number it is appropriate to assign to this PI). As a consequence, if PI-2468 occurs in a subsequent sentence it will be lexicalized with the definite article, as in &quot;The frame is extra large.&quot; Thus, as is desirable, definiteness is created not only for instances of the category first mentioned, but also through entailments of that category. It should also be noted that in this context it is a little odd to say &quot;The basket is extra large&quot;, talking about PI-C. 
One would be more likely to say &quot;It has a basket which is extra large&quot;, or in some other way to introduce the basket explicitly. In other words the process just described works better for necessary parts than for optional parts of the first-mentioned object (PI-A). We therefore exclude from this readjustment process PIs that have been introduced through optional entailments.</Paragraph> <Paragraph position="92"> The general nature of the translation procedure was outlined in section I, and diagramed in Figure 1. To summarize again, VAT will start with a text in the source language, will reconstruct the verbalization processes which produced that text, and will then itself produce a parallel verbalization in the target language. During this last procedure it will apply syntactic processes appropriate to the target language whenever it can, but at each of those many points where it must make a choice of some kind it will look across to the source language verbalization to see what choice was made there. If possible it will equate that choice directly with a corresponding choice in the target language.</Paragraph> <Paragraph position="93"> If no direct correspondence is available, it will compare the lexicons of the two languages to determine what correspondences are possible, and will then search the context to decide which of them should be chosen. We will be particularly concerned in this section with illustrating a case in which such a complex choice must be made--in which the zigzag arrows in Figure 1 have considerable content. First, however, it may be useful to provide a framework by illustrating a relatively simple case where the correspondences are more direct. We will use as our first example the following brief text from Japanese: 71) Reizooko o utta. Okane ga hituyoo datta kara.</Paragraph> <Paragraph position="94"> refrigerator sold money needed was because We will want to consider some of the procedures VAT 
will follow in translating this sentence into English: 72) I sold the refrigerator. I needed the money.</Paragraph> <Paragraph position="95"> Actually our attention in this example will focus on the first sentence, since we will later want to consider the complications that are added by changing the verb in the first sentence from utta 'sold' to kasita 'rented' or 'lent'. Let us first review the manner in which VAT will reconstruct the original verbalization of the Japanese text. Since our eventual parsing component will follow a kind of &quot;analysis by synthesis&quot; procedure, we will also be suggesting here the steps of the parsing program. The only difference, and of course it is a big one, is that for the moment VAT will ask that decisions be made by the user instead of itself deriving them from the text together with its own knowledge of the world. The conversation with the user will proceed as follows: 1. V: WHAT VAT TASK DO YOU WANT PERFORMED? 2. U: VERBALIZE CC-2001 5. V: CAN CC-2001 BE CATEGORIZED? As explained for example 9) in section II, and with the proper insertion of periods, VAT's representation now is: VAT's representation, as explained for example 32) in section IV, now includes: VAT finds UC-&quot;UR-&quot; in the Japanese lexicon. The first three lines of this entry are: CC-A C> UC-&quot;UR-&quot; ... As in example 34) in section IV, VAT creates the representation: Since the beneficiary and measure PIs are optional, VAT next asks: 17. V: IS THE MEASURE EXPLICIT? The next two questions are: 19. V: WHAT IS THE AGENT? 21. V: WHAT IS THE PATIENT? VAT now has the following representation (cf. 36) above): Beginning with PI-2001, it might ask first: 25. V: IS PI-2001 GIVEN? 
In fact, however, we assume that the speaker (and addressee) are automatically given, so that VAT contains a general entailment to the effect that: Since by convention PI-2001 is the speaker, the following is already stored as discourse information: Thus VAT was able to give an affirmative answer to question 25 above without asking the user. Pronominalization in Japanese is a complex matter, depending in part on social relationships, and we have not as yet constructed a procedure to introduce the correct pronoun for a PI that is given. We have, however, taken advantage of the simple fact that given PIs are very often deleted, with no surface representation at all. In the present example, and in many others, the simple deletion of such a PI produces the correct result. The first three lines of the above are actually as far as we go at the present time in the surface representation of a sentence. We try to include in such a representation everything that is needed to arrive at a correct linear sequence of words. In this case the combination VB-&quot;UR-&quot; / &quot;PAST&quot; will yield the surface word utta, which will be placed in sentence-final position (followed by the period). That leaves reizooko o as the first words in the sentence. VAT would next ask about CC-2002, but we will not carry the verbalization process further here. We are interested in how just this much of the text will be translated into English.</Paragraph> <Paragraph position="96"> By and large VAT will ask the same questions it asked in the course of the Japanese verbalization. It will look for the answers in the answers that were given there, and when possible will apply corresponding answers in English. Along the way, whenever appropriate, it will apply syntactic processes that are called for by the structure of English. 
The translation, then, begins with the same question that began the verbalization in Japanese: V: WHAT VAT TASK DO YOU WANT PERFORMED? The answer given in line 2 above was VERBALIZE CC-2001. The English translation must use its own four digit numbers; in what follows we will simply substitute the English digit &quot;1&quot; for the Japanese digit &quot;2&quot;. Of course here as elsewhere this question is not actually asked of the user, but is answered internally by VAT. The next questions exactly parallel lines 3-8 above: V: WHAT IS THE GENRE? V: CAN CC-1001 BE CATEGORIZED? We assume that English would not in this case use the word because, but simply juxtapose the two sentences, as in example 8) in section II. Thus the representation now is: Lines 9-13 of the Japanese verbalization have a direct correspondence: At this point the Japanese was UR-.</Paragraph> <Paragraph position="97"> That is, the categorization was in terms of the Japanese category UC-&quot;UR-&quot;.</Paragraph> <Paragraph position="98"> It is necessary to find an English category that corresponds. The procedure at this point is to look first in a stored list of bilingual category equivalences which we call interlingua. The entries in interlingua are of the following sort: UR- SELL That is, the list contains pairs of categories, where the members of each pair are assumed to categorize what is, for all practical purposes, identical content. The assumption is that if a CC can be categorized as an instance of UC-&quot;UR-&quot; in Japanese it can also be categorized as an instance of UC-&quot;SELL&quot; in English, and vice versa. Similarly, Japanese UC-&quot;HON-&quot; and English UC-&quot;BOOK&quot; are equivalent categories. As a general strategy we expect that pairs will gradually be removed from interlingua as differences between the paired categories are discovered. 
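The interlingua described above is in effect a table of category pairs that can be consulted in either direction and from which pairs can later be withdrawn. A minimal Python sketch follows, assuming a plain list of pairs; the names INTERLINGUA, to_english, to_japanese and remove_pair are hypothetical, and only the two pairs mentioned in the text are entered.

```python
# Hypothetical table of bilingual category equivalences (Japanese, English).
INTERLINGUA = [('UR-', 'SELL'), ('HON-', 'BOOK')]


def to_english(japanese_category):
    """Return the paired English category, or None if there is no pair."""
    for japanese, english in INTERLINGUA:
        if japanese == japanese_category:
            return english
    return None


def to_japanese(english_category):
    """Return the paired Japanese category, or None if there is no pair."""
    for japanese, english in INTERLINGUA:
        if english == english_category:
            return japanese
    return None


def remove_pair(japanese_category, english_category):
    """Withdraw a pair once a difference between its members is discovered."""
    pair = (japanese_category, english_category)
    if pair in INTERLINGUA:
        INTERLINGUA.remove(pair)


sell = to_english('UR-')
hon = to_japanese('BOOK')
missing = to_english('KAS-')   # no pair is stored for this category
```

A result of None marks exactly the situation in which the direct correspondence fails and the lexicons of the two languages have to be compared.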
Linguistic research has not yet progressed to the point that we can say with complete certainty that any two categories from two different languages embrace exactly the same content. At the outset, however, it is useful at least to pretend that UC-&quot;UR-&quot; and UC-&quot;SELL&quot; are equivalent, and probably there are at least some pairs in interlingua that will remain viable for some time.</Paragraph> <Paragraph position="99"> The present example was chosen because the answer to the last question above can be found in interlingua. Later we will consider a case where it cannot. At this point VAT answers its own question with: It then looks at the lexical entry for UC-&quot;SELL&quot; (which we assume does not differ from that for UC-&quot;UR-&quot;), and creates the representation: The questions and answers which parallel lines 15-22 of the Japanese verbalization are straightforward: V: IS THE MEASURE EXPLICIT? V: WHAT IS THE AGENT? V: WHAT IS THE PATIENT? The representation now is: The next exchange is: which creates the representation: With the lexicalization of PI-1001 the procedure is different in English, since this item cannot simply be deleted as in the Japanese. We follow the questions illustrated in examples 40) through 43) in section V: V: IS PI-1001 GIVEN? Thus the representation now is: Now comes the lexicalization of the direct object, PI-1003. The initial questions parallel lines 27-31 of the Japanese verbalization: V: IS PI-1003 GIVEN? V: DOES PI-1003 HAVE A NAME? The Japanese answer was REIZOOKO; VAT will now look in interlingua to see whether that item is there, and we assume that it will be found paired with English REFRIGERATOR. Although Japanese was able to terminate the verbalization of PI-2003 at this point, English must ask the question introduced in example 50) of section V: The answer depends on the context, but let us assume that it is yes. 
The representation now is: We now have the kind of representation of the first sentence that is our current goal. Normal English word order will put the subject first, the verb second, and the direct object last to yield the final representation &quot;I sold the refrigerator&quot; of 72). The above example was chosen to illustrate a minimally simple case of translation: one in which, in particular, the answers to all questions about cross-language categorization could be found in interlingua. The interesting cases, however, are those in which interlingua does not provide all the answers. It is in these cases that the zigzag arrows of Figure 1 must be further elaborated. The general method of elaboration is suggested in Figure 4. Assume that we are producing a verbalization in the target language and, coming down from the upper righthand corner, we arrive at a point where a CC or PI needs to be categorized. Following arrow 1, we look across to the source language verbalization to find that the corresponding CC or PI was categorized in a certain way, let us say as an instance of category A. We look next at interlingua (arrow 2). If A were there, we would take the target language category paired with it (such as SELL and REFRIGERATOR in the example above), introduce it into the target language verbalization, and proceed. Now, however, we are considering those cases in which A is not found in interlingua. The next step, following arrow 3, is to look at the entailments of A in the source language lexicon. We next follow arrow 4 to search the target language lexicon for entries whose entailments are compatible with those of A. (This search procedure is likely to present challenging problems when the source language lexicon reaches any interesting size. It is, however, facilitated by the presence of abstract features like TRANSFER and TRANSACTION which can be used to limit the domain of search.) 
Suppose that we find two entries in the target language lexicon, B and C, both of whose entailments are compatible with the entailments of A.</Paragraph> <Paragraph position="100"> We then look to see how the entailments of B and C differ and find, let us say, that B contains entailment(s) X while C contains entailment(s) Y.</Paragraph> <Paragraph position="101"> We then follow arrow 5 back to the source language verbalization, hoping to find something in it that will allow us to choose between X and Y. (Again there are challenging problems in searching the source language text for the answer.) Let us now assume that we find something in the source language text that is compatible with X but not with Y.</Paragraph> <Paragraph position="102"> We are then able to choose B as the correct target language category. We introduce that category into the target language verbalization via arrow 6 and proceed.</Paragraph> <Paragraph position="103"> In those cases where the choice between X and Y (and hence between B and C) cannot be made, where the source language text does not provide the answer, VAT must resort to asking the user for the correct categorization.</Paragraph> <Paragraph position="104"> We will illustrate this procedure with the brief Japanese text: 73) Reizooko o kasita. Okane ga hituyoo datta kara.</Paragraph> <Paragraph position="105"> refrigerator rented money needed was because We will want VAT to translate these two sentences into English: 74) I rented the refrigerator. I needed the money. We are not concerned in this example with the fact that the first English sentence is ambiguous between rented (to someone) and rented (from someone), but with the fact that the first Japanese sentence is ambiguous between rented and lent. In both cases, it seems, the second sentence serves to disambiguate. 
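The choice between candidates B and C along arrows 5 and 6 might be sketched in Python as below. This is only an illustration under assumptions: the entailment labels, the representation of the source language verbalization as a flat set of facts, and the names distinguishing_entailments, choose_category and ask_user_stub are hypothetical, and a real search of the source text would be far harder than a set lookup.

```python
def distinguishing_entailments(candidates, lexicon):
    """Return, per candidate, the entailments not shared by all candidates."""
    shared = frozenset.intersection(*(lexicon[c] for c in candidates))
    return {c: lexicon[c] - shared for c in candidates}


def choose_category(candidates, lexicon, source_facts, ask_user):
    """Pick the candidate whose distinguishing entailments the source supports.

    source_facts stands for whatever can be recovered from the source
    language verbalization (arrow 5). If exactly one candidate is supported
    it is returned (arrow 6); otherwise the decision is handed to ask_user.
    """
    extras = distinguishing_entailments(candidates, lexicon)
    supported = [c for c in candidates if extras[c].issubset(source_facts)]
    if len(supported) == 1:
        return supported[0]
    return ask_user(candidates)


def ask_user_stub(candidates):
    """Stand-in for the interactive question VAT would put to the user."""
    return "ASK-USER"


# Hypothetical entries: B adds entailment X, C adds entailment Y.
LEXICON = {
    "B": frozenset({"e1", "X"}),
    "C": frozenset({"e1", "Y"}),
}

CHOSEN = choose_category(["B", "C"], LEXICON, {"e1", "X"}, ask_user_stub)
UNRESOLVED = choose_category(["B", "C"], LEXICON, {"e1"}, ask_user_stub)
print(CHOSEN, UNRESOLVED)
```

The first call finds X among the source facts and selects B; the second finds neither X nor Y and falls back to the user, mirroring the last resort described above.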
What we are interested in now is the fact that VAT must somehow choose between RENT and LEND as the proper correspondent for Japanese KAS-. We can assume that most of the verbalization in both languages proceeds along the lines already exemplified, since 71) and 73) are minimally different. Imagine, then, that we have arrived at the point in the English verbalization where the question is: V: HOW IS CC-1003 CATEGORIZED? We are now in the upper right of Figure 4, and we follow arrow 1 to find that the corresponding CC in the Japanese verbalization was categorized in terms of UC-&quot;KAS-&quot;. We then follow arrow 2, and find that KAS- is not in interlingua. We look next via arrow 3 at the entailments of UC-&quot;KAS-&quot; and find that they are as specified in example 55), section VI above, but without the last line of that</Paragraph> <Paragraph position="107"> Substituting four digit numbers for the variables, we obtain:</Paragraph> <Paragraph position="109"> (PI-2902, CC-2905, and CC-2906 have been inserted here as arbitrary numbers. It is quite possible, however, that these are items which show up explicitly elsewhere in the Japanese verbalization. For example, PI-2902, the one who receives the refrigerator, might well be mentioned elsewhere in the text.) Since CC-2003 involves a transfer, VAT must also assign numbers within the definition of UC-TRANSFER, given in section VI above as example 52): Thus there is a change from the renter or lender (PI-2001) having the object (PI-2003) to the rentee or borrower (PI-2902) having it. The last three lines of 76) made it clear that this was not a change in ownership but only a change in use, and that PI-2001 retains ownership throughout.</Paragraph> <Paragraph position="110"> Following arrow 4, we carry these entailments across to the English lexicon and search for entries whose entailments are compatible with 76). 
Compatibility means that these entries will contain what is in 76), but may also contain more. Let us say that we find two such entries, one for the category UC-&quot;LEND&quot;, which was given in 55) above, and one for UC-&quot;RENT-2&quot;, which was given in 57).</Paragraph> <Paragraph position="111"> The next step is to isolate the differences between UC-&quot;LEND&quot; and UC-&quot;RENT-2&quot;. UC-&quot;LEND&quot;, as mentioned, differs from 75) in containing an additional final line: 78) CC-A -C> UC-TRANSACTION That is, CC-A cannot be categorized as a transaction. UC-&quot;RENT-2&quot;, on the other hand, contains the statement: 79) CC-A C> UC-TRANSACTION At one level of abstraction the question which must be answered, therefore, is whether CC-1003 is or is not a transaction. Informally, this is a matter of whether PI-2001, the renter or lender, did or did not receive money in exchange for the transfer of use of the object.</Paragraph> <Paragraph position="112"> The following digits can be inserted for the variables in the lexical entry for UC-&quot;RENT-2&quot;:</Paragraph> <Paragraph position="113"> An instance of UC-&quot;RENT-2&quot; involves a number of things. First, there must be a person who does the renting out (PI-1001), a person who receives the rented object (PI-1901), the money that is paid in rent (PI-1902), and the rented object itself (PI-1003). Furthermore, CC-1003 is said to be a transaction, and certain equivalences are stated between the RENT-2 definition and the TRANSACTION definition. VAT must therefore assign these particular PI and CC numbers within the definition of UC-TRANSACTION, which was given as example 56): This says that CC-1003 can be paraphrased as two transfers, CC-1901 and CC-1902, the first of which was for the purpose of the second. (CC-1901 is the transfer of money, and CC-1902 the transfer of the rented object.) 
VAT must, therefore, look also at the definition of UC-TRANSFER, given in section VI above as example 52), and introduce again the proper PI and CC numbers for each of these particular transfers. The first of them will be represented as: That is, the first transfer involves a change from CC-1903 to CC-1904. In CC-1903 the rentee (PI-1901) has the money (PI-1902), and in CC-1904 the renter (PI-1001) has it. The second transfer is represented as:</Paragraph> <Paragraph position="114"> Here there is a change from CC-1905 to CC-1906. In CC-1905 the renter (PI-1001) has the object to be rented (PI-1003), and in CC-1906 the rentee (PI-1901) has it.</Paragraph> <Paragraph position="115"> In 80) it is also stated that PI-1902 can be categorized as an instance of MEDIUM-OF-EXCHANGE, in all probability therefore an instance of UC-&quot;MONEY&quot; (see example 59) in section VI above). Furthermore it is stated that the change in the having of the money (from CC-1903 to CC-1904) involves a change in ownership, whereas the change in the having of the rented object (from CC-1905 to CC-1906) involves a change in the use. Finally, it is stated that the renter (PI-1001) retains ownership of the rented object throughout.</Paragraph> <Paragraph position="116"> What VAT wants to find out, then, is whether these things that must be true if CC-1003 is to be an instance of UC-&quot;RENT-2&quot; are indeed true, or whether the bottom line in the entailments of UC-&quot;LEND&quot; is fulfilled instead. VAT tries to decide this by following arrow 5 to the verbalization of the Japanese text. Of course there are many ways in which the answer might appear in that verbalization, if it appears at all. If VAT is unsuccessful in its search it will have to ask the user directly: 84) V: IS CC-1003 CATEGORIZED AS LEND OR RENT? In 73), however, we have made things easy by supplying a context which ought to decide the question. 
It will be remembered that the second sentence in 73) expresses CC-2002, which is the reason for CC-2003, or what is expressed in the first sentence. Now, CC-2002 is categorized in the Japanese as an instance of UC-&quot;HITUYOO DA&quot;, which means something like &quot;be needed&quot;. Let us assume that the Japanese lexicon contains an entry for this category which includes the following:</Paragraph> <Paragraph position="118"> The case frame immediately under the E> identifies PI-B as the beneficiary, the person who needs something, while the thing needed is labeled PI-C.</Paragraph> <Paragraph position="119"> The second line under the E> says that an alternative framing is possible in terms of an abstract verb WANT, wherein PI-B wants CC-D, and CC-D is then characterized in terms of PI-B having PI-C. In other words, when one needs something, one wants to have it. (If this is not always true, at least it is the expected entailment.) If 85) is going to provide an answer to 84), there must also be a general principle of some kind which relates what is entailed by CC-2002 to what is entailed by CC-2003. This general principle can be stated as follows: The first line says that PI-B wants CC-C. The second line says that PI-B does something. The third line says that his wanting CC-C is the reason he does something. All of this together is then said to entail that his doing something entails what he wants, or CC-C. In other words, if one wants something and does something because of that, then what one does must entail what one wants. During the verbalization of CC-2002 as part of the verbalization of the Japanese text, VAT will have recorded the fact that CC-2002 was categorized as an instance of UC-&quot;HITUYOO DA&quot;, and will have entered the following statements in accordance with 85): The first line of 88) was obtained from 87). The second line was obtained from 76). 
The third line comes from line 8 of the Japanese verbalization set forth at the beginning of this section. What we are interested in now is the last line of 88), which says in effect that CC-2003 is categorized in such a way that CC-2904 is true, and looking back to 87) we see that CC-2904 involves PI-2001 having PI-2902, or the agent of kasu having okane 'money'. Making the necessary correspondences in English, this means that CC-1003 must be categorized in such a way that CC-1904 is true, where: This is exactly what VAT finds as the last line of 82). Since 82) is entailed by UC-&quot;RENT-2&quot; but not by UC-&quot;LEND&quot;, the question in 84) has been answered, and the arrow labeled 6 in Figure 4 carries back the choice of UC-&quot;RENT-2&quot; into the English verbalization, which then proceeds as it did in the translation illustrated earlier. By this complex process involving comparisons of entailments within and across languages, as well as the general principle stated in 86), VAT has been able to make the correct choice. So long as the answer to 84) was derivable from something discoverable within the Japanese verbalization, VAT could in principle succeed. It is clear, however, that the route to the answer could be extremely complex, involving chains of entailments of unforeseeable length.</Paragraph> <Paragraph position="120"> There is no doubt that such procedures are necessary to answer such questions, and that they present an extraordinary challenge to our techniques for information storage and search.</Paragraph> <Paragraph position="121"> IX. Miscellaneous Problems in Translation Since we have spent considerable time looking into various specific translation problems beyond those illustrated above, we present here a few additional examples of the sorts of things that will have to be taken into account during the implementation of machine translation along the lines suggested above. 
Two of these examples will, like those in the last section, involve the choice of a category in the target language when that choice is not directly provided by interlingua. One has to do with the translation of Japanese osieru into English; the other, the translation of English give into Japanese. A third example will illustrate the kind of problem that arises at the stage of subconceptualization and sentence formation.</Paragraph> <Paragraph position="122"> The following three sentences illustrate three possible English translations of the Japanese verb osieru: 90) Gaido wa Kookyo ga doko ni aru ka osiete kuremasita.</Paragraph> <Paragraph position="123"> guide Imperial Palace where is showed Soko kara tookyoo tawaa e ikimasita.</Paragraph> <Paragraph position="124"> there from Tokyo tower to went The guide showed us where the Imperial Palace was.</Paragraph> <Paragraph position="125"> From there we went to the Tokyo Tower.</Paragraph> <Paragraph position="126"> 91) Gaido wa Kookyo ga doko ni aru ka osiete kuremasita guide Imperial Palace where is told ga watasitati ga soko e itta toki ni moo simatte but we there to went when already closed imasita.</Paragraph> <Paragraph position="127"> was The guide told us where the Imperial Palace was, but when we got there it was already closed.</Paragraph> <Paragraph position="128"> 92) Kimatu siken no tame ni sensei wa semester-final exam of for the purpose teacher Kookyo ga doko ni aru ka osiete kudasaimasita.</Paragraph> <Paragraph position="129"> Imperial Palace where is taught For the final exam the teacher taught us where the Imperial Palace was.</Paragraph> <Paragraph position="130"> Each of these examples contains the phrase: 93) Kookyo ga doko ni aru ka osiete which is translated in three different ways, determined by the context: in 90), show where the Imperial Palace is; in 91), tell where the Imperial Palace is; in 92), teach where the Imperial Palace is. The difference is localized in the translation
of osiete, a participial form of the verb osieru. This verb may be translated into English as show, tell, or teach according to the context, and the problem is to identify what the determining factors are.</Paragraph> <Paragraph position="131"> The Japanese category UC-&quot;OSIE-&quot; as well as the English categories UC-&quot;SHOW&quot;, UC-&quot;TELL&quot;, and UC-&quot;TEACH&quot; are all included within the more abstract category UC-COMMUNICATION, which can be defined as follows: Subcategories of UC-COMMUNICATION may differ as to the nature of the act performed by the communicator, as to the kind of knowing that results (e.g. whether it is retained in surface or deep memory), and in other ways such as the authoritativeness of the communicator with respect to what is communicated (CC-I). UC-&quot;OSIE-&quot; does not appear to restrict the act performed by the communicator; apparently he can do almost anything that will have a communicative function. UC-&quot;TELL&quot;, on the other hand, entails a verbal act, UC-&quot;SHOW&quot; an act which directs the other person's visual attention to CC-I, and UC-&quot;TEACH&quot; an act which is didactic in nature. It is difficult to delimit the acts which qualify as teaching, but evidently they must have an instructional quality which is not necessary for UC-&quot;OSIE-&quot;. UC-&quot;TEACH&quot; may also be unique in requiring that the knowing be deep or long-term knowing, at least in the intention of PI-B. J;: :t~c:se</Paragraph> <Paragraph position="133"> - 0 TI, for the most part, require that PI-B be authoritative with respect to the content of what is being communicated (CC-I).</Paragraph> <Paragraph position="134"> But how is it, for example, that the context in 90) restricts the translation of &quot;OSIE-&quot; to &quot;SHOW&quot;? The second sentence in 90) says that we went from there (soko), whose referent is the location of the Imperial Palace. 
Thus, at the time of the communicative event, we must have been at the Imperial Palace. Now, there is evidently a general principle, like 86) in the last section, which says that a verbal act is not used to communicate where something is when the beneficiary of the act is already at that place. There is evidently no such restriction on directing visual attention to where it is, hence UC-&quot;SHOW&quot; is preferred to UC-&quot;TELL&quot;. Since there is nothing in the context of 90) to suggest that teaching methods were involved, UC-&quot;SHOW&quot; is left as the only candidate.</Paragraph> <Paragraph position="135"> In 91) the situation is otherwise. The second clause makes it clear through the phrase translated &quot;when we got there&quot; that we were not at the Imperial Palace at the time of the communicative act. Another general principle says that visual attention can be directed only at things within visual range. Thus UC-&quot;SHOW&quot; is in this case ruled out, as is UC-&quot;TEACH&quot;, again because of the absence of didactic context. UC-&quot;TELL&quot; is thus the choice here. In 92) the didactic context is evident. The Japanese words kimatu, siken, and sensei all belong within the semantic field of teaching, a fact to be noted in the lexical entry for each of them. Hence the English category UC-&quot;TEACH&quot;, obviously a member of the same semantic field, will be the choice here. Probably we should also take account of the fact that the idiomatic verb at the end of this sentence, 'gave', reinforces the superior relationship of the communicator: in this case, the fact that he is authoritative with respect to what is being communicated.</Paragraph> <Paragraph position="136"> The point of this example of the translation of osieru is to emphasize the complexity of the criteria which may have to be invoked to decide between possible translations. 
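The three decisions reached for 90), 91) and 92) can be summarized in a short Python sketch. It is an illustration only, built on assumptions that are not part of the paper: the function name choose_osieru, the reduction of context to a set of words plus a single flag for whether the recipient is already at the place in question, and the decision to test for didactic context first are all hypothetical simplifications of criteria that the text describes as far more complex.

```python
# Words the text assigns to the semantic field of teaching.
TEACHING_FIELD = {"kimatu", "siken", "sensei"}


def choose_osieru(context_words, recipient_at_place):
    """Return the English categories still possible for osieru.

    recipient_at_place is True, False, or None when the context does not
    say where the recipient was at the time of the communicative act.
    """
    # Didactic context present: TEACH is taken as the choice (assumed ordering).
    if not TEACHING_FIELD.isdisjoint(context_words):
        return ["TEACH"]
    candidates = ["SHOW", "TELL"]
    if recipient_at_place is True:
        # One does not verbally tell someone where a place is
        # when that person is already there.
        candidates.remove("TELL")
    elif recipient_at_place is False:
        # Visual attention can be directed only at things within visual range.
        candidates.remove("SHOW")
    return candidates


EXAMPLE_90 = choose_osieru({"gaido", "kookyo", "soko"}, True)
EXAMPLE_91 = choose_osieru({"gaido", "kookyo", "soko"}, False)
EXAMPLE_92 = choose_osieru({"kimatu", "siken", "sensei", "kookyo"}, None)
print(EXAMPLE_90, EXAMPLE_91, EXAMPLE_92)
```

When neither principle applies and no didactic context is present, the function returns both SHOW and TELL, which corresponds to the situation in which the user would have to be asked.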
Here we have seen a link between different kinds of communicative acts and the location of the recipient of the communication, information on the latter being derivable from information about the movement of the recipient to or from the place of communication, together with temporal information. It is also of interest that this example, like the second example in section VIII, led us to recognize certain general principles: that one does not communicate verbally about where something is when the addressee is already there, for example, and the obvious principle that one does not call visual attention to something that is not visible. Detailed implementation of this kind of translation research will undoubtedly lead to the recognition of a number of such principles.</Paragraph> <Paragraph position="137"> The word kudasaimasita in 92) leads us to a different kind of complication: that involved in the need to pay special attention in Japanese verbalization to the social relationship existing between the speaker and various other persons. Although we are changing the direction of translation here, it is of some interest to consider questions that arise in translating the English category UC-&quot;GIVE&quot; into Japanese.</Paragraph> <Paragraph position="138"> We can assume that UC-&quot;GIVE&quot; has the entailments listed in section VI above, and that furthermore the categories underlying the Japanese verbs to be mentioned share these same entailments. Each Japanese category, however, has additional entailments of its own, and it is the nature of these additional entailments that we are interested in. What follows is based on the analysis in Kuno 1973:127-135.</Paragraph> <Paragraph position="139"> The verb kureru is used to express instances of a category whose entailments include those of
UC-&quot;GIVE&quot; plus the following (where PI-B is the agent and a second PI the beneficiary of the giving):</Paragraph> <Paragraph position="141"> That is, UC-&quot;KURE-&quot; is the category chosen if the beneficiary of the giving is socially close to the speaker, closer to the speaker than the agent of the giving, and the agent is not socially higher than the beneficiary. In translating texts where such information is relevant, VAT will either have to store a network of social relations linking all the relevant individuals, a network which may in part be derivable from the text, or it will have to ask the user questions like: The verb kudasaru is used to express instances of a category whose entailments are as follows: In other words, the entailments of UC-&quot;KUDASAR-&quot; are the same as those of UC-&quot;KURE-&quot; except that the agent of the giving is socially higher than the beneficiary.</Paragraph> <Paragraph position="142"> (It was the exalted position of sensei, the teacher, in 92) that led to the use of kudasaimasita in that sentence.) Another possibility is the verb yaru:</Paragraph> </Section> class="xml-element"></Paper>