<?xml version="1.0" standalone="yes"?>
<Paper uid="J79-1035">
  <Title>CAUSE CAUSE</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TABLE OF CONTENTS
SESSION 4: MODELING DISCOURSE AND WORLD KNOWLEDGE
</SectionTitle>
    <Paragraph position="0"> Establishing Context in Task-oriented Dialogs, Barbara G. Deutsch .... 4
Discourse Models and Language Comprehension, Bertram C. Bruce .... 19
Judging the Coherency of Discourse, Brian Phillips .... 36
An Approach to the Organization of Mundane World Knowledge: The Generation and Management of Scripts, R. E. Cullingford</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This paper describes part of the discourse component of a speech understanding system for task-oriented dialogs; specifically, a mechanism for establishing a focus of attention to aid in identifying the referents of definite noun phrases. In building a representation of the dialog context, the discourse processor takes advantage of the fact that task-oriented dialogs have a structure that closely parallels the structure of the task. The semantic network of the system is partitioned into focus spaces, with each focus space containing only those concepts pertinent to the part of the dialog relating to a subtask. The focus spaces are linked to their corresponding subtasks and ordered in a hierarchy determined by the relations among subtasks.</Paragraph>
    <Paragraph position="1"> This research was supported by the Defense Advanced Research Projects Agency of the Department of Defense and monitored by the U.S. Army Research Office under Contract No. DAHC04-75-C-0006.</Paragraph>
    <Paragraph position="2"> Language communication entails the transmission of concepts from the speaker's model of the world to the listener's. It is crucial that the speaker be able to communicate descriptions of concepts in his model in a way that allows the listener to pick out the corresponding concepts in his own model. This can rarely be done in a completely unambiguous way; contextual clues from both the situation and the surrounding dialog are counted on to help disambiguate. The listener's problem is to use that context to help in his identification of the concept being communicated. As a simple example, consider the utterance "Hand me the box-end wrench" as it might occur in a conversation between two people working on a maintenance task. Although many box-end wrenches may be known to both the speaker and the listener, the fact that the listener has a particular box-end wrench in his hand makes the noun phrase unambiguous. (For other examples, see Norman, Rumelhart et al., 1975.) In the most extreme case, the use of pronouns depends entirely on the dialog context to determine the intended referent; "it" can refer to any single inanimate object or event. A similar problem arises with elliptical expressions. Often the surrounding dialog supplies enough information so that only a word or two suffices to communicate an entire (complex) idea.</Paragraph>
    <Paragraph position="3"> For example, consider the following exchange: E: Bolt the pump to the platform. A: O.K.</Paragraph>
    <Paragraph position="4"> E: What tools are you using [to bolt the pump to the platform]?</Paragraph>
    <Paragraph position="5"> A: My fingers [are the tools I am using ...]. The expressions in brackets indicate the full utterance that was meant by the partial utterance. The listener must fill in this information from the surrounding dialog. This paper considers such phenomena as they occur in task-oriented dialogs. By task-oriented dialog we mean conversation directed toward the completion of some task. In particular, we will be concerned with a computer-based consultant task in which an apprentice technician communicates with a computer system about the repair of electromechanical devices. The understanding system must maintain models of the world and of the dialog to disambiguate references in the apprentice's speech.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DISCOURSE IN SPEECH UNDERSTANDING
</SectionTitle>
    <Paragraph position="0"> In a speech understanding system, the discourse component is one of several sources of knowledge that must interact in interpreting an utterance (see Paxton and A. Robinson, 1975; J. Robinson, 1975). Because of the uncertainty in the acoustic signal, it is important that higher level sources of knowledge like discourse give advice to the system at early stages in the analysis. For this reason, in our current speech system, routines for identifying the referents of definite noun phrases are applied as soon as a possible noun phrase is identified, rather than waiting for an interpretation of the entire utterance. In essence, the procedure entails searching the recent context to find possible referents and returning a list of candidates. Ellipsis and pronoun resolution require a more local context than the resolution of nonpronominal definite noun phrases (DNPs). A description of the processing for ellipsis and pronoun resolution is contained in the section "Discourse Analysis and Pragmatics" in Walker et al., 1975. In this paper we concentrate on mechanisms for resolving DNPs. The problem of resolving DNPs is basically a problem of finding a matching structure in memory. In the case of a computer system with a semantic network knowledge base, the problem is that of finding the network structure corresponding to the structure of the noun phrase. The node that maps onto the head node of the parse structure representing the noun phrase is the concept being identified by the noun phrase. For example, if the knowledge base contains the nodes shown in Figure 1 (and there are no other nodes with e (element) or s (superset) arcs to wrenches), then either node W1 or node W3, but not W2, will match the phrase "the box-end wrench". Matching is not always straightforward. For example, consider the situation portrayed in Figure 2. The delineating element arcs (see Hendrix, 1975a) link a node to delineating information about members of the class that node represents. B-E is a set of</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
&amp;quot;BOX-END WRENCH&amp;quot;
</SectionTitle>
    <Paragraph position="0"> box-end wrenches to which W1 belongs; H-E is a set of hex-end wrenches to which W2 belongs. If the apprentice now says, e.g., "the box-end wrench", he means W1. The utterance-level structure created by parsing (see Hendrix, 1975b) for the phrase "the box-end wrench" is inside the space NP in Figure 3; some deduction must be done to establish the correspondence between the head node of that structure and W1. The structure matching routines that form a basic part of the DNP resolver take as inputs a parse-level network of nodes and arcs and a data network to match it against. (The current matcher was written by R. E. Fikes.) In general, a large number of objects in the data net may be candidates for the matcher (i.e., objects that are elements of the same set as the object being identified by the DNP). Since, in itself, the matcher has no way of deciding which objects to consider first, additional mechanisms are needed to limit the search.</Paragraph>
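    <Paragraph> The matching step can be pictured with a small sketch (a simplified illustration, not the SRI matcher written by Fikes; the names Node, match_dnp and the attribute layout are invented for this example): candidate data nodes are restricted to elements of the set named by the head noun, and each candidate is checked against the properties asserted in the parse-level structure.
# Minimal sketch of definite-noun-phrase (DNP) matching against a data network.
# All names here are illustrative, not those of the original system.

class Node:
    def __init__(self, name, superset=None, properties=None):
        self.name = name
        self.superset = superset              # "s" arc: the set this individual belongs to
        self.properties = properties or {}    # delineating information, e.g. {"end": "box"}

def match_dnp(head_set, constraints, data_nodes):
    """Return the data nodes that could be the referent of the DNP."""
    # Only elements of the same set as the described object are candidates.
    candidates = [n for n in data_nodes if n.superset == head_set]
    # A candidate matches if it does not contradict any asserted property.
    return [n for n in candidates
            if all(n.properties.get(k) == v for k, v in constraints.items())]

w1 = Node("W1", "wrenches", {"end": "box"})
w2 = Node("W2", "wrenches", {"end": "hex"})
w3 = Node("W3", "wrenches", {"end": "box"})
print([n.name for n in match_dnp("wrenches", {"end": "box"}, [w1, w2, w3])])
# prints ['W1', 'W3']; additional mechanisms (focus spaces) are needed to rank them.
    </Paragraph>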
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FOCUS SPACES
</SectionTitle>
    <Paragraph position="0"> The discourse component must determine a subset of the semantic net knowledge base for consideration by the matcher. That is, it must be able to establish as a local context that subset of the system's total knowledge base that is relevant at a given point in the dialog. This is analogous to determining what is in the user's focus of attention. Put another way, we would like to highlight certain nodes and arcs of the semantic network. In task-oriented dialogs, the dialog context is actually a composite of three different component contexts: a verbal context, a task context, and a context of general world knowledge. The verbal context includes the history of preceding utterances, their syntactic form, the objects and actions discussed in them, and the particular words used. The task context is the focus supplied by the task being worked on. It includes such information as: where the current subtask fits in the overall plan, what its subtasks are, what actions are likely to follow, what objects are important. The context of general world knowledge is the information that reflects a background understanding of the properties and interrelations of objects and actions; for example, the fact that tool boxes typically contain tools and that attaching entails some kind of fastening. To highlight objects in the dialog and provide verbal context, network partitioning is used in a new way. Hendrix (1975a) has suggested imposing a logical partitioning on network structures for encoding logical connectives and quantifiers. Using the same technique, a focus partitioning may be used to divide the network into a number of local contexts. Nodes and arcs belong to both logical and focus spaces. The logical and focus partitions are independent of one another in the sense that the logical spaces in which a node or arc lies neither determine nor depend on the focus spaces in which the node or arc lies. A new focus space is created for each subtask that enters the dialog. The task model (described shortly) imposes a hierarchical ordering, based on the subtask hierarchy, on these spaces. This hierarchy determines what nodes and arcs are visible from a given space. The arcs and nodes that belong to a space are the only ones immediately visible from that space.</Paragraph>
    <Paragraph position="1"> Arcs and nodes in spaces that are above a given space in the hierarchy are potentially visible, but must be requested specifically to be seen. Other arcs and nodes are not visible. A node may appear in any number of focus spaces. When the same object is used in two different subtasks, either the same or different aspects of the object may be in focus in the two subtasks. It is also possible for a node or arc to be in no focus space; in this case the object is not strongly associated with the actual performance of any particular subtask. Such objects must be described relative to the global task environment. For completeness we define a top-most space, called the "communal space", and a bottom-most space, called the "vista space". The communal space contains the relationships that are time invariant (e.g., the fact that tools are found in tool boxes) or common to all contexts. The vista space is below all other spaces and hence can see everything in the semantic net. This perspective is useful for determining the relationships into which an object has entered.</Paragraph>
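    <Paragraph> The visibility rules just described can be sketched with a parent-linked hierarchy of spaces; FocusSpace and its two lookup methods below are invented for the illustration and are only a rough stand-in for the partitioned-network machinery.
# Sketch of focus-space visibility, assuming a simple parent-linked hierarchy.

class FocusSpace:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # space above this one in the hierarchy
        self.contents = set()       # nodes and arcs belonging to this space

    def immediately_visible(self):
        return set(self.contents)

    def potentially_visible(self):
        """Contents of ancestor spaces; must be requested explicitly."""
        seen = set()
        space = self.parent
        while space is not None:
            seen |= space.contents
            space = space.parent
        return seen

communal = FocusSpace("communal")                         # time-invariant, shared knowledge
fs0 = FocusSpace("FS0 (assemble compressor)", communal)
fs1 = FocusSpace("FS1 (attach pump)", fs0)
communal.contents.add("tools are found in tool boxes")
fs1.contents.add("pump P1")

print(fs1.immediately_visible())     # only the pump
print(fs1.potentially_visible())     # ancestor contents, including the communal fact
    </Paragraph>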
    <Paragraph position="2"> The task model in our system will be embodied in a procedural net, which encodes the task structure in a hierarchy of subtasks and encodes each subtask as a partial ordering of steps (Sacerdoti, 1975). The procedural net system also allows tasks to be expanded dynamically to further levels of detail when necessary. A representation of the hierarchy of subtasks is important for reference resolution. An examination of task-oriented dialogs shows that references operate within tasks and up the hierarchy chain (Deutsch, 1974). Using the hierarchy of the procedural net to impose a hierarchy on the focus spaces enables us to search for references in hierarchical order. Having a representation of the partial ordering of tasks allows us to capture the alternatives the apprentice has in choosing subsequent tasks. We have explicitly separated the three components of the dialog context. The representation of an object in a focus space will include only the relationships that have been mentioned in the dialog concerning the corresponding subtask or that are inherent in the procedural net description of the local task. Thus, the verbal component is supplied by the information recorded in the focus space hierarchy. Forward references to objects in the task (task component) are found by examining the procedural net. The general world knowledge component is information that is present in the communal space. When resolving a DNP, we can dynamically allocate effort between examining links in the local focus space, looking forward in the task, looking back up the focus space hierarchy, and looking deeper into knowledge base information.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
GENERAL STRATEGY
</SectionTitle>
    <Paragraph position="0"> The general strategy is to look first in the currently active focus space and then to examine the next level of detail in the task. If the referent cannot be found there, the search continues back up the focus space hierarchy and then further down the task chain. The current context to be used by the discourse processor includes: (1) A focus space containing the objects currently in focus. (2) A link to the associated node in the task model. (3) A type flag used in setting up expectations. The type is necessary because there are subdialogs that do not directly reflect on the task structure. For example, there are subdialogs about tool identification ("What is a wheelpuller?") and tool use ("How do I use this wrench?"). References in these subdialogs do not follow the same focus space hierarchy and task structure. The dialog shown in Table 1 will be examined to show how a combination of a task model and focus spaces may be used to help resolve references. E: I would like you to assemble the air compressor. A: O.K.</Paragraph>
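    <Paragraph> The search order described above can be summarized in a short sketch; find_in_space and expand_task are assumed helpers (not part of the system as described), and the focus spaces are taken to carry the parent links from the earlier sketch.
# Sketch of the reference-search order: active focus space, next level of task
# detail, then back up the focus-space hierarchy.

def resolve_reference(phrase, context, find_in_space, expand_task):
    """context is the (focus space, task node, type flag) triple listed above."""
    focus_space, task_node, _type_flag = context
    referent = find_in_space(phrase, focus_space)            # objects currently in focus
    if referent is not None:
        return referent
    for subtask in expand_task(task_node):                   # next level of task detail
        referent = find_in_space(phrase, subtask.focus_space)
        if referent is not None:
            return referent
    space = focus_space.parent                               # back up the hierarchy
    while space is not None:
        referent = find_in_space(phrase, space)
        if referent is not None:
            return referent
        space = space.parent
    return None                                              # fall back to global knowledge
    </Paragraph>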
    <Paragraph position="1"> E: I suggest you begin by attaching the pump to the platform. A: O.K. E: What are you doing now? A: Using the pliers to get the nuts in underneath the platform. E: I realize this is a difficult task. A: I'm tightening the bolts now. They're all in place. E: Good. A: How tightly should I install this pipe elbow that fits into the pump? Table 1: Subdialog for air compressor assembly. A partial procedural net for assembling an air compressor is shown in Figure 4. The terms "install", "connect", and "attach" refer to conceptual actions rather than lexical items. The dashed lines connect higher level tasks to their constituent subtasks. The time sequence of steps in the task is left to right. The partial ordering of tasks is encoded with the S and J nodes. The S, or ANDSPLIT, node indicates the beginning of parallel branches in the partial ordering; the nodes on arcs coming out of an S node may be done in any order. The J, or ANDJOIN, node indicates a point where several parallel tasks must be completed. The boxes labeled T are relevant to the subdialog fragment. In the following analysis of the dialog, the utterances are considered in relation to the dialog history and the procedural net task model. (The search for references inside focus spaces is currently implemented; integration with the task model is not.) The context information listed under (1)-(3) above is shown in the form: (1) the current focus space; (2) PNET: T2; (3) FSTYPE: E. E: I would like you to assemble the air compressor.</Paragraph>
    <Paragraph position="2"> A: O.K.</Paragraph>
    <Paragraph position="3"> E: I suggest you begin by attaching the pump to the platform. [At this point, we are at task T1; focus spaces FS0 and FS1 shown in Figure 5 have been set up.] A: O.K. [This could mean "I'm done", but the response comes right after the instruction and the task takes a while.]</Paragraph>
    <Paragraph position="5"/>
    <Paragraph position="7"> E: What are you doing now? [After a suitable waiting period, the expert queries the progress of the user.] A: Using the pliers to get the nuts in underneath the platform. ["The pliers" can be resolved because there is only one pair; if this were not the case, the task model would have to be consulted. For both "the nuts" and "the platform" the FS hierarchy is consulted. "The platform", P1, is in focus in the current FS. There is no sign of nuts, so we look forward in the task model. The relevant parts are located in subtask T4. This causes a new context to be established, as shown in Figure 6.] E: I realize this is a difficult task. [An attempt to assess the apprentice's perception of the problem. Note that at this point the task has barely begun and the expert does not have a very good model of the apprentice.] A: I'm tightening the bolts now. They're all in place. [FS4 contains "the bolts"; they were brought into focus when T1 was started. "They" is determined to refer to "the bolts" by checking the objects in the previous utterance for number agreement. Note that the last statement confirms the closure of</Paragraph>
    <Paragraph position="9"> A: How tightly should I install this pipe elbow that fits into the pump? [There is no pipe elbow in the current FS. (Note that up until this point in the query the apprentice might have been asking about task T5.) We close T5; because of the task structure this brings us back up to the top level. We are at the point of looking into new tasks. At present all of the tasks are considered equally; eventually T6 is found to involve an elbow.] In summation, then, the focus spaces provide a way of isolating certain parts of the semantic net, thus providing a way to focus on immediately relevant information. By tying the focus spaces to the task model, we can also handle forward task references. Both the task model and the focus spaces are linked to the general knowledge base; thus it is possible to go from an item in either the task model or a focus space to other known but not previously referenced information about that item. The focus spaces and task model provide access to context information about objects in the domain, making it possible to resolve references in the apprentice's speech. Deutsch, Barbara G. The Structure of Task-Oriented Dialogs.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> Higher order structures such as "discourse" and "intention" must be included in any complete theory of language understanding. This paper compares two approaches to modeling discourse. The first centers on the concept of a "discourse grammar" which defines the set of likely (i.e. easily understood) discourse structures. A second approach is a "demand processing" model in which utterances create demands on both the speaker and the hearer.</Paragraph>
    <Paragraph position="1"> Responses to these demands are based on their relative "importance", the length of time they have been around, and conditions attached to each demand. The flow of responses provides another level of explanation for the discourse structure.</Paragraph>
    <Paragraph position="2"> These two approaches are discussed in terms of flexibility, efficiency, and of their role in a more complete theory of discourse understanding. As has been said many times, understanding anything (a problem, an action, a word) demands some knowledge of the context in which it appears. Certainly this is true of language, where an utterance's meaning may depend upon who the speaker is, when he is talking, what has just been said, who the listeners are, what the purpose of the conversation is, and so on. It is reasonable to define language understanding as the process of applying contextual knowledge to a sound (or string of symbols) to produce a change in that context. Successful language understanding occurs whenever the changes in the hearer's context (model of the world) coincide with the changes the speaker intended. Of course, stating a problem in a different way does not solve it. Instead it suggests a series of subsidiary questions such as:
(1) What is a context? What does it look like? What are its components, its structural characteristics?
(2) How does a new utterance change an existing context? What is the assimilation process? What must be kept; what can be discarded?
(3) How does a model of changing context account for observed phenomena such as the ability to switch contexts, and to return later (but not too much later)?
(4) How does the domain of conversation influence the structure of a "context"? Do different mechanisms operate when the subject matter is tightly constrained?
It may be quite a while before questions of this type can be answered fully. This paper is a discussion of some of the issues and of the characteristics required of a solution. To do this, we will examine two classes of discourse models which have been proposed. The first is a "discourse grammar" approach which attempts to define the set of likely sequences of utterances. The second is a "demand processor" approach which attempts to account for utterances as responses to internal demands. Before discussing the specifics of these models, a sample dialogue, from the much restricted world of man-machine communication, is presented and discussed in Section 2.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="22" type="metho">
    <SectionTitle>
2. An Example from Travel Budget Management
</SectionTitle>
    <Paragraph position="0"> The problem of discourse understanding is complex enough that a complete analysis is certainly premature. We will examine here a micro-discourse, restricted to two conversants, one of them a machine, and a conversation restricted in scope to that of managing a travel budget. Imagine that a person responsible for a travel budget is trying to decide how many people to send to an upcoming conference. He has available a computer system which can answer questions about the budget, conferences, trips, and so on. In talking to such a system, the travel budget manager is engaging in an admittedly simple dialogue, but one which nevertheless exhibits many interesting properties of discourse understanding. (The following dialogue is hypothetical; however a system of this type is being developed at BBN (Woods et al. (1974, 1975))). The (human) manager begins: (1) WHEN IS THE NEXT ASSP CONFERENCE? May 15-18, 1975 (2) IS THAT IN OUR CURRENT BUDGET YEAR? Yes.</Paragraph>
    <Paragraph position="1"> (3) HOW MUCH MONEY DO WE HAVE LEFT? $316.41 remains in the budget.</Paragraph>
    <Paragraph position="2"> (4) WHAT IS THE COST OF THE ASSP CONFERENCE?  The registration fee is $25. Would you like me to estimate the cost of a trip to the conference?</Paragraph>
  </Section>
  <Section position="10" start_page="22" end_page="30" type="metho">
    <SectionTitle>
(5) YES. FOR THREE PEOPLE.
</SectionTitle>
    <Paragraph position="0"> Assuming they travel by air . . .</Paragraph>
    <Paragraph position="1"> Assuming each person spends three days in Philadelphia . . .</Paragraph>
    <Paragraph position="2"> Assuming a per day expense of $25 . . . The cost would be $371.50.</Paragraph>
    <Paragraph position="3">  (6) WILL THE BUDGET COVER THAT? No, the deficit would be $55.09. (7) ASSUME THEY GO BY BUS.</Paragraph>
    <Paragraph position="4">  The cost would be $327.40. The budget deficit would be $10.99.</Paragraph>
    <Paragraph position="5"> (8) OK. SCHEDULE A TRIP FOR THREE PEOPLE BY BUS TO THE ASSP MEETING.</Paragraph>
    <Paragraph position="6"> Done. That leaves $ -10.99 in the budget. Even in a simple dialogue like the one above we see important discourse phenomena. For example, the manager's input (4) is ambiguous to the system. "Cost" could mean "registration fee" or "cost of a trip to the conference". The system has to ask about this ambiguity, producing an embedded question-answer pair within the original question-answer pair (cf. Schegloff's (1972) "insertion sequences"). However, there is more than a Q-1, Q-2, A-2, A-1 pattern in (4) and (5). The system does not just say, "Did you mean 'fee cost' or 'trip cost'?". Assuming that "fee cost" is a reasonable interpretation, and useful to know besides, it simply states that fact, and asks about the other interpretation, which would demand more computation. A discourse model should account for this apparent awareness of computational difficulty, which is exhibited in human conversation and between a human and our idealized machine above. Another phenomenon worth noting in this dialogue is the variation in detail and precision among the utterances. Sentence (8) is fairly precise and complete. Since alternatives have been considered to the trip he has decided upon, it is important to stress those aspects of the trip ("three people", "by bus") which have been in question. On the other hand, sentence (3) is clearly elliptical. This is all right since the question is merely exploratory. Furthermore, the previous question insures that "money ... left" refers to money in the current budget. An adequate discourse model should account as well for our apparent ability to accommodate for the speech channel capacity, to minimize transmission errors through the use of redundancy and stress, and in general to attempt to optimize the communication. One way to account for these and related phenomena is to postulate a discourse grammar. The grammar might say that part of a dialogue is a "question-answer" pair, and that it may be recursive in the sense that question-answer pairs may be embedded within it. This approach is discussed in the next section. A contrasting approach is to say that each utterance produces "demands" in the heads of the listeners. Responses to these demands may take the form of subsequent utterances. This latter model is discussed in Section 4.</Paragraph>
    <Paragraph position="7"> Upon reading a dialogue like the example in Section 2, most of us readily form an opinion about its structure. In any dialogue we see this kind of structure: one person is asking another to do something; two people are arguing about politics, or discussing a novel. There is almost always a structure higher than the individual sentences. In the example of Section 2, the travel budget manager seems to be entering into a "schedule a trip" dialogue. His question about a future conference is one of the cues to a bundle of information known by both him and the system about scheduling trips. Such a bundle has been variously referred to as a "frame" (Minsky (1975), Winograd (1975)), a "script" (Abelson (1975), Schank and Abelson (1975)), a "theme" (Phillips (1975)), a "story schema" (Rumelhart (1975)), and a "social action paradigm" (Bruce (1975a, 1975b)).</Paragraph>
    <Paragraph position="8"> The information associated with scheduling a trip includes facts about dates and times, about the budget, about travel, about conferences, and so on. It also includes "plans", that is, time ordered structures of beliefs about achieving "goals". In this case, the goal is scheduling a trip to a conference. (See also Bruce and Schmidt (1974), Schmidt (1975)). One such partially instantiated plan might be: 1. Find out to which budget the trip should belong.</Paragraph>
    <Paragraph position="9"> 2. Determine how much is in the budget (budget).</Paragraph>
    <Paragraph position="10"> 3. Figure the cost of the trip (tripcost).</Paragraph>
    <Paragraph position="11"> 4. Decide whether (budget - tripcost) is acceptable.</Paragraph>
    <Paragraph position="12"> 5. If acceptable, schedule the trip and stop.</Paragraph>
    <Paragraph position="13"> 6. If not acceptable, determine if the trip can be modified to be cheaper.</Paragraph>
    <Paragraph position="14"> a. If modifiable, go to 3.</Paragraph>
    <Paragraph position="15"> b. If not modifiable, stop.</Paragraph>
    <Paragraph position="16">  The steps (1 - 6) above are ordered, though nothing is said about their relative lengths. Also, there are variants on the plan where the order might be changed, e.g. step 3 might come before step 2 in some other plan. The structure of such a plan, coupled with the by now commonplace observation that a discourse is structured, leads to the natural idea of representing a discourse by a grammar. Such a grammar may be large; it may be probabilistic; it may apply in only limited domains.</Paragraph>
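    <Paragraph> The plan above is essentially a loop over steps 3-6; the sketch below renders it that way. The helper names (get_budget, trip_cost, acceptable, modify_trip, schedule) are stand-ins for operations the travel budget system is assumed to provide, not part of any described implementation.
# The trip-scheduling plan (steps 1-6) rendered as a loop.

def plan_trip(trip, get_budget, trip_cost, acceptable, modify_trip, schedule):
    budget = get_budget(trip)              # steps 1-2: find the budget and its balance
    while True:
        cost = trip_cost(trip)             # step 3: figure the cost of the trip
        if acceptable(budget - cost):      # step 4: is (budget - tripcost) acceptable?
            schedule(trip)                 # step 5: schedule the trip and stop
            return trip
        cheaper = modify_trip(trip)        # step 6: try to make the trip cheaper
        if cheaper is None:                # 6b: not modifiable, stop
            return None
        trip = cheaper                     # 6a: modified, go back to step 3
    </Paragraph>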
    <Paragraph position="17"> Nevertheless it does give some idea of what to expect in a dialogue and may play a central role in language comprehension. A portion of the grammar for our example dialogue is shown in Figure 1. This is an Augmented Transition Network (ATN) in which the arcs may refer to other networks (PUSH arcs), may signify direct transitions to other states (JUMP arcs), or may signify conclusion of the path (POP arcs). For example, in addition to this "SCHEDULE" network there is a network wherein the manager describes a new trip to be entered and the system asks him questions to complete the description.</Paragraph>
    <Paragraph position="18"> Fig. 1. ATN for scheduling a trip.</Paragraph>
    <Paragraph position="19"> A discourse or dialogue grammar can be used with a modified ATN parser to "parse" a dialogue, generating both analyses of the current utterance and predictions about the one to come. In fact, one such modified parser and grammar has been implemented for the BBN speech system (Bruce (1975c), Woods et al. (1975)). For many dialogues, the grammar applies quite well, testing for the head verb in the utterance, the mood, and checking presuppositions of the action implied. When successful, it makes corresponding predictions for application to the next utterance. Unfortunately, when the grammar fails it is not very good at recovering from its error. Discourse grammars seem to be most effective in tightly constrained domains, more for instance in a discussion about how to cook a turkey, where there are specific subproblems to analyze, than in the travel budget management domain, and less still in a general question answering context. (Cf. Deutsch (1974, 1975).)</Paragraph>
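    <Paragraph> A toy encoding of such a discourse ATN is sketched below. The SCHEDULE and QA networks are a guess at the general shape of Fig. 1, not a copy of the BBN grammar; the state names and arc format are invented for the illustration.
# Each state has arcs that PUSH into a subnetwork, JUMP to another state,
# or POP out of the current network.

DISCOURSE_GRAMMAR = {
    "SCHEDULE": {
        "S0": [("PUSH", "QA", "S1")],      # a question-answer exchange about the trip
        "S1": [("PUSH", "QA", "S1"),       # more Q-A pairs may follow ...
               ("JUMP", None, "S2")],      # ... or the manager commits to a trip
        "S2": [("POP", None, None)],
    },
    "QA": {
        "Q": [("JUMP", None, "A")],
        "A": [("PUSH", "QA", "A"),         # embedded clarification (insertion sequence)
              ("POP", None, None)],
    },
}

def next_moves(network, state):
    """Predictions for the next utterance: the arcs leaving the current state."""
    return DISCOURSE_GRAMMAR[network][state]

# After the manager's ambiguous input (4), the parser sits inside an embedded QA:
print(next_moves("QA", "A"))
# [('PUSH', 'QA', 'A'), ('POP', None, None)]  -- clarify further, or answer and pop.
    </Paragraph>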
    <Paragraph position="20"> Lest it be thought that discourse parsing is just sentence parsing for "big sentences", I should emphasize some of the differences, differences which some would say preclude the use of terms like "grammar", "ATN", and "parsing". First, discourse parsing proceeds in a mode of partial parse, then output, then partial parse, etc. In other words, the goal is to derive information from the partial discourse which has occurred to suggest what may follow and to explicate the role of the current utterance. The parse is never completed; no structure is built. Since the entire discourse is not available to the parser (as the entire sentence is to a sentence parser), it is necessarily probabilistic. One can never know how the next utterance may alter the current interpretation of the trend of the dialogue. Another important difference is that PUSHes and POPs in the discourse grammar are "sloppy". That is, the participants in a dialogue may descend several levels ("Before you finish, let me tell you about ...", "Before that ...") and never "pop" back up to the original level of the discourse. A discourse parser is faced with the peculiar phenomenon that a PUSH usually implies a POP but not always.</Paragraph>
    <Paragraph position="21"> Some, but not all of these oddities of a discourse grammar are resolved by an approach which emphasizes internal models of the speaker and the listeners. This approach is discussed in the next section.</Paragraph>
    <Section position="1" start_page="29" end_page="30" type="sub_section">
      <SectionTitle>
Demand Discourse
</SectionTitle>
      <Paragraph position="0"> One obvious characteristic of a discourse is that many processes may be occurring at once. A person cannot, nor does he wish to, respond at one time to all unanswered questions, extend each unfinished line of thought, or deal with every inconsistency. While a grammar may predict the most likely action for a given point in a dialogue, it is not very good at suggesting alternatives out of the main line. There appears to be an additional mechanism of roughly the following form: An event in a discourse (or prior to it) sets up a number of internal demands. Examples of such demands are to confirm what was said, explore its consequences, dispute it, answer it, etc.</Paragraph>
      <Paragraph position="1"> For any given event (such as an utterance) there may be none, one, or many demands created. A person's own action may place demands upon himself. If X asks a question of Y, then Y normally establishes an internal demand to answer the question. But X may also establish a demand of the form, "check to see if the question has been answered". This latter demand may generate a later utterance such as, "Why haven't you answered me?". Simple demand models already exist in a few systems. In general, they suggest that utterances are produced in response to conditions in the (internal model of the) environment rather than as units in a larger linguistic form. (See also Stansfield (1975)). It would be premature to argue that either a demand model or a grammar model is sufficient by itself. Instead, what follows is simply a description of a demand model for the travel budget management domain mentioned above.</Paragraph>
      <Paragraph position="2"> Internal demands on the travel budget system help to explain how one computation of a response can be pushed down, while a whole dialogue takes place to obtain missing information, and how a computation can spawn subsequent expectations or digressions.</Paragraph>
      <Paragraph position="3"> Associated with each demand is a priority, a pointer (purpose) to the demand which spawned this one (if any), and a time marker indicating how long the demand has been around. An active unanswered question is a typical demand with high priority.</Paragraph>
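      <Paragraph> A demand record of this kind, and one possible response-selection rule, can be sketched as follows; the Demand class and choose_response are invented names, and the tie-breaking policy is only one plausible reading of the description above.
# Sketch of a demand with priority, spawning pointer and time marker.

import itertools
import time

_ids = itertools.count()

class Demand:
    def __init__(self, description, priority, purpose=None):
        self.id = next(_ids)
        self.description = description   # e.g. "answer the manager's question"
        self.priority = priority         # larger number means more urgent
        self.purpose = purpose           # the demand that spawned this one, if any
        self.created = time.time()       # time marker: how long it has been around

def choose_response(demands):
    """Respond to the most urgent demand, breaking ties in favour of older ones."""
    return max(demands, key=lambda d: (d.priority, -d.created)) if demands else None

ask = Demand("answer: 'How much money do we have left?'", priority=10)
note = Demand("notify the manager that the budget is overdrawn", priority=2)
print(choose_response([ask, note]).description)   # the direct question wins
      </Paragraph>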
      <Paragraph position="4"> Demands of lower priority include such things as a notice by the system that the manager is over his budget. Such a notice might not be communicated until after direct questions had been answered. The fact that some questions cannot be answered without more information leads to the kind of embedding which is typically represented in a discourse grammar by a PUSH to a "clarification" state.</Paragraph>
      <Paragraph position="5"> Counter-demands are questions the system has explicitly or implicitly asked the user. While it should not hold on to these as long as it does to demands, nor expect too strongly that they will be met, the system can reasonably expect that most counter-demands will be resolved in some way. This is an additional influence on the discourse structure.</Paragraph>
      <Paragraph position="6"> A demand model also includes a representation of the current topic, the active focus of attention in the dialogue. For the travel budget system, it could be the actual budget, a hypothetical budget, a particular trip, or a conference. The current topic is used as an anchor point for resolving references and deciding how much detail to give in responses. Again, this structure leads to certain modes of interaction. For example, if the manager says "Enter a trip," the system notes that the current topic has changed to an incompletely described trip.</Paragraph>
      <Paragraph position="7"> This results in demands that cause standard fill-in questions to be asked. If the manager wants to complete the trip description later, then the completion of the trip description becomes a low priority demand.</Paragraph>
    </Section>
  </Section>
  <Section position="11" start_page="30" end_page="33" type="metho">
    <SectionTitle>
5. Synthesis?
</SectionTitle>
    <Paragraph position="0"> Discourse has been an object of study for many both in and out of the field of computational linguistics. Especially worth noting is the work of sociolinguists such as Labov (1972), Sacks, Schegloff, and Jefferson (1975), and Schegloff (1972). Linguists (e.g. Grimes), sociologists (e.g. Goffman (1971)), and philosophers (e.g. Austin (1962), Searle (1969)) have important direct or related contributions. I certainly can't presume in this short paper to give the definitive solution to all the problems revolving around the discourse question. What I have tried to do is to emphasize a distinction in approach between looking at a discourse as a linguistic whole with subparts being individual utterances, and as a side effect of responses to task demands.</Paragraph>
    <Paragraph position="1"> Both approaches are useful in exemplifying ways in which the otherwise hazy area of discourse might be modeled. The grammar approach makes the strongest statement about actual discourse structure and can best be used where the structure is well known or can be tightly constrained, e.g. in generating a discourse or in a man-machine system where the computer imposes control on the dialogue. A grammar and a discourse parser can be very efficient in such situations. When the dialogue is less predictable the (more bottom-up) demand processing approach may be more resistant to "surprises" in the dialogue.</Paragraph>
    <Paragraph position="2"> The ultimate discourse model probably contains aspects of both goal-directed grammars and of localized responses to demands. What should be particularly interesting to see is how characteristics of the model are affected by the type of discourse, human-machine v. human-human, problem-oriented v.</Paragraph>
    <Paragraph position="3"> information-exchanging, or new domain v. old.</Paragraph>
  </Section>
  <Section position="12" start_page="33" end_page="33" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> The component propositions of a coherent discourse exhibit anaphoric, spatio-temporal, causal and thematic structures. Not all of this structure is explicit; the rest must be inferred using a model of cognitive knowledge. The organization of knowledge in the model allows a bottom-up analysis of discourse. Further, knowledge is formed into small complexes rather than into the large monolithic structures found in Scripts/Frames.</Paragraph>
    <Paragraph position="1">  1. The Structure of Coherent Discourse.</Paragraph>
    <Paragraph position="2"> A discourse is judged coherent if its constituent propositions are connected. Various types of cohesive links are observed in discourse: anaphoric, spatial, temporal, causal and thematic. We will formally describe the structure of a well-formed discourse in terms of these connectives.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
1.1 Anaphora.
</SectionTitle>
      <Paragraph position="0"> Two kinds of anaphora can be distinguished. The first is marked by the presence of a proform (or by the repetition of a form): (1) Henry travels too much. He is getting a foreign accent.</Paragraph>
      <Paragraph position="1"> Antecedents may be nominal, verbal or clausal.</Paragraph>
      <Paragraph position="2"> The second kind of anaphora has a dependent that is an abstract term for the antecedent. For example, (2) John put the car into 'reverse' instead of 'drive' and hit a wall. The mistake cost him $200 in repairs.</Paragraph>
      <Paragraph position="3"> 'Mistake' in (2) is an abstract characterization of the gear selection expressed in the first sentence.</Paragraph>
      <Paragraph position="4"> A conventional way to label the recurring actors in discourse is as 'dramatis personae'. However, cohesion can result not only from multiple appearances of people, but of any concept, as in (2).</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
1.2 Spatio-temporal and Causal Connectives.
</SectionTitle>
      <Paragraph position="0"> Space, time and cause give coherency to a set of propositions.</Paragraph>
      <Paragraph position="1">  (3) The King was in the counting house, counting out his money. The Queen was in the parlour, eating bread and honey.</Paragraph>
      <Paragraph position="2"> The actions in (3) are set in different rooms, but of the same 'palace'. (4) After Richard talked to the reporter, he went to lunch. The temporal sequence of events in (4) is expressed by 'after'. (5) John eats garlic. Martha avoids him.</Paragraph>
      <Paragraph position="3"> To non-aficionados garlic is known only for its aroma, detection of which causes evasive action.</Paragraph>
      <Paragraph position="4"> Cause, illustrated in (5), is an important discourse connective. Note, however, that this is an ethnocentric view; in other cultures a different position may have to be taken, for example, a teleological world view (White: 1975).</Paragraph>
      <Paragraph position="5"> This dimension of discourse structure is termed its 'plot' structure. 1.3 Thematicity.</Paragraph>
      <Paragraph position="6"> Discourse is expected to have a theme, to have a topic. For example,</Paragraph>
    </Section>
  </Section>
  <Section position="13" start_page="33" end_page="33" type="metho">
    <SectionTitle>
(6) Dino Frances drowned today in Middle Branch Reservoir
</SectionTitle>
    <Paragraph position="0"> after rescuing his son Dino Jr. who had fallen into the water while on a fishing trip.</Paragraph>
    <Paragraph position="1"> is a news story from the New York Times, with a theme of, say, 'tragedy'. Discourse may have more than one theme, but these should not conflict. (7) Eating the fish made Gerry sick. He had measles in May.</Paragraph>
    <Paragraph position="2"> In (7) we have an incoherent structure. The proposition 'Gerry sick' belongs both to a topic 'food-poisoning' and to a biography of illnesses. The analysis of fairy-tales by Lakoff (1972) suggests that discourse has a strictly tree-like thematic organization.</Paragraph>
    <Paragraph position="3"> It is concluded that the propositions of a coherent discourse are connected either by coreference or (preferably) causally, and that it has a single theme (which may be the root of a tree of themes).</Paragraph>
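    <Paragraph> The coherence criterion just stated can be turned into a small test over a proposition graph; the encoding below (proposition ids, link pairs, one theme per proposition) is invented for the illustration and simplifies the "tree of themes" to a single shared theme.
# Coherence sketch: propositions must be connected by coreference/causal links
# and must share one theme.

def coherent(propositions, links, themes):
    if not propositions:
        return True
    reachable = {propositions[0]}
    frontier = [propositions[0]]
    while frontier:
        p = frontier.pop()
        for a, b in links:
            if p in (a, b):
                q = b if p == a else a
                if q not in reachable:
                    reachable.add(q)
                    frontier.append(q)
    connected = reachable == set(propositions)
    single_theme = len(set(themes.values())) == 1
    return connected and single_theme

# Example (7): 'Gerry sick' belongs both to food-poisoning and to an illness biography.
print(coherent(["eat-fish", "gerry-sick"],
               [("eat-fish", "gerry-sick")],
               {"eat-fish": "food-poisoning", "gerry-sick": "illness-biography"}))
# prints False: the propositions are connected, but the themes conflict.
    </Paragraph>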
    <Paragraph position="4"> 2. The Role of Inference.</Paragraph>
    <Paragraph position="5"> Not all of discourse structure is overtly stated; discourse is highly elliptic. In (4) the discourse connective 'after' is present to mark a temporal sequence, but in (5) there is no realization of the causal relation between the two propositions. Normally one assumes that a discourse is coherent; hence (3) is most acceptable if the rooms are taken as being within the same habitation. Evidently a reader must infer omitted structure. The inferences are made from his cognitive store of world knowledge.</Paragraph>
    <Paragraph position="6"> There is much discussion at present about inference as part of understanding. To make inferences is easy; the problem is to make the right ones. It helps to have a goal. It is suggested that discourse can be said to be understood when it has been judged coherent, as defined above.  3. Mechanisms of Inference.</Paragraph>
    <Paragraph position="7"> A model of cognitive knowledge -- an encyclopedia -- should be capable of making the inferences necessary to form an opinion about the coherency of a discourse. The present encyclopedia originated with Hays (1973); a fuller description can be found in Phillips (1975). It is implemented as a directed graph. Labeled nodes characterize concepts and labeled arcs relations between concepts.</Paragraph>
    <Paragraph position="8"> Propositions have a structure of case-related concepts, based on Fillmore (1968). This is our 'syntagmatic' organization of knowledge. As propositions are essentially the building blocks of discourse, we will not dwell on their structure here.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.1 Anaphora.
</SectionTitle>
      <Paragraph position="0"> If the dependent is a proform then part of understanding is to determine the correct antecedent. There are syntactic constraints (Langacker: 1969) which serve to narrow down choices for antecedents and to give an order of preference. The chosen antecedent will be the first that, when substituted for the proform, produces a meaningful proposition that is coherent in context.</Paragraph>
      <Paragraph position="1"> A meaningful proposition is one that has a counterpart in the encyclopedia. The counterpart may be the self-same proposition, or, more likely, a generalized proposition (hereafter a GP). For example, rather than 'Joan drink milk', we would expect to find 'animal imbibe liquid'.</Paragraph>
      <Paragraph position="2"> How are GPs found? All concepts belong to partially ordered taxonomic structures in the encyclopedia (our 'paradigmatic' organization of concepts). From any concept it is possible to follow paradigmatic relations to a more general concept, which may be a constituent of a proposition. An intersection of paradigmatic paths originating from each concept in a discourse proposition (hereafter a DP), taking account of syntagmatic structure, gives a GP. If there is no such intersection, then the DP is not consistent with encyclopedic knowledge.</Paragraph>
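      <Paragraph> The search for a GP can be sketched as a climb up the paradigmatic hierarchy from each concept of the DP, stopping at the first generalization recorded in the encyclopedia; the tiny taxonomy and the triple format below are invented for the example.
# Sketch of locating a generalized proposition (GP) for a discourse proposition (DP).

from collections import deque

PARADIGMATIC = {                         # concept: more general concepts
    "Joan": ["person"], "person": ["animal"],
    "milk": ["liquid"], "drink": ["imbibe"],
}
ENCYCLOPEDIA_GPS = {("animal", "imbibe", "liquid")}

def generalizations(concept):
    """Breadth-first walk up the paradigmatic structure, nearest first."""
    seen, queue, order = {concept}, deque([concept]), [concept]
    while queue:
        for parent in PARADIGMATIC.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
                order.append(parent)
    return order

def find_gp(dp):
    """dp is an (agent, action, object) discourse proposition."""
    for a in generalizations(dp[0]):
        for act in generalizations(dp[1]):
            for o in generalizations(dp[2]):
                if (a, act, o) in ENCYCLOPEDIA_GPS:
                    return (a, act, o)
    return None          # the DP is not consistent with encyclopedic knowledge

print(find_gp(("Joan", "drink", "milk")))   # ('animal', 'imbibe', 'liquid')
      </Paragraph>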
      <Paragraph position="3"> Abstract terms can be defined by complexes of GPs, each having sufficient conceptual content to define situations in which they apply. For example, a definition of 'mistake' must be such that it applies to part of the first sentence in (2).</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.2 Space, Time and Cause.
</SectionTitle>
      <Paragraph position="0"> To infer omitted spatio-temporal and causal relations (termed 'discursive' relations in the encyclopedia), it is also necessary to locate GPs. The encyclopedia, of course, includes these relations, but between GPs. Schematically, from a discourse proposition P1 we can locate P2, a GP, in the manner outlined above. P2 may have a discursive relation R to another GP, P3. A proposition P4, a particularized version of P3, and the relation R between P1 and P4 can be added to the</Paragraph>
      <Paragraph position="2"> discourse, figure 1.</Paragraph>
    </Section>
  </Section>
  <Section position="14" start_page="33" end_page="33" type="metho">
    <SectionTitle>
DISCOURSE
</SectionTitle>
    <Paragraph position="0"> Often P4 will be a proposition already stated in the discourse; merely the relation need be inferred to augment the plot structure. It may, however, be necessary to infer a chain of propositions to link the original DPs. The question arises whether there is a limit on the number of propositions in a 'sensible' inferred path. Intuitively there is, but at present we have no formal insight.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.3 Thematicity.
</SectionTitle>
      <Paragraph position="0"> A theme is a complex of GPs, structurally indistinguishable from that used in characterizing abstract terms like 'mistake'. The potential presence of a theme is detected in the process of seeking GPs for DPs.</Paragraph>
      <Paragraph position="1"> All GPs, whether or not they are part of a thematic definition, can be located by paradigmatic searches; some GPs have additional structure indicating that they are components of themes. It is not sufficient to establish a theme for a discourse by separately finding DPs that correspond to all the GPs of a theme. The thematic definition and the relevant part of the discourse must be tested holistically to ensure that the correct coreferentialities exist among the propositions.</Paragraph>
      <Paragraph position="3"> There are two basic processes underlying inference. First there is the process of locating a GP given a DP. This is implemented essentially by a breadth-first search through the paradigmatic structure of the encyclopedia. Secondly there is the process of matching a complex of propositions in discourse against an encyclopedic complex. The latter process is qualitatively different as it involves tests for co-reference that the former does not.</Paragraph>
      <Paragraph position="4"> Complexes of propositions have obvious functional similarities with 'Paraplates' (Wilks: 1975), 'Scripts' (Schank and Abelson: 1975) and 'Frames' (Minsky: 1975). Adding to the expanding terminology, our version is known as 'metalingual definitions'.</Paragraph>
      <Paragraph position="5"> Metalingual definitions serve to define abstract terms ('mistake'), themes ('tragedy') and plans (used by Furugori (1974) in his robot planner). The distinctions are more terminological than substantive; their functions are interchangeable; in other contexts a plan could be a theme, a theme an abstract term, etc.</Paragraph>
      <Paragraph position="6"> When an abstract concept has a metalingual definition, a matching discourse may be rewritten in terms of that concept. For example, 'buy' has such a definition, say 'person gives object to person, person gives money to person', with the discourse concepts bound to the concepts in its definition. A proposition produced by abstraction is structurally indistinguishable from a proposition that was in the original discourse, and can be subject to any encyclopedic process, including further abstraction. Conversely, if a proposition contains a concept having a metalingual definition, then the proposition can be decomposed into a complex of propositions patterned on the definition.</Paragraph>
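      <Paragraph> One way to picture abstraction with a metalingual definition is the following sketch: a definition is a pattern of propositions with shared variables, and a discourse that matches it (with consistent bindings) is rewritten as one proposition built on the abstract concept. The 'buy' pattern, the variable notation and the argument order of the result are all invented for the example.
# Sketch of rewriting a matched complex of propositions as an abstract concept.

BUY_DEFINITION = [("?a", "give", "?obj", "?b"), ("?b", "give", "money", "?a")]

def bind(pattern, fact, bindings):
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None               # inconsistent co-reference
        elif p != f:
            return None
    return new

def abstract(discourse, definition, concept):
    bindings, matched = {}, []
    for pattern in definition:
        for fact in discourse:
            b = bind(pattern, fact, bindings)
            if b is not None:
                bindings, matched = b, matched + [fact]
                break
        else:
            return discourse              # definition not satisfied; leave unchanged
    rest = [f for f in discourse if f not in matched]
    return rest + [(bindings["?b"], concept, bindings["?obj"], bindings["?a"])]

story = [("john", "give", "car", "mary"), ("mary", "give", "money", "john")]
print(abstract(story, BUY_DEFINITION, "buy"))   # [('mary', 'buy', 'car', 'john')]
      </Paragraph>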
    </Section>
  </Section>
  <Section position="15" start_page="33" end_page="33" type="metho">
    <SectionTitle>
4. An Example.
</SectionTitle>
    <Paragraph position="0"> A schematic analysis of (6) shows the inference system in operation, resulting in a structure that satisfies the criteria of coherence. At each step we will indicate the encyclopedic knowledge used in the inference, and the current state of the discourse. In the step diagrams the original discourse propositions are distinguished from inferred propositions, and conjunction of conditions is indicated; one step rests on part-whole relations. Note that a link to one of the original propositions has been established.</Paragraph>
    <Paragraph position="1"> Step 4. To rescue someone who is in water it may be necessary to be in water.</Paragraph>
    <Paragraph position="2">  A link to the final proposition of the discourse is made. Coreferentiality conditions prevent 'son in water' and 'father not able to act' conjoining to satisfy the conditions on this inference.</Paragraph>
    <Paragraph position="3"> Note that the antecedent condition on this inference is the same as at step 3.</Paragraph>
    <Paragraph position="4"> Both resultant situations are possible, and axe noted.</Paragraph>
    <Paragraph position="5"> The system can select either.</Paragraph>
    <Paragraph position="6"> However, the wrong choice does not lead to a connected structure, and a back up to the alternative has to be made.</Paragraph>
    <Paragraph position="7"> The discourse now has an inferred causal structure connecting all the original propositions.</Paragraph>
    <Paragraph position="8"> From a thematic analysis of drowning stories in general (Phillips: 1975), the common theme can be described as 'giving a cause for the person being in the water, and giving a cause for the victim not being able to act (thereby not being able to save himself)'. This theme fits the discourse by virtue of the propositions which stand in causal relations to 'being in the water' and 'not able to act' for the victim. The theme 'tragedy' is defined as 'someone does something good and dies as a result of this action'. The father's rescue of his son and subsequent demise satisfy this theme. For the story to be coherent, these themes must not conflict; in fact we see that the 'drowning' theme is properly contained by 'tragedy'.</Paragraph>
  </Section>
  <Section position="16" start_page="33" end_page="33" type="metho">
    <SectionTitle>
5. Discussion.
</SectionTitle>
    <Paragraph position="0"> The analysis is so organized that the themes are determined in a bottom up manner, as are all generalized facts used in the analysis.</Paragraph>
    <Paragraph position="1"> The complexes of propositions, in metalingual definitions of themes and elsewhere, are really not that complex. The ones in the example contain only a few propositions. Each has only the essentials of the situation. The final structure arises from many small pieces of knowledge rather than from one monolithic aggregate. This seems to be a more natural organization, as each of the simpler structures can be freely applied in many contexts, rather than being bound to one situation. The discourse judgement is relative to the knowledge of the hearer.</Paragraph>
    <Paragraph position="2"> Whether the inferences are those intended by the author is another question. Ideally they should be, or differences should be unimportant.</Paragraph>
    <Paragraph position="3"> A misleading inference indicates poor writing by the author; he has misjudged the knowledge of his audience.</Paragraph>
    <Paragraph position="4"> Directing inferences on a discourse towards the goal of judging it coherent provides a normalized version of the discourse, if the process is successful. The normalized structure can form the basis for further processing: content analysis, stylistic analysis, etc. It may also provoke various questions; for example, we could ask if the inferences were correct: we have the 'rescue' situation applying to the father, but he wasn't rescued; why not?</Paragraph>
  </Section>
  <Section position="17" start_page="33" end_page="33" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> In understanding stories or natural-language discourse, hearers draw upon an enormous base of shared world knowledge about common situations like going to restaurants, theaters or supermarkets to help establish the needed context. This paper presents an approach to the management of this type of knowledge based upon the concept of a situational script [Schank and Abelson, 1975]. The application of scripts in story understanding is illustrated via a computer model called SAM (Script Applier Mechanism).</Paragraph>
    <Paragraph position="1"> In simple one-script stories, SAM constructs a trace through a preformed data structure containing the input, other events not mentioned but commonly assumed, the important The research described in this paper was supported in part by the Advanced Research Projects Agency of the Department of Defense and monitored by the Office of Naval Research under contract N00014-75-C-1111.</Paragraph>
    <Paragraph position="2"> inferences associated with the events, and the interconnecting causal links. In more complicated stories, SAM handles the invocation and closing of parallel, nested and sequential scripts.</Paragraph>
  </Section>
  <Section position="18" start_page="33" end_page="33" type="metho">
    <SectionTitle>
1.0 Introduction
</SectionTitle>
    <Paragraph position="0"> Natural-language processing research in recent years has increasingly focussed upon the modeling of human world knowledge and management of the resulting data base (1). This has come about largely because of the enormous problems encountered in the processing of texts, as opposed to single sentences, by traditional methods based upon syntactic analysis and low-level semantics. This state of affairs should not be surprising, since it is quite clear that people draw upon a huge store of shared, extra-linguistic world knowledge in understanding even the simplest stories or engaging in the most rudimentary conversation. Much of the knowledge that hearers utilize to establish the background or context of a story appears to be episodic in nature, distilled from many experiences in common situations like going to restaurants, football games and supermarkets. This paper presents an approach to the representation and handling of this type of mundane world-knowledge based upon the concept of a situational script [Schank and Abelson, 1975]. The application (1) See, for example, the emphasis on this area in "Theoretical Issues in Natural Language Processing", Proceedings of the Interdisciplinary Workshop in Computational Linguistics, 10-13 June 1975, Cambridge, Mass.</Paragraph>
    <Paragraph position="1"> of scripts in story understanding will be illustrated via a  computer model called SAM (Script Appl ier Mechanism) devised for a PDP-10 computer at Yale University.</Paragraph>
    <Section position="1" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
1.1 Examples of Story Understanding in SAM
</SectionTitle>
      <Paragraph position="0"> Before attacking the various aspects of scripts and the particular activities carried. on by SAM'S script applier, it seems appropriate to give some examples of input and the various outputs that SAM produces. This will give the reader some feeling for the kinds of heavily script-based stories SAM can handle and the understanding tasks it performs.</Paragraph>
      <Paragraph position="1"> We present three different types of story input. The first story makes references to a common script , the restaurant script , then stops the script in midstream: Wnt to a restaurant. He sat down. He got mad.</Paragraph>
      <Paragraph position="2">  He left.</Paragraph>
      <Paragraph position="3"> A 'long paraphrase' output generated by SAM for this story is as follows : John was hungry. He decided to go to a restaurant. He went to one, He sat down in a chair. A waiter did not go to the table. John became upset. He decided he was going to leave the restaurant. He left it.</Paragraph>
      <Paragraph position="4"> Next is a typical boring restaurant story conforming closely to expected scriptal data: Story I1 John went to a restaurant. The hostess seated John. The hostess gave John a menu. John ordered a lobster. He was served quickly. He left a large tip. He left the restaurant.</Paragraph>
      <Paragraph position="5"> SAM has generated summary and question-answer ing output for this story: Summary John went to a restaurant and he ate a lobs,ter, Q: Why did the hostess give John a menu? A: So John could ask the waiter for a meal, Q: Why did John go to a restaurant? A: So John could eat a meal, Q: Did the waiter give John a menu? A: John got the menu from the hostess.</Paragraph>
      <Paragraph position="6"> Q: Why did John leave the waiter a large tip? A: Probably John remembered the waiter served a lobster to him quickly.</Paragraph>
      <Paragraph position="7"> Q: Di,d John pay the bill? A: John paid the check.</Paragraph>
      <Paragraph position="8"> The system has also provided long-paraphrase outputs for this story, not shown here, in both English and Chinese. The final example invokes several scripts, and calls up an unusual path in one script because of an odd occurrence in an earlier one: %%%% to New York by bus. On the bus he talked to an old lady. When he left the bus, he thanked the driver. He took the subway to ~eone's. On the subway his pocket was picked. He got off the train and entered ~eone's. He had some lasagna. When the check came, he discovered he couldn't pay. The management told him he would have to wash dishes.</Paragraph>
    </Section>
    <Section position="2" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
Long Paraphrase
</SectionTitle>
      <Paragraph position="0"> John went to a bus stop. He waited at it a few minutes , He entered- a bus. The driver got the ticket from John, He went to a seat. He sat down in it. While John was on the bus an old lady and John talked. Phe dtiver took John to New York. He went to the driver. while getting off the bus John thanked the driver. John got off it.</Paragraph>
      <Paragraph position="1"> He enteted a station. He put a token in the turnstile. He went to the platform. He waited at it a few minutes. He enteted a subway car. A thief went to John. The thief picked ~ohn's pocket. He went, John went to the seat. He sat down in it.</Paragraph>
      <Paragraph position="2"> The driver took John to ~eone's. He left the subway cat, He left the station.</Paragraph>
      <Paragraph position="3"> He entered ~eone's. He looked around inside it.</Paragraph>
      <Paragraph position="4"> He saw he could go to a table. He went to it. He sat down in the seat. He ordered some lasagna. The waiter indicated to the chef John would like him to prepare something. The chef prepared the lasagna.</Paragraph>
      <Paragraph position="5"> The waiter got it from the chef, The waiter went to the table, He served the lasagna to John. He ate it. He became full.</Paragraph>
      <Paragraph position="6"> He asked the waiter for the check. John got it from the waiter. John read the check. John discovered he was unable to pay the check. He indicated to the waiter he wayunable to pay the check. The management told John he would have to wash dishes. He entered the kitchen. He washed dishes. He left ~eone's.</Paragraph>
      <Paragraph position="7"> [paragraphing has been added to the computer output for ease of reading] In these example stories, SAM analyzes each input sentence into a Conceptual Dependency (CD) represen tat ion. If this representation fits a script, that script is called into memory and successive inputs are matched in the script and linked up by a SAM program called the script applier. The script applier output is processed by other SAM programs depending on the type of final output desired, and English or, for Story 11, Chinese is generated, The point to be stressed is that all the 'understanding' processing is done on a single data structure, the story representation constructed by the script applier. We discuss in particular the scriptal data base, the script applier and the story representation in succeeding sections. Additional details on the other parts of SAM can be found in [Schank et al, C_ _L 19753.</Paragraph>
    </Section>
    <Section position="3" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
2.0 Situational Scripts
</SectionTitle>
      <Paragraph position="0"> As implemented in SAM, a situational script is a network of CD patterns describing the major paths and turning points commonly understood by middle-class ~mericans to occur in stereotyped activitieS such as going to theaters, restaurants and supermarkets. The script idea is very similar to the independently developed 'fr me system' for story understanding described in [Charniak, 19751 , which is itself based loosely on the 'PS1 ame' concept [Minsky, 19741 currently4 used in vision research.</Paragraph>
      <Paragraph position="1"> The patterns provided in scripts are of two general kinds: events, which we will construe broadly as including states and state-changes (2) as well as mental and physical ACTS; and carnal relatkons among these events [Schank, 1973 and 19741.</Paragraph>
      <Paragraph position="2"> (2) Certain actions like driving a car or preparing food involve complex, learned sensory-motor skills as well as scr iptal knowledge. Such actions are summarized within a script as a causal relation terminating in the chief state-change effected by the action. For example, the sentence &amp;quot;The cook prepared the mealn is represented in LISP CD format as:</Paragraph>
      <Paragraph position="4"> Patterns are used in scripts not only because of the variety of possible fillers for the roles in scripts, but also to constrain the amount of information needed to identify a story input.</Paragraph>
      <Paragraph position="5"> Thus, far example, the script provides a LISP CD template like:</Paragraph>
      <Paragraph position="7"> to identify inputs like: John went into ~eone's.</Paragraph>
      <Paragraph position="8"> John walked into Leone's.</Paragraph>
      <Paragraph position="9"> John came into ~eone's from the subway. (X and RESTAURANT are dummy varisables). his allows the script applier to ignore inessential features of an input (like the Instrument of the underlying ACT or the place John came from in the examples given above) , and thus provides a crude beginning for a theory of forgetting.</Paragraph>
      <Paragraph position="10"> In the present implementation, SAM possesses three 'regular' scripts, for riding a bus, for riding a subway, and for going to a restaurant (3). These scripts have been simplified in various ways. For example, all of them assume that there is only a single main actor. The bus script has been restricted to a single track' for a long-distance bus ride, aHd the restaurant script does not have a '~c~onald's' or a '~e ~avillon track. This was done primarily to have a data base capable of handling specific stories of interest available in a reasonable time, secondarily to limit the storage needed (4). Nevertheless, as (3) The data base also contains script-like structures for 'weird ' or *unusual ' happenings like the main actor s becoming ill, or, as in Story 111, having his pocket picked. Such activities could be handled by a generalized inferencing program like the one described in [Rieger, 19751, the examples of Section 1.1 indicate, the current scripts are a re~sonable first pass at the dual problems of creating and managing this type of data structure,</Paragraph>
    </Section>
    <Section position="4" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
2.1 Goals,, Predictions and, Roles in Scr ip,ts
</SectionTitle>
      <Paragraph position="0"> Each situational script supplies a default y oal statement which is assumed, in the absence of input from higher level cognitive processes like 'planning ' [Schank and Abelson, 19751 , to be what a story referring to a script is about. The restaurant script for example, defines the INGEST and the resulting state-change in hunger as the central events of a story about eating in restaurants. Closely related to the goal statement is the sequence of mutual obligations that many scripts seem to entail. Invoking the bus script, for example, implies the contract between the rider and the bus management of a PTRANS to the desired location in return for the ATFWNS of the fare.</Paragraph>
      <Paragraph position="1"> Such obligations have a powerful influence on the predictions the system makes about new input. In the restaurant context, for example, an .input referring to an event beyond ordering or eating is not initially expected, because these events form the initial statement of obligation. Thus the system takes longer to identify a story sequence like: John went to a diner, He left a large tip.</Paragraph>
      <Paragraph position="2"> Once an input about ordering has been processed, SAM is prepared (4) The text for the restaurant script, presently the largest of the scripts, occupies roughly 100 blocks of PDP-10 disk storage, or about 64,000 ASCII characters.</Paragraph>
      <Paragraph position="3"> to hear about the preparation and serving of food, actions associated with eating, or paying the bill, but not about leaving the restaurant. This is because the main actor has not fulfilled the other half of the oblSgation.</Paragraph>
      <Paragraph position="4"> The binding of nominals in the story input to appropriate fillers in the script templates is accomplished in SAM by means of script variables with ass~ciated features. In the rather cxude system of features preseptly used, each script variable is assigned a superset menibership class: e. g., a hamburger is a 'food', whi$e a waiter is a 'humane- certain variables are also given roles: e. g., a hostess or a waiter can fill the 'maitre'd' ole. The former property would enable the system to distinguish between &amp;quot;The waiter brought Mary a hamburger&amp;quot; and &amp;quot;The waiter bropg-ht Mary the check&amp;quot;. The latter property identifies important roles in script contexts, primarily those to which it is possible to make definite reference without previous iqtrodnction, like 'the driver', 'the cook' or 'the check'. For stories in which certain script variables are not bound, the system provides a set of default bindings for the roles not mentioned : thus, SAM fills in 'meal' for a story in which the food ordered is not explicitly named. Variables without distinguished roles default to an indefinite filler, like 'someone' for the main actor.</Paragraph>
    </Section>
    <Section position="5" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
2.2 Script Structure
</SectionTitle>
      <Paragraph position="0"> Each SAM script is organized in a top-down manner as follows : into tracks, consisting of sceneg, which are in turn composed of subscenes. Each track of a script corresponds to a manifestation of the situation differing in minor features of the script roles, or in a different ordering of the scenes. So, for example, eating in an expensive restaurant and in ~c~onald's share recognizable seating, ordering , paying, etc., activities, but contrast in the price of the food, type of food served, number of restautant personnel, sequence of ordering and seating, and the like. Script scenes are organized aroond the main top-level acts, occurr ing in some definite sequence, that characterize a scriptal situation. The giving of presents, for example, would be a scene focus in a birthday party script, but putting on a party hat would not be. The latter would correspond to a subscene, perhaps within the 'prepar ing-to-celebrate ' scene of that script. In general, subscenes are organized around acts more or- less closely related to the main act of the scene, eitfier con'tributing a precondition for the main act, as walking to a table precedes sitting down; or resulting from the main act, as arriving at the desired location follows from the driver's act of driving the bus. An intuitive way of identifying scene foci and scene boundaries is to visualize a script network of interwoven paths. In such a network, the scene foci would correspond to points of maximum constrietion; scene boundaries to points of most constriction between foci. This essentially means that all paths thrpugh a scene go through the main act (except abort paths, discussed below) , and relatively few events are at scene edges.</Paragraph>
      <Paragraph position="1"> It is necessary, therefore, to distinguish certain events in a script: scripts, their tracks, scenes and subscenes all have f main*, 'initial' and 'final' events. For example, the main event of the 'ordering' event in a restaurant is the ordering act itself; an initial event is reading the menu; and a final event is the waiter telling the cook the order. Additionally, scripts and tracks have associated 'summaries', which refer to a script in general terms. Consider, for example, the following sentence from Story 111: &amp;quot;John went to New York by bus&amp;quot;. This sentence is marked in the underlying meaning representation by the SAM analyzer as a summary because of the presence of: ((ACTOR (*JOHN*) &lt;=&gt; (*SDO*) OBJECT ($BUS))) in the Instrument slot (5). Such sentences have two ccmmon functions in simple stories. They may indicate that a script was invoked and completed, and no further input should be expected for this instance of the script. This function of the summary of ten occurs with scripts (like those associated with .travelling) .</Paragraph>
      <Paragraph position="2"> which tend to be used as instruments* of other scripts (as in getting to a restaurant or store). Alternatively, they may signal that a wider range of possible next inputs is to be expected than would be predicted if the script were entered via an initial event. For example, the story sequence initiated with a summary: John took a train to New York. While leaving the train, he tipped the conductor.</Paragraph>
      <Paragraph position="3"> (5) The primitive ACT SDO is an extension of the primitive dummy CD ACT DO, and stands for an actor performing his script for a given situation, in this case the bus script ($BUS).</Paragraph>
      <Paragraph position="4"> sounds more natural than a sequence beginning with an initial event: John got on a train. while leaving the train, he tipped the conductor.</Paragraph>
      <Paragraph position="5"> These two functions of the summary contres,t widely in the range of predictions they invoke. However, additional inputs after a summary, as in the example above, often give the psychological Scenes are built up out of subscenes, which usually contain a single chunk of causal chain or 'path'. In SAM scripts, these paths are assigned a 'value* to indicate roughly their normality in the scrfptal context. Sever a1 pathvalues have been found useful in setting up the story representation. At one end of the normality range is 'default', which designates the path the sctipt applier takes through a scene when the input does not explicitly refer to it. For example, the input sequence: John went to Consiglio's. He ordered lasagna.</Paragraph>
      <Paragraph position="6"> makes no mention of ~ohn's sitting down, which would commonly be assumed in this situation. The system, following the default path, would fill in that John probably looked around inside the restaurant, saw an empty table, walked over to it, etc. Next on the normality scale is 'n~minal', designating paths which are usual in the actipt, not involving errors or obstructions in the normal flow of events. The sentences in Story I1 which refer to the hoetess are examples of nominal inputs. Finally, there are the 'interference/resolution' paths in a script. These are followed when an event occurs which blocks the normal functioning of the script. In a restaurant, for example, having to wait for a table is a, mild interference; its resolution occurs when one becomes available. More serious because it conflicts directly with the goal/obligation structure of the script is the main actor's discovery that he has no money to pay the bill. This is resolved in Story I11 by his doing dishes. An extreme example of an interference is the main actor's becoming irritated when a waiter fails to take his order, as in Story I, followed by his leaving the restaurant. When this happens, the script is said to have taken an 'abort' path.</Paragraph>
      <Paragraph position="7"> In addition to the above, certain incomplete paths, i. e- , paths having no direct consequences within the script, have been i,ncluded in the scriptal data base. The most important of these incomplete paths are the inferences from, and preconditions for, the events in the direct causal paths. Lumped under the pathvalue 'inference', these subsidiary events identify crucial resultative and enabling links which are useful in particular for question-answering [Lehnert, 19751. For example, the main path event '~ohn entered the train ' has attached the precondition that the train must have arrived at the platform, which in turn is given as a result of the driver's bringing the train to the station. Similarly, a result of the main path event '~ohn paid the bill' is that he has less money than previously. Both of these types of path amount to a selection among the vast number of inferences that could be made from the main path event by an inferencing mechanism like ~ieger's Conceptual Memory program [Rieger, 19751.</Paragraph>
      <Paragraph position="8"> A Special class of resul tative inferences' are those common events which are potentialized by main path events, though they may not occur in a given story. Labelled with the pathvalue 'parallel', these events may either occur often in a specific context without having important cohsequences, as in &amp;quot;The waiter filled ~ohn's water glass&amp;quot;; or they may happen in almost any context without contributing much to the story, as in the sentence &amp;quot;On the bus, John talked to an old lady&amp;quot;, from Story 111. Since such parallel paths often lead nowhere*, they are good candidates for being forgotten.</Paragraph>
    </Section>
    <Section position="6" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.0 The Script Applier
</SectionTitle>
      <Paragraph position="0"> Construction of a story representation from CD input supplied by the SAM analyzer is the job of the script applier (6). Under control of the SAM executive, the applier locates each new input in its collection of situational scripts, links it up with what has gone before, and makes predictions about what is likely to happen next. Since the SAM system as a whole is $ntended to model human understanding of simple, script-like stories, the script applier organizes its output into a form suitable for subsequent summary, paraphrase and question-answer ing activities.</Paragraph>
      <Paragraph position="1"> In the course of fitting a new input into the story (6) The current version of the applier is programmed in MLISP/LISP 1.6 and runs in an 85K core image on a PDP-10 computer. Processing of Story 111, the longest story attempted to date, took approximately 8 minutes with SAM as the single user of the timesharing system.</Paragraph>
      <Paragraph position="2"> representation, the applier performs several important subtasks. Identifying an input often requires an implicit job of reference specification. For example, in the sentence from Story I11 beginning &amp;quot;When the check came. . .&amp;quot; , there is surface ambiguity, reflected in the parser's outpuC, regarding donor and recipient. This ambiguity is settled in the restaurant context.by the assumption that the recipient is the main actor and that the donor is a member of the restaurant staff, preferably the waiter. An allied problem arises whem the applier, in placing a new conceptualization in the story representation, determines the relevant time relations. Certain types of time data are computed from the output conceptualization itself : for example, the relation between an MTRANS and its MOBJECT, which may determine whether 'remember ' or 'ask for' is appropriate in the final output. Other time relations are defined by the causal structure of the script itself: thus 'eating ' follows 'ordering ' .</Paragraph>
      <Paragraph position="3"> More complex time-order computations have to be made when the applier identifies two or more 'simple' conceptualizations in a compound input derived from sentences containing ambiguous words like 'during' or 'when'. Examples of this were encountered during the processing of Story 111, for example, in the sentence 'when he left the bus, he thanked the driver'. The system resolves this compound input into the plausible sequence of a PTRANS to the driver, the MTRANS of the 'thanking , and the PTRANS off the bus.</Paragraph>
    </Section>
    <Section position="7" start_page="33" end_page="33" type="sub_section">
      <SectionTitle>
3.1 Story ~epresentation
</SectionTitle>
      <Paragraph position="0"> The output of the script applier consists of linked story segments, one per script invoked, giving the particular script paths traversed by the input story. The backbone of the story representation is the eventlist of aJ1 the acts and state-changes that took place. The eventlist is doubly linked, causally and temporally, with the type of causation and time relatiohs filled in within a story segment by the applier .</Paragraph>
      <Paragraph position="1"> Attached to the eventlist are the appropr iate , instantiated preconditions, inferences and parallel events for each main path event, As discussed above, the inferences and preconditions have been selected for their expected utility in question-answer ing .</Paragraph>
      <Paragraph position="2"> Each story segment is identified by a label which gives access to important properties of the segment: what script it came from; what the particulars were of the script summary, maincon, entrycon, and exitcon this time through; and what interf erence/resolution cycles were encountered. Additionally, pointers are provided to extra-scriptal 'weird ' events that happened in the story. At the top, the global identifier STORY gives the gross structure of the story in terms of sequential, parallel and nested scripts and the weird things. This hierarchical organization facilitates summary and short paraphrase processing, while retaining the fine structure needed for extended paraphrasing and question-answer ing .</Paragraph>
      <Paragraph position="3"> Story I11 illustrates most of the present capabilities of the SAM script applier in story understanding. The applier accepts a CD representation of the nine sentences in turn from the analyzer and builds an eventlist consisting of 56 main path conceptualizations and 39 associated preconditions/inEerences.</Paragraph>
      <Paragraph position="4"> The 'parallel' events of John talking to the old lady and the bus driver also appear in the eventlist. The eventlist is divided into four story segments, one each for the bus, subway and restaurant scripts and one for the 'weird' robbery event. The identifier for the subway segment is marked as containing the weird event, as is the global STORY. The restaurant segment contains the interference/resolution pair 'unable to pay/wash dishes'. Additionally, the lack of money encountered during the paying scene was checked with the SAM executive during the processing of Story 111, since it violates one of the prime preconditions of the restaurant script. Since the executive found that the loss of money was a Consequence of the stealing event that oqcurred earlier, this event is not marked as weird.</Paragraph>
      <Paragraph position="5"> Appropriate summaries are provided for each story segment. At the top, STORY contains the information that the four segments are organized as a sequence of bus, subway and restaurant, with the pickpocket event nested inside the subway segment.</Paragraph>
    </Section>
  </Section>
  <Section position="19" start_page="33" end_page="68" type="metho">
    <SectionTitle>
4.0 Future Work
</SectionTitle>
    <Paragraph position="0"> As the examples show, SAM is capable of handling fairly complex stories in its present state of development. However, several extensions and additions to the scriptal data base and the script applier appear to be needed before SAM can achieve its ultimate potential.</Paragraph>
    <Paragraph position="1"> First, a more flexible method of pattern-matching is required so that the full diversity of input role-fillers can be accommodated. A method of comparing features of nominals in the parser output to the appropriate script variables is needed so that over- or underspecified inputs can be correctly identified. For example, the applier should be able to recognize the phrase 'the restaurant' as a partially specified instance of '~eone's' , found earlier.</Paragraph>
    <Paragraph position="2"> 4s an extension of this, input conceptualizations of a descriptive nature (e. g., &amp;quot;The restaurant was of red brick&amp;quot;) need to be processed in a way that allows the system to update .</Paragraph>
    <Paragraph position="3"> its image' of the role-fillers in a script. The facilities needed are similar to those provided by the 'occurrence set' in ~ieger's Conceptual Memory program [Rieger, 19751.</Paragraph>
    <Paragraph position="4"> The most important problem to be faced, however, is the generalization of the story representation to handle stories with beveral main actors, or with non-synchronous events. It is clear that the simple linear eventlist structure described in Section 3.1 would not be adequate for even such a simple story sequence as: &amp;quot;The cook made the lasagna, Meanwhile the wine steward poured the wine, *I</Paragraph>
    <Section position="1" start_page="33" end_page="68" type="sub_section">
      <SectionTitle>
4.1 Acknowledgement
</SectionTitle>
      <Paragraph position="0"> The programs discussed here are only a part of the SAM system, and a great deal of credit is due to my co-workers in the  Yale A1 Project: to Professors Roger Schank and Bob Abelson for the theory on which SAM is based and for their overall guidance; to Dr, Chris Riesbeck for valuable discussion and criticism, as well as a substantial part of the programming effort; and to</Paragraph>
    </Section>
  </Section>
  <Section position="20" start_page="68" end_page="68" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> A system has been designed to translate connected sequences of visual images of physical activities into conceptual descriptions. The representation of such activities is based on a canonical verb of motion so that the conceptual description will be compatible with semantic networks in natural language understanding systems. A case structure is described which is derived from the kinds of information obtainable in image data. A possible solution is presented to the problem of segmenting the temporal information st ream into linguistically and physically meaningful events. An example is given for a simple scenario, showing part of the derivation of the lowest level events. The results of applying certain condensatiom to these events show how details can be systematically eliminated to produce simpler, more general, and hence shorter, descriptions.</Paragraph>
    <Paragraph position="1"> This research was primarily supported by Canadian Defense Research Board grant 9820- 1 1, and partially by National Science Foundation grant If we view a motion picture such as illustrated in Figure 1, we are able to give a description of the physical activities in the scenario.</Paragraph>
    <Paragraph position="2"> This description is linguistic in the sense that the words used express our recognition of objects and movements as conceptual entities. A system for performing a sizeable part of this transformation of visual data into conceptual descriptions has been designed. It is described in Badler (1975); here we will present one small part of the system which is concerned with the organization of abstracted data from successive images of the scenario.</Paragraph>
    <Paragraph position="3"> We are interested in a possible solution to the following problem: Given that a conceptual description of a scenario is to be generated, how is it decided where one verb instance starts and another ends? In other words, we seek computational criteria which separate visual experience into discrete &amp;quot;chunks&amp;quot; or events. By organizing the representation of an event into a case structure for a canonical motion verb, events can be described in linguistic terms. Verbs of motion have been investigated directly or indirectly by Miller (1972). Hendrix et aL lt 7 3a, 197 3b). Martin (1973). and Schank (1973); semantic databases using variants of case structure verb representations Wllmore(1968)) include Winograd (197 Z), Rumelhart et a1 (197 2), and Simmons (197 3).</Paragraph>
    <Paragraph position="4"> We are concerned with physical movements of rigid or jointed objects so that motions may be restricted to translations and rotations.</Paragraph>
    <Section position="1" start_page="68" end_page="68" type="sub_section">
      <SectionTitle>
Objects may
</SectionTitle>
      <Paragraph position="0"> appear or disappear and the observer is free to move about.</Paragraph>
      <Paragraph position="1"> The resulting activities are combinations of the se where observer motions are factored out if at all possible. We assume that the scenarios contain recognizable objects exhibiting physically possible, and preferably natural, motions. A particular activity might consist of a single event, a sequence of events, sets of event sequences, or hierarchic organizations of events.</Paragraph>
      <Paragraph position="2"> The concept of &amp;quot;walking&amp;quot; is a good example of the last.</Paragraph>
      <Paragraph position="3"> Events are the basic building blocks of the conceptual description, and our events indicate the motion. of objects. The interpretation of motion in terms of causal relationships is generally</Paragraph>
    </Section>
    <Section position="2" start_page="68" end_page="68" type="sub_section">
      <SectionTitle>
Adverbials
Relationships
</SectionTitle>
      <Paragraph position="0"> be-tween the orientation and trajectory or axis of an object between the trajectory of an object and fixed world directions - null indicative of source and target between the path of an object and other (mving ) objects between an event and a previous</Paragraph>
    </Section>
  </Section>
  <Section position="21" start_page="68" end_page="74" type="metho">
    <SectionTitle>
UP-AND-DOWN BACK. THROUGH
</SectionTitle>
    <Paragraph position="0"> beyond the scope of the current system, although a semantic inference component could be included. Our descriptions consist mostly of observation of motion in context rather than explanation of why motion occurred.</Paragraph>
    <Paragraph position="1"> The general descriptive methodology is to keep only one static relational description of the scenario, that of the current image. Changes between it and the next sequential image are described by storing the names of changes in event nodes in a semantic network. In general, names of changes correspond to adverbs or prepositions (adverbials) describing directions or changing static relationships. Computational definitions for the set of adverbials in Table 1 appear in Badler (1975). We are only concerned with the senses of the adverbials pertaining to movement. Definitions arel implemented as demons: procedures which are activated, the executed, by the successive appearance of certain assertions in the image description or current conceptual database. These demons are related to those of Charniak (1972), although our use of them, their numbers, and their organization are simplified and restricted. They are used to recognize or classify properties or changes and to generate the hierarchic descriptive structure. An essential feature of this methodology is that the descriptions are continually condensed by this change abstraction process; descriptions grow in depth rather than length.</Paragraph>
    <Paragraph position="2"> The semantic information stored for each object in the scenario includes its TYPE, structural SUB-PARTS, VISIBILITY, MOBILITY, LOCATION ORIENTATION, and SIZE.</Paragraph>
    <Paragraph position="3"> Most of these properties are determined from the image sequence, but some are stored in object models (indexed by TYPE) in the semantic network, The event8 are also nodes in the semantic network. Each object is potentially the SUBJECT of an event node. A sequence of event nodes forms a history of movement of an object; only the latest node in the sequence is active, The set of active event nodes describes the current events in the scenario seen so far. The cases of the event node along with their approximate definitions follow.</Paragraph>
    <Paragraph position="4">  SUBJECT: An object which is exhibiting movement.</Paragraph>
    <Paragraph position="5"> AGENT: A motile object which contacts the SUBJECT.</Paragraph>
    <Paragraph position="6"> INSTRUMENT: A moving object which contacts the SUBJECT.</Paragraph>
    <Paragraph position="7"> REFERENCE: A pair of object features (on a fixed object) which are used to fix absolute directions independent of the observer's position. DIRECTION: A temporally-ordered list of adverbials and their associated  objects which apply to this SUBJECT.</Paragraph>
    <Paragraph position="8"> TRAJECTORY: The spatial direction of a location change of the SUBJECT. VELOCITY: The approximate magnitude of the velocity of the SUBJECT along the TRAJECTORY; it includes a RATES list containing STARTS, STOPS and (optionally) INCREASES or DECREASES.</Paragraph>
    <Paragraph position="9"> AXIS: The spatial direction of an axis of an orientation change (rotation) of the SUBJECT.</Paragraph>
    <Paragraph position="10"> ANGULAR-VELOCITY: Similar to VE MCITY, except for rotation about the AXIS.</Paragraph>
    <Paragraph position="11"> NEXT: The temporal successor event node having the same SUBJECT.</Paragraph>
    <Paragraph position="12"> STARTITIME: The time of the onset of the event.</Paragraph>
    <Paragraph position="13"> END-TIME: The time of the termination of the event.</Paragraph>
    <Paragraph position="14"> REPEAT-PATH: A list of event nodes which form a repeating sequence. These cases differ from Miller's (1972) primarily in the lack of a &amp;quot;permissive&amp;quot; case and our separation of the TRAJECTORY and AXIS cases.</Paragraph>
  </Section>
  <Section position="22" start_page="74" end_page="76" type="metho">
    <SectionTitle>
REFERENCE
</SectionTitle>
    <Paragraph position="0"> is new; one of its uses is to resolve descriptions of the same event from different viewpoints. The explicit times could be replaced by temporal relations. Miller's reflexive/objective distinction is not needed as each moving object has its own event nodes, regardless of the AGENT.</Paragraph>
    <Paragraph position="1"> A few necessary definitions follow before the presentation of the event generation algorithm.</Paragraph>
    <Paragraph position="2"> A.null event node has all its cases NIL or zero except START-TIME, END-TW, and perhaps NEXT.</Paragraph>
    <Paragraph position="3"> An event node is terminated when it has a non- NIL NEXT value.</Paragraph>
    <Paragraph position="4"> The function CREATE-EVENT-NODE (property pairs) creates an event node with the indicated case values, returning the node as a result. To compare successive values of numerical properties , a queue is associated with the case in current event nodes only. The front of the queue is represented by 'I*&amp;quot;: the place where new information is stored. The queues have length three; the three positions will be referenc ed by prefixing  the case name with either &amp;quot;NEW&amp;quot;, &amp;quot;CURRENT&amp;quot;, or A function SHIFT manipulates property queues when they retpire updating:</Paragraph>
    <Paragraph position="6"> The time will be abbreviated by TN and TL, For a particular event node E: TN: = IV3W-END-TIMII: (E); TC: = CURrnNT-END-TIME (E); Thus TN is always equal to the present image time. Now we can present the algorithm for the demon which controls the construction of the entlre event graph. It is executed once for each image when all lower level demons have finished; it creates, terminates, or updates each current event node.</Paragraph>
    <Paragraph position="7"> A. 1. Creating event nodes.</Paragraph>
    <Paragraph position="8"> A 1 1. An event node E is created when a mobile object first becomes visible and identifiable as an object.</Paragraph>
    <Paragraph position="10"> The NIL START-TIME has the interpretation that we do not know what was happening to this object prior to time TN.</Paragraph>
    <Paragraph position="11"> A. 1.2. An event node E is created when a jointed part of the parent  object with current event node EP is first observed to move relative to the parent, for example, an arm relative to a person's body.</Paragraph>
    <Paragraph position="13"> This is interpreted as the parent object moving the part using the joint as  the &amp;quot;instrament&amp;quot;. Any appfopriate attributes are placed in the NEW -property positions. The node E is then immediately terminated (A. 1.3). A. 1.3, An event node E2 is created whenever another event node El  is terminated.</Paragraph>
    <Paragraph position="15"> SUBJECT, AGENT, INSTRUMENT, REFERENCE, and DJRECTION are those which were present at termination of the previous node, subject to any additional conditions that changes in these may require.</Paragraph>
    <Paragraph position="16"> A. 2.</Paragraph>
    <Paragraph position="17"> Terminating event nodes. An event node E is terminated when  there are significant changes in its properties. All queue structures are deleted.</Paragraph>
  </Section>
  <Section position="23" start_page="76" end_page="76" type="metho">
    <SectionTitle>
RATES(ANGULAR-VELOCITY (E))).
</SectionTitle>
    <Paragraph position="0"> The DIRECTION list is unaltered except that the terminating adverbial (s) may be added to DIRECTION(E) rather than to DIRECTION(NEXT(E)) (see A. 2.1. Changes in SUBJECT. The assumptions of object rigidity and permanence preclude changes in an object.</Paragraph>
    <Paragraph position="1"> A. 2.21 3. Changes in AGENT and INSTRUmNT. These must be preceded by changes in CONTACT relations between objects and the SUBJECT. See A, 2.5 on DIRECTION.</Paragraph>
    <Paragraph position="2"> A. 2.4. Changes in REFERENCE. A change in the REFERENCE features forces termination of every event node referencing those features, as such changes are usually caused by spatial or temporal discontinuities in the scenario.</Paragraph>
    <Paragraph position="3"> A. 2.5. Changes in DWCTION.</Paragraph>
    <Paragraph position="4"> Changes in type (I) adverbials must be preceded by changes in TRAJECTORY, VELOCITY, AXIS, or ANGULAR-VELOCITY, because a relationship between an orientation and a TRAJECTORY or AXIS cannot change without at least one of the four cases changing. Changes in BACKWARD, FORWARD, and SIDEWAYS cause termination; this may occur with no orientation change if the TRAJECTORY has a non-zero derivative. For example, move a box in a circle while keeping its orientation constant.</Paragraph>
    <Paragraph position="5"> Changes in type (2) adverbials must be preceded by a change in TRAJECTORY, but some of these changes may be too slight to cause termination from the TRAJECTORY criteria. (A. 2.6. ). Changes from UP to DOWN or vice versa are the only ones in this group causing termination.</Paragraph>
    <Paragraph position="6"> Changes in type (3) adverbials terminate event nodes if and only if there is a change in a CONTACT relation or a VISIBILITY property, If the CONTACT is made or the VISIBILITY established, the adverbial goes into the new node's DIRECTION list. If the CONTACT is broken or VISIBILITY lost, the adverbial remains on the front of the terminated node's DIRECTION list.</Paragraph>
    <Paragraph position="7"> Since the type (4) adverbials are only indicators of current source and target, these do not change unless the path of the SUBJECT changes or the target object moves. Therefore no terminations arise from this group. The type (5) adverbials relate paths of the SUBJECT to other objects. They cause termination when they come into effect, and terminate their own nodes when they cease to describe the path.</Paragraph>
    <Paragraph position="8"> The tme (6) adverbials include higher level events and the basic repetitions. These all terminate the current event node. The repeated events (for example, BACK-AND -FORTH) are terminated when the repetition appears to cease.</Paragraph>
    <Paragraph position="9"> A. 2.6. Changes in TRAJECTORY. The changes in TRAJECTORY that are mas t important are those which change its derivative significantly. A change in the derivative from or to zero can be used (the start or end of a turn), but only the start is actually used for termination. Once the turn is begun, how it ends is unimportant since the final (current) trajectory is always saved.</Paragraph>
    <Paragraph position="10"> The other termination case watches for a momentarily large derivative which settles back to smaller values. This indicates a probable collision. It is of crucial importance in inferring CONTACT relations between objects when none were (or could be) directly observed.</Paragraph>
    <Paragraph position="11"> A. 2.7.</Paragraph>
    <Paragraph position="12"> Changes in VELOCITY. A change in VELOCITY from zero to a positive value (from a positive value to zero) terminates the current event node and enters STARTS (STOPS) in the new node's (old node's)</Paragraph>
  </Section>
  <Section position="24" start_page="76" end_page="76" type="metho">
    <SectionTitle>
VELOCITY RATES list,
</SectionTitle>
    <Paragraph position="0"> A. 2.8.</Paragraph>
    <Paragraph position="1"> Changes in AXIS. A reversal of rotation terminates the event node.</Paragraph>
    <Paragraph position="2"> This corresponds to a change in AXIS to the opposite direction, with no inte rrnediate values.</Paragraph>
    <Paragraph position="3"> A. 2.9. Changes in ANGULAR-VELOCITY, A change in ANGULAR-VELOCITY from zero to a positive value (from a positive value to zero) terminate the current event node and enters STARTS (STOPS) in the new node s (old node's) ANGULAR-VE LOCITY RATES list.</Paragraph>
    <Paragraph position="4"> A. 2.10.</Paragraph>
    <Paragraph position="5"> Changes in NEXT are not meaningful.</Paragraph>
    <Paragraph position="6"> A. 2.11112.</Paragraph>
    <Paragraph position="7"> Changes in START-TIME and END -TIME are not meaningful. A, 2.13.</Paragraph>
    <Paragraph position="8"> Changes in REPEAT-PATH. When new data fails to match the appropriate sub-event node of a REPEAT -PATH event node E, E is terminated. The definition of &amp;quot;match&amp;quot; for the basic repetitions appears in Badler (1975). The problem, in general, remains open. See, for example, Becker (1973).</Paragraph>
    <Paragraph position="9"> A.3, Maintaining event nodes. If the new assertions do not cause termination of the event node, the property queues are merely shifted:</Paragraph>
    <Paragraph position="11"> END-TIME(E): = SHIFT(END-TIME(E)).</Paragraph>
    <Paragraph position="12"> What does an event mean? This algorithm motivates a theorem that the events generated are the finest meaningful partition of the movements in the image sequence into distinct activities. The hypothesis of the assertion ie the natural environment being observed and the linguistically-based conceptual description desired, The conclusion is that an event node produced from this algorithm describes either the lack of motion or else an unimpeded, simple linear or smoothly curving (or rotating) motion of the SUBJECT with no CONTACT changes. In addition, the orientation of the SUBJECT does not change much with respect to the trajectory. The proof of this assertion follows directly from the choice of termination conditions.</Paragraph>
    <Paragraph position="13"> We will apply this algorithm to data obtained from each of the images in Figure 1. The lower front edge of the house is arbitrarily chosen as the REFERENCE feature; NORTH is toward the right of each image. We will not discuss the computation of the static relations from each image, only list in Table 2 the changes in the static description from irnage-toimage. Trajectory and rotation data are omitted for simplicity, although changes of significance are indicated.</Paragraph>
    <Paragraph position="14"> If we &amp;quot;write out&amp;quot; the event node sequence using the canonical motion verbs MOVES and TURNS with the adverbial phrases from the RATES and DIRECTION lists, we obtain the following lengthy, but accurate.</Paragraph>
    <Paragraph position="15"> description: C. 1 There is a CAR, C. 2 The CAR STARTS MOVING TOWARD the OBSERVER and EASTWARD, then ONTO the ROAD.</Paragraph>
    <Paragraph position="16"> C. 3 The CAR, while GOING FORWARD, STARTS TURNING, MOVES TOWARD the OBSERVER and EASTWARD, then NORTHWARD-AND-EASTWARD, then FROM the DRIVEWAY and OUT -OF the DRWEWAY, then OFF-OF the DRIVEWAY,  observer's front.</Paragraph>
    <Paragraph position="17"> Termination of Ci creates Ci+l by A.1.3.</Paragraph>
    <Paragraph position="18"> C. 4 The GAR, while GOING FORWARD, MOVES N0RTHW.AR.D-AND-EASTWARD, then NORTHWARD, then AROUND the HOUSE and</Paragraph>
  </Section>
  <Section position="25" start_page="76" end_page="76" type="metho">
    <SectionTitle>
AWAY-FROM the DRIVEWAY, then AWAY -FROM the HOUSE and
S'I'OPS TURNING,
</SectionTitle>
    <Paragraph position="0"> C. 5 The CAR, while GOING FORWARD, MOVES NORTHWARD, then AWAY.</Paragraph>
    <Paragraph position="1"> The canonical form follows easily from the case representation and the DIRECTION list orderings. The directional adverbials FORWARD, BACKWARD and SIDEWAYS are interpreted as lasting the duration of the event, hence are written as &amp;quot;while GOING.. . &amp;quot; clauses. STARTS is always interpreted at the beginning of the sentence, STOPS at the end. The termination conditions assure its correctness, There is much redundancy in this description, but it is only the lowest level, after all, and many activities span several events. Two sets of condensations are applied by demons that watch over terminated event nodes. The first set is mostly concerned with interpreting certain null events caused by the image sampling rate and removing trajectory changes which prove to be insignificant. The second set of demons removes adverbials referring to directions in the support plane, removes RATES terms except STOPS, and generalizes redundant adverbials referring to the same object. The result of applying these condensations is: C.2 The CAR MOVES TOWARD the OBSERVER, then ONTO the ROAD.</Paragraph>
    <Paragraph position="2"> C. 3 The CAR, while GOING FORWARD, MOVES TOWARD the OBSERVER, then FROM the DRIVEWAY.</Paragraph>
    <Paragraph position="3"> C.4 The CAR, while GOING FORWARD, MOVES AROUND the HOUSE and AWAY-FROM the DRIVEWAY, then AWAY-FROM the HOUSE, then STOPS TURNING.</Paragraph>
    <Paragraph position="4"> C. 5 The CAR, while GOING FORWARD, MOVES AWAY.</Paragraph>
    <Paragraph position="5"> Another condensation can be applied for the sake of less redundant output. It does not, however, permanently affect the database: The CAR MOVES TOWARD the OBSERVER, then ONTO the ROAD, while GOING FORWARD, then FROM the DRIVEWAY, then AROUND the HOUSE, then AWAY-FROM the HOUSE, then STOPS TURNING, then MOVES AWAY.</Paragraph>
    <Paragraph position="6"> Note that FROM the DRIVEWAY follows ONTQ the ROAD. This is due to the pictorial configuration: the car is on the road before it leaves the driveway. The position of the &amp;quot;while GOING FORWARD&amp;quot; phrase could be shifted backwards in time to the beginning of the translatory motion, but this may be risky in general. We will leave it where it is, since this is primarily a higher level linguistic matter.</Paragraph>
    <Paragraph position="7"> By applying demons which recognize instances of specific motion verbs to the individual event nodes, then condensing as above, we get: The CAR APPROACHES, then MOVES ONTO the ROAD, then LEAVES the DRIVEWAY, than TURNS AROUND the HOUSE, then DRIVES AWAY -FROM the HOUSE, then STOPS TURNING, then DRIVES AWAY.</Paragraph>
    <Paragraph position="8"> The major awkwardness with this last description is that it relates the car to every other object in the scene. Normally one object or another would be the focus of attention and statements would be made regarding its role. Such manipulations of the descriptions are yet unclear.</Paragraph>
    <Paragraph position="9"> In conclusion, we have outlined a small part of a system designed to translate sequences of images into linguistic semantic structures. Space permitted us only one example, but the method also yields descriptions for scenarios containing observer movement and jointed objects (such as walking persons). The availability of low level data has significantly shaped the definitions of the adverbials and motion verbs. Further work on these definitions, especially motion verbs, is anticipated. We expect that the integration of vision and language systems will benefit both domains by sharing in the specification of representational stmctures and description processes.</Paragraph>
  </Section>
  <Section position="26" start_page="76" end_page="76" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This paper is a justification for the use of frame analysis as a linguistic theory of American Sign Language. We give examples to illustrate how frame analysis captures many of the important features of ASL.</Paragraph>
    <Paragraph position="1"> 0. l ntroduct ion From a linguistic standpoint, we are interested in language processing systems for the elainis that they make about language in general. Our- interests in those clairris leads us to exanline what inipl icaf ions they may have for the analysis of languages other than English. The data from American Sign Language (ASL) is important because it is indicative of the way people perceive and represent events. This linguistic data requires careful analysis and much psychological insight before it can be used as evidence for any particular theory of representation of visual knowledge of events. We have tried to bring together some ideas from artificial intelligence, linguistics, and psycholinguistics in order to analyze the data from ASL.</Paragraph>
    <Paragraph position="2"> The major framework we have adopted from At is that of frames. Minsky's introduction of frames as a way of representing knowledge and the further formulations of frames and related notions by Winograd and Fillmore form the bases for our frame analysis. We rely heavily on the work done by psycholinguists on visual perception as a justification for using frame analysis. Further just if icat ion comes as a resul t of the work of l inguists and psycholinguists on ASL and the visual perception of the deaf.</Paragraph>
    <Paragraph position="3"> The two most direct sources for our analysis of ASL are Reid (1974) and Thompson (1975). Reid's paper presents a clear and useful distinct ion between the linguistic level of the sentence and the conceptual level of the image.</Paragraph>
    <Paragraph position="4"> The sentence is a generalization and the image is an instantiation of that geheralization. However, &amp;quot;the units in a sentence are not just realized as 'parts' of a whole represented in the image by the individual participants, rather these units act reciprocally to determine jointly the character of the related participants and to unite them into a system of dependencies.&amp;quot; At the level of the sentence the verb is all-important because it governs the relations that exist between the nouns. However, it has no direct representation in the image; it is merely embodied in the structure of the image. Thompson's paper gives guidelines for using frames in linguistic analysis. His definitions of key concepts and his examples of frames for English have been a model for our analysis.</Paragraph>
  </Section>
  <Section position="27" start_page="76" end_page="76" type="metho">
    <SectionTitle>
1. American S ign Language
</SectionTitle>
    <Paragraph position="0"> ASL is *e language of many deaf people in the US. There is a continuum encompass4ng the many version of several sign, systems. ASL is a manual language composed of signs, fingerspelling, and occasional initialization of signs. It is in no way a signed version of English but is rather an independent language as different from English as is French or Japanese.</Paragraph>
    <Paragraph position="1"> ASL is a visual language. This visual modality allows it not only a temporal but also a multidimensional spatial framework as well as freedom from many of the constraints nermally put on a linear language. Many 'spatial relations can be preserved in minllture in what has been referred to in the sign literature as a visual analog. For example, he sentence, 'Fred stood in front of Harry,' does not necessitate a linear description, It can be represented by the indexicalized marker for FRED being positioned in the signlng space in front of the one for HARRY. It is with respect to the specification of location and the use of deictic elements that sign most clearly distinguishes itself from spoken languages. This and other related problems in sign will be examined later in this paper. Focusing on the aspects of visual analog and deixis does not imply that sign does not employ many of the linear and temporal devices used in spoken languages, but rather that these devices serve different functions.</Paragraph>
    <Paragraph position="2"> ASL is linearly ordered with respect to a standard method for presenting a scenario. The order of presentation is usually ground, then figures, then the action or relation involved. A room would be specified, then a door, then relevant furniture, then participants in an action. Generally,signs are presented in such a way as to allow further reference to them even if this referencing was nat Intended when the element was introduced into the discourse. null A relational grammar (~erlmutter and postal) can be useful in describing ASL. Their grammar focuses on the relations of various participants in an actioh to the verb. The notion of subject can be related to what Friedman calls the Agent (AGENT-PATI ENT) or what Reid cal l s the causer (CAUSER-AFFECTED ELEMENT-RANGE) . The Agent or causer shows up in sign as the active participant, the pa:ient as the usually stationary participant being acted upon. As in relational grammar, these relations are based upon observational properties of the terms with respect to the verb. The relati~nal model is attractive because it does not force one to specify the syntactic form of the sentence through a rigid ordering or tree structure.</Paragraph>
    <Paragraph position="3"> Even more flexible is a frame analysis model which allows one to speak in terms of a scene or visual image. Proximal relations can then be preserved without translation into any linear forms. The frames approach emphasizes an important aspect so often repeated in descriptions of ASL. What one is doing is building a picture -- a scene. The signer is always thinking in terms of the picture he is presenting. He is trying to produce a miniat~lre characterization of a real event. When elements of the event are present and within access for him to refer to in his discourse, he will use them. For example, he will point to an actual person rather than producing an arbitrary grammatical index to refer to that person. Describing sign language through frames allows-one to stress the visual picture being presented. It allows also for the smooth integration of other communication conventions used within the speech act. For exarnpte, if mime is found to be more explicit than the use of conventionalized ASL forms, it can easily be incorporated into the discourse making the total presentation a more direct representation of the event.</Paragraph>
  </Section>
  <Section position="28" start_page="76" end_page="76" type="metho">
    <SectionTitle>
2. Visual Logic
</SectionTitle>
    <Paragraph position="0"> Boyes (1972) gives various arguments based on visual perception experiments for analyzing sign in terns of visual logic. By 'visual logic,' she means a system of rules simi lar to the rules people use to make sense of any visual experience. In the next section we show that frame analysis can be considered an appropriate visual logic for sign language. First we would like to present the basic arguments from Boyes (1972) for using visual logic since these arguments also support the use of frarne analysis.</Paragraph>
    <Paragraph position="1"> There are three major resul ts of visual percept ion experimentation whi ch Boyes cites in order to begin a study of the constraints that the visual mode puts on a sign language. These results all show the limitations of visual memory as compared to aud i tory menwry. These memory processes can each be divided into the same three Stages. First, there is the in'itial storage of the stimulus which is identical to the actual stimulus. This part of memory is referred to as iconic memory (visual mode) or echoic memory (auditory mode) . The next stage is short term memory where rehearsal can take place. Rehearsal is the process of repetition of the stored material during which the material is decoded, i.e., grouped into meaningful segments. This recoded material is then stored in long term memory.</Paragraph>
    <Paragraph position="2"> One result that Boyes cites is that iconic memory is shorter than echoic memory. Iconic storage usually lasts for between 250 msec and 1 sec whereas echoic storage can last as long as 10 sec. A second fact is that the reaction time to visual stimuli is longer than that to auditory stimuli. The third result is that visual short term memory is more limited than auditory short term memory in that it does not seem to be able to hold as many items in the presence of continued input. The current figures for this are 4 or 5 items maximum in visual STM as opposed to j - + 2 items in auditory STM. Boyes claims that this difference is due to the limited capacity for rehearsal of visual information.</Paragraph>
    <Paragraph position="3"> All three of these results show that there is generally less time available for processing the sign sentence then there is for the spoken sentence. The temporal segmentation of sign would have to produce segments short enough to fit in iconic memory. And the sentence would have to be structured in such a way as to not tax STM with its limited rehearsal capacity. The sentence structure cannot rely on dependencies of elements which are temporally separated beyond the span of visual STM. Boyes seems to go a bit too far here and says that there should not be a &amp;quot;syntax which depends on decoding a tcmporal succession of images as a unit.&amp;quot; But all this really means is that the sentences in ASL must be shorter th&amp;-r 5 items or that they must be processed in a way that does not require linguistic links between items which are separated by more than 4 items. Of course, more must be known about the linguistic processing of sign language before these conclusions can be made more specific.</Paragraph>
    <Paragraph position="4"> In any case, it is clear that more information must be encoded per time interval in a visual language than in a spoken language, if we assume that the rate of transmission of information is to be the same in both. This can be accomplished by the mode of production in two ways. First, the symbol system used must be more direct, i.e., there should be a simpler mapping be-tween visual sign and meaning than there is between sound and meaning. Secondly, sign must utilize its spatial dimensions to overcome the temporal limitations on the transmission of information.</Paragraph>
    <Paragraph position="5"> Frame analysis. is able to represent these qualities of ASL.</Paragraph>
  </Section>
class="xml-element"></Paper>