<?xml version="1.0" standalone="yes"?>
<Paper uid="C82-2069">
  <Title>TOPIC IDENTIFICATION TECHNIQUES FOR PREDICTIVE LANGUAGE ANALYSERS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
TOPIC IDENTIFICATION TECHNIQUES FOR PREDICTIVE LANGUAGE
ANALYSERS
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The use of prediction as the basis for inferential analysis mechanisms for natural language has become increasingly popular in recent years. Examples of systems which use prediction are FRUMP (DeJong 79) and (Schank 75a). The property of interest here is that their basic mode of working is to determine whether an input text follows one of the system's pre-specified patterns; in other words they predict, to some extent, the form their input texts will take. A crucial problem for such systems is the selection of suitable sets of predictions, or patterns, to be applied to any particular text, and it is this problem I want to address in the paper.</Paragraph>
    <Paragraph position="1"> I will assume that the predictions are organised into bundles according to the topic of the texts to which they apply. This is a generalisation of the script idea employed by (DeJong 79) and (Schank 75a). I will call such bundles stereotypes.</Paragraph>
    <Paragraph position="2"> The basis of the technique described here is a distinction between the process of suggesting possible topics of a section of text and the process of eliminating candidate topics (and associated predictions) which are not, in fact, appropriate for the text section. Those candidates which are not eliminated are then identified as the topics of the text section. (There may only be one such candidate.) This approach allows the use of algorithms for suggesting possible topics which try to ensure that if the system possesses a suitable stereotype for a text section it is activated, even at the expense of activating large numbers of irrelevant stereotypes. This technique has been tested in a computer system called Scrabble.</Paragraph>
    <Paragraph position="3"> 2. Suggesting Candidate Topics  The discovery of candidate topics for a text segment is driven by the association of a set of patterns of semantic primitives with each stereotype. (For the purposes of this paper it is assumed that the system has access to a lexicon containing entries whose semantic component is something like that used by (Wilks 77).) As a word is input to the system the senses of the word are examined to determine if any of them have a semantic description which contains a pattern associated with any of the system's stereotypes. If any do contain such a pattern the corresponding stereotypes are loaded into the active workspace of the system, unless they are already active.</Paragraph>
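    <Paragraph> The suggestion step described above might be sketched roughly as follows. This is a minimal illustration, not Scrabble's implementation: it assumes that a pattern can be modelled as a set of primitives and that a sense "contains" a pattern when every primitive of the pattern appears in the sense. The primitive names, lexicon entries, stereotype names and the function suggest are all hypothetical.

```python
# Hypothetical data: primitive names, senses and stereotype names are
# illustrative only, not taken from Scrabble's lexicon.
PATTERNS_BY_STEREOTYPE = {
    "restaurant": {"FOOD"},
    "supermarket": {"FOOD"},
    "bank": {"MONEY"},
}

LEXICON = {
    # word -> list of senses, each sense modelled as a set of primitives
    "tuna": [{"FISH", "FOOD"}],
    "shelf": [{"THING", "SUPPORT"}],
}


def suggest(word, active):
    """Load every stereotype whose pattern occurs in some sense of `word`.

    `active` is the set of active stereotype names, mutated in place.
    Returns the names newly loaded by this word.
    """
    loaded = []
    for sense in LEXICON.get(word, []):
        for name, pattern in PATTERNS_BY_STEREOTYPE.items():
            # Assumed matching rule: all primitives of the pattern
            # must appear in the semantic description of the sense.
            if pattern.issubset(sense) and name not in active:
                active.add(name)
                loaded.append(name)
    return loaded
```

Because already-active stereotypes are skipped, reading the same trigger word twice loads nothing new, which mirrors the "unless they are already active" condition in the text.</Paragraph>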
    <Paragraph position="4">  In parallel with the suggestion process, the predictions of each stereotype in the active workspace are compared with the text. In Scrabble, the sentences of the text are first parsed into a variant of Conceptual Dependency (CD) representation (Schank 75b) by a program described in (Cater 80). The semantic representation scheme has been extended to include nominal descriptions similar in power to those used by (Wilks 77). The predictions are compared with the CD representation structures at the end of each sentence, but the scheme described in this paper could equally be applied to a system which integrated the process of parsing with that of determining whether or not a fragment of the text satisfies some prediction, as is done in (DeJong 79).</Paragraph>
    <Paragraph position="5"> It is likely that stereotypes which are not relevant to the topic of the current text segment will have been loaded as a result of the suggestion process. Since the cost of the comparison of a prediction with the CD representation of a sentence of the text is not trivial, it is important that irrelevant stereotypes are removed from the active workspace as rapidly as possible. The primary algorithm used by Scrabble removes any stereotype which has failed to predict more of the propositions in the incoming text than it has successfully predicted. This simple algorithm has proved adequate in tests and its simplicity also ensures that the cost of removing irrelevant stereotypes is minimised. Further processing is subsequently done to separate stereotypes which were never appropriate for the text from stereotypes which were useful for the analysis of some part of the text, but are no longer useful.</Paragraph>
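    <Paragraph> The removal rule can be illustrated with a short sketch. It keeps two counters per active stereotype and drops a stereotype once its failures exceed its successes. The class Tally, the function update_and_prune and the predicts callback are hypothetical names; how Scrabble actually matches a prediction against a CD structure is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Running score for one active stereotype (hypothetical structure)."""
    predicted: int = 0   # propositions the stereotype predicted
    failed: int = 0      # propositions it did not predict


def update_and_prune(active, proposition, predicts):
    """Score each active stereotype on one proposition, then prune.

    `active` maps stereotype name -> Tally and is mutated in place.
    `predicts(name, proposition)` is an assumed callback returning a bool.
    Returns the names removed on this step.
    """
    removed = []
    for name in list(active):
        tally = active[name]
        if predicts(name, proposition):
            tally.predicted += 1
        else:
            tally.failed += 1
        # Rule from the text: remove when failures exceed successes.
        if tally.failed > tally.predicted:
            removed.append(name)
            del active[name]
    return removed
```

A stereotype whose failures merely equal its successes stays active under this rule, so one early success buys tolerance for one later miss.</Paragraph>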
    <Paragraph position="6"> 4. An Example  Consider the following short text, adapted from (Charniak 78). Jack picked a can of tuna off the shelf. He put it in his basket. He paid for it and went home.</Paragraph>
    <Paragraph position="7"> Assume that associated with the primitive pattern for food the system has stereotypes for eating in a restaurant, shopping at a supermarket, and preparing a meal in the kitchen. The lexicon entry for tuna (a large sea fish which is caught for food) will contain this pattern, and this will cause the loading of the above three stereotypes into the active workspace. The restaurant stereotype will not predict the first sentence, and so will immediately be unloaded. Both the supermarket and kitchen stereotypes expect sentences like the first in the text. When the second sentence is read, the supermarket stereotype will be expecting it (since it expects purchases to be put into baskets), but the kitchen stereotype will not. However the kitchen stereotype will not be unloaded since, so far, it has predicted as many propositions as it has failed to predict. When the third sentence is read, again the supermarket stereotype has predicted propositions of this form, but the kitchen stereotype has not. Therefore the kitchen stereotype is removed from the active workspace, and the topic of the text is firmly identified as a visit to the supermarket. It should be noted that a completely realistic system would have to perform much more complex processing to analyse the above example. In such a system additional stereotypes would probably be activated by the occurrence of the primitive pattern for food, and it is likely that yet more stereotypes would be activated by different primitive patterns in the lexicon entries for the words in the input text.</Paragraph>
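    <Paragraph> The tallies in this example can be traced with a few lines of code. The sketch below hard-codes which stereotypes predict each sentence, exactly as stated in the example, and makes the simplifying assumption that each sentence contributes a single proposition; the function trace and the table PREDICTED_BY are hypothetical names.

```python
# Which stereotypes predict each sentence, as stated in the example.
# Assumption: each sentence counts as a single proposition.
PREDICTED_BY = [
    {"supermarket", "kitchen"},   # Jack picked a can of tuna off the shelf.
    {"supermarket"},              # He put it in his basket.
    {"supermarket"},              # He paid for it and went home.
]


def trace(stereotypes):
    """Return (survivors, removal_log) after reading all three sentences."""
    score = {name: [0, 0] for name in stereotypes}  # [predicted, failed]
    log = []
    for number, predictors in enumerate(PREDICTED_BY, start=1):
        for name in list(score):
            hit = name in predictors
            score[name][0 if hit else 1] += 1
            predicted, failed = score[name]
            # Remove once failures exceed successes.
            if failed > predicted:
                del score[name]
                log.append((number, name))
    return sorted(score), log


survivors, log = trace(["restaurant", "supermarket", "kitchen"])
print(survivors)  # ['supermarket']
print(log)        # [(1, 'restaurant'), (3, 'kitchen')]
```

Under these assumptions the restaurant stereotype is dropped after the first sentence and the kitchen stereotype after the third, leaving the supermarket stereotype as the identified topic.</Paragraph>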
  </Section>
</Paper>