<?xml version="1.0" standalone="yes"?> <Paper uid="E06-3002"> <Title>What Humour Tells Us About Discourse Theories</Title> <Section position="3" start_page="31" end_page="33" type="metho"> <SectionTitle> 2 Models of Discourse </SectionTitle> <Paragraph position="0"> Formal semantics (Montague, 1973) looked at logical structures, but it became evident that language builds on what is seemingly semantic incompatibility, particularly in Gricean Implicature (Grice, 1981). It became necessary to look at the relations that describe interactions between such structures. (Hobbs, 1985) introduces an early theory of discourse and the notion of coherence relations, which are applied recursively on discourse segments. Coherence relations, such as Elaboration, Explanation and Contrast, are relations between discourse units that bind segments of text into one global structure. (Grosz and Sidner, 1986) incorporates two more important notions into its model - intention and focus. The Rhetorical Structure Theory, introduced in (Mann and Thompson, 1987), binds text spans with rhetorical relations, which are discourse connectives similar to coherence relations.</Paragraph> <Paragraph position="1"> The Discourse Representation Theory (DRT) (Kamp, 1984) computes inter-sentential anaphora and attempts to maintain text cohesion through sets of predicates, termed Discourse Representation Structures (DRSs), that represent the discourse contained in the text and form the basis for resolving anaphora and discourse referents.</Paragraph> <Paragraph position="2"> By marrying DRT to a rich set of rhetorical relations, Segmented Discourse Representation Theory (SDRT) (Lascarides and Asher, 2001) attempts to create a dynamic framework that bridges the semantic-pragmatic interface.</Paragraph> <Paragraph position="3"> It consists of three components - Underspecified Logical Formulae (ULF), Rhetorical Relations and Glue Logic. 
Semantic representation in the ULF acts as an interface to other levels.</Paragraph> <Paragraph position="4"> Information in discourse units is represented by a modified version of DRS, called Segmented Discourse Representation Structures (SDRSs).</Paragraph> <Paragraph position="5"> SDRSs are connected through rhetorical relations, which posit relationships on SDRSs to bind them.</Paragraph> <Paragraph position="6"> To illustrate, consider the discourse in (3): (3) Who supports Gorbachev? No one does, he can still walk by himself! The rhetorical relations over the discourse are shown in Figure 3. Here, Explanation induces subordination and implies that the content of the subordinate SDRSs further qualifies the principal SDRS, while Question-Answer Pair induces coordination. Rhetorical relations thus connect semantic units together to formalize the flow of a discourse. SDRT's Glue Logic then runs sequentially on the ULF and rhetorical relations to reduce underspecification, resolve ambiguity and derive inferences through the discourse. The way inferencing is done is similar to DRT, with the additional constraints that rhetorical relations specify. A point to note is SDRT's Maximum Discourse Coherence (MDC) Principle. This principle is used to resolve ambiguity in interpretation by maximizing discourse coherence to obtain the Pragmatically Preferred interpretation. There are three conditions on which MDC works: (a) The more rhetorical relations there are between two units, the more coherent the discourse. (b) The more anaphors that are resolved, the more coherent the discourse. (c) Some rhetorical relations can themselves be measured for coherence. For example, the coherence of Contrast depends on how dissimilar its connected propositions are. SDRT uses rhetorical relations and MDC to resolve lexical and semantic ambiguities. For example, in the utterance 'John bought an apartment. 
But he rented it', the sense of rented is that of renting out, and that is resolved in SDRT because the word but cues the relation Contrast, which prefers an interpretation that maximizes semantic contrast between its connectives.</Paragraph> <Paragraph position="7"> Glue logic works by iteratively extracting subsets of inferences through the flow of the discourse. This is discussed in more detail later.</Paragraph> <Section position="1" start_page="32" end_page="33" type="sub_section"> <SectionTitle> 2.1 Lexicons for Discourse modeling </SectionTitle> <Paragraph position="0"> Pustejovsky's Generative Lexicon (GL) model (Pustejovsky, 1995) outlines an ambitious attempt to formulate a lexical semantics framework that can handle the unboundedness of linguistic expressions by providing a rich semantic structure, a principled ontology of concepts (called qualia), and a set of generative devices through which participants in a phrase or sentence can influence each other's semantic properties.</Paragraph> <Paragraph position="1"> The ontology of concepts in GL is hierarchical, and concepts that exhibit similar behaviour are grouped together into subsystems called Lexical Conceptual Paradigms (LCPs). As an example, the GL structure for door is an LCP that represents both the use of door as a physical object, as in 'he knocked on the door', and as an aperture, as in 'he entered the door'.</Paragraph> <Paragraph position="2"> In this work, we extend the GL structures to incorporate likelihood measures in the ontology and the event structure relations. The Probabilistic Qualia Structure, which outlines the ontological hierarchy of a lexical item, also encodes frequency information. Every time the target word appears together with an ontologically connected concept, the corresponding qualia features are strengthened. 
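The frequency-strengthening update just described can be sketched as a small probabilistic qualia structure. This is a minimal illustration, not the paper's implementation; all role names and counts below are invented:

```python
from collections import defaultdict

class ProbabilisticQualia:
    """Qualia structure whose role fillers carry frequency-derived
    probabilities (illustrative sketch; names and counts invented)."""

    def __init__(self, word):
        self.word = word
        # counts[role][filler] -> co-occurrence frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, role, filler, n=1):
        # Strengthen a qualia feature each time the target word
        # co-occurs with an ontologically connected concept.
        self.counts[role][filler] += n

    def p(self, role, filler):
        total = sum(self.counts[role].values())
        return self.counts[role][filler] / total if total else 0.0

    def most_likely(self, role):
        fillers = self.counts[role]
        return max(fillers, key=fillers.get) if fillers else None

# 'book': with these hypothetical counts, 'read' is the maximally
# likely telic role overall...
book = ProbabilisticQualia("book")
book.observe("telic", "read", 120)
book.observe("telic", "write", 30)
print(book.most_likely("telic"))            # read

# ...while a context-conditioned entry can make 'write' more likely
# when the agent is the author.
book_by_author = ProbabilisticQualia("book|agent=author")
book_by_author.observe("telic", "write", 25)
book_by_author.observe("telic", "read", 5)
print(book_by_author.most_likely("telic"))  # write
```

Keeping raw counts rather than normalized probabilities lets the qualia features keep strengthening as new co-occurrences are observed.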
This results in a probabilistic model of qualia features, which can in principle determine that a book has read as its maximally likely telic role, but that in the context of the agent being the author, write becomes more likely.</Paragraph> <Paragraph position="3"> Generative mechanisms work on this semantic structure to capture systematic polysemy in terms of type shifts. Thus Type Coercion enforces semantic constraints on the arguments of a predicate. For example, 'He enjoyed the book' is coerced to 'He enjoyed reading the book' since enjoy requires an activity, which is taken as the telic role of the argument, i.e. that of book. Co-composition constrains the type-shifting of the predicate by its arguments. An example is the difference between 'bake a cake' (creating a new object) versus 'bake beans' (state change). Finally, Selective Binding type-shifts a modifier based on the head. For example, in 'old man' and 'old book', the property being modified by old is shifted from physical-age to information-recency.</Paragraph> <Paragraph position="4"> To accommodate likelihoods in generative mechanisms, we need to incorporate conditional probabilities between the lexical and ontological entries that the mechanisms work on. These probabilities can be stored within the lexicon itself or integrated into the generative mechanisms. In either case, mechanisms like Type Coercion should no longer exhibit a default behaviour - the coercion must change based on frequency of occurrence and context.</Paragraph> </Section> </Section> <Section position="4" start_page="33" end_page="34" type="metho"> <SectionTitle> 3 The Analysis of Humour </SectionTitle> <Paragraph position="0"> The General Theory of Verbal Humour (GTVH), introduced earlier, is a well-known computational model of humour. It uses the notion of scripts to account for the opposition in jokes. 
It models humour as two opposing and overlapping scripts put together in a discourse, one of which is apparent and the other hidden from the reader until a trigger point, when the hidden script suddenly surfaces, generating humour. However, the notion of scripts implies that there is a script for every occasion, which severely limits the theory. On the other hand, models of discourse are more general and do not require scripts. However, they lack the mechanism needed to capture such oppositions.</Paragraph> <Paragraph position="1"> In addition to joke (3), consider: (4) Two guys walked into a bar. The third one ducked.</Paragraph> <Paragraph position="2"> The humour in joke (4) results from the polysemous use of the word bar. The first sentence leads us to believe that bar is a place where one drinks, but the second sentence forces us to revise our interpretation to mean a solid object. GTVH would use the DRINKING BAR script before the trigger and the COLLISION script after. Joke (3), quoted in Raskin's work as well, contains an obvious opposition. The first sentence invokes the sense of support being that of political support. The second sentence introduces the opposition, and the meaning of support is changed to that of physical support.</Paragraph> <Paragraph position="3"> In all examples discussed so far, the key observations are that (i) a single inference is primed by the reader, (ii) this primary inference suppresses other inferences until (iii) a trigger point is reached.</Paragraph> <Paragraph position="4"> To formalize the unfolding of a joke, we refer back to Figure 1. Let t be a point along the timeline. When t < TP, both P1 and P2 are compatible, and the possible world is P = P1 ∪ P2. P1 is the preferred interpretation and P2 is hidden. When t = TP, J2 is introduced, P1 becomes incompatible with P2, and P1 may also lose compatibility with J2. P2 now surfaces as the preferred inference. 
The reader has to invoke a search to find P2, which is represented by the search gap.</Paragraph> <Paragraph position="4"> A possible world Pi = {qi1, qi2, ..., qik}, where each qij is an inference. Two worlds Pi and Pj are incompatible if there exists a pair of sets of inferences drawn from them whose union is contradictory, i.e.</Paragraph> <Paragraph position="5"> Pi is said to be incompatible with Pj if there exist Qi ⊆ Pi and Qj ⊆ Pj such that Qi ∪ Qj ⊢ ⊥.</Paragraph> <Paragraph position="7"> They are said to be compatible if no such subsets exist.</Paragraph> <Paragraph position="8"> We now explore in detail why compositional discourse models fail to handle the mechanisms of humour.</Paragraph> <Section position="1" start_page="33" end_page="34" type="sub_section"> <SectionTitle> 3.1 Beyond Scripts - Why Verbal Humour Should Be Winner Take All </SectionTitle> <Paragraph position="0"> An argument against the approach of existing discourse models like SDRT concerns their iterative inferencing. At each point in the process of inferencing, SDRT's Glue Logic carries over all interpretations possible within its constraints as a set. MDC ranks contending inferences, allowing less preferred inferences to be discarded, and the result of this process is a subset of the input to it. Contrasting inferences can coexist through underspecification, and the contrast is resolved when one of them loses compatibility. This is cognitively unlikely; (Miller, 1956) has shown that the human brain actively retains only around seven units of information. With such a limited working memory, it is not cognitively feasible to model discourse analysis in this manner. 
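The incompatibility test on possible worlds defined above can be sketched as follows, with worlds as sets of atomic inferences and the contradiction check approximated by a table of clashing pairs (a simplification of the subset definition; all inference names are invented):

```python
# Worlds are sets of atomic inferences; CONTRADICTIONS lists pairs of
# inferences that cannot jointly hold (illustrative entries only).
CONTRADICTIONS = {
    frozenset({"support_abs(gorbachev)", "support_phy(gorbachev)"}),
    frozenset({"bar(drinking_place)", "bar(solid_object)"}),
}

def incompatible(world_i, world_j):
    """True if some pair of inferences drawn from the two worlds
    triggers a contradiction (pairwise approximation of the
    'Qi and Qj jointly derive a contradiction' condition)."""
    return any(frozenset({qi, qj}) in CONTRADICTIONS
               for qi in world_i for qj in world_j)

p1 = {"support_abs(gorbachev)", "head_of_govt(gorbachev)"}
p2 = {"support_phy(gorbachev)", "animate(gorbachev)"}
print(incompatible(p1, p2))                      # True
print(incompatible(p1, {"animate(gorbachev)"}))  # False
```

A fuller implementation would test arbitrary subsets against a theorem prover rather than a fixed clash table, but the pairwise version already captures the two jokes discussed here.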
Cognitive models that work with a limited-capacity short-term memory, such as (Lewis, 1996), support the same intuition.</Paragraph> <Paragraph position="1"> Thus, a better approach would be a Winner Take All (WTA) approach, where the most likely interpretation, called the winner, suppresses all other interpretations as we move through the discourse.</Paragraph> <Paragraph position="2"> The model must be revised to reflect new contexts if they are incompatible with the existing model.</Paragraph> <Paragraph position="3"> Let us now explore this with respect to joke (3). There is a Question-Answer relation between the first sentence and the next two. The semantic representation for the first sentence alone is: ∃x(support(x, Gorbachev)), x = ? The x = ? indicates a missing referent for who. Using GL, it is not difficult to resolve the sense of support to mean that of political support.</Paragraph> <Paragraph position="4"> To elaborate, the lexical entry of Gorbachev is an LCP of two senses - that of the head of government and that of an animate, as shown:</Paragraph> <Paragraph position="6"> The two senses of support applicable in this context are that of physical support and of political support. We use abstract support as a generalization of the political sense. The analysis of the first sentence alone would allow for both these possibilities:</Paragraph> <Paragraph position="8"> Thus, after the first sentence, the sense of support includes both senses, i.e. support ∈ {supportabs, supportphy}.</Paragraph> <Paragraph position="9"> We then come across the second sentence, establishing its semantic representation as well as the rhetorical relations. We find that the sentence contains walk(z). SDRT's Right Frontier Rule resolves the referent he to Gorbachev. Also, the clause 'no one does' resolves the referent x to null. 
Thus, we get:</Paragraph> <Paragraph position="11"> The action walk requires an animate argument.</Paragraph> <Paragraph position="12"> Since walk(Gorbachev) is true, the sense of support in the previous sentence is restricted to mean physical support, i.e. support = supportphy, since only supportphy can take an animate argument as its object - the abstract entity requirement of supportabs causes it to be ruled out, ending at a final inference.</Paragraph> <Paragraph position="13"> The change of sense for support is key to the generation of humour, but SDRT fails to recognize the shift since it has neither a priming mechanism nor a revision of models built into it.</Paragraph> <Paragraph position="14"> It merely works by restricting the possible inferences as more information becomes available. Referring to Figure 1 again, SDRT will only account for the refinement of possible worlds from P1 ∪ P2 to P2. It will not be able to account for the priming of either Pi, which is required.</Paragraph> </Section> </Section> <Section position="5" start_page="34" end_page="36" type="metho"> <SectionTitle> 4 A Probabilistic Semantic Lexicon </SectionTitle> <Paragraph position="0"> We now introduce a WTA model under which priming can be well accounted for. We would like a model under which a single interpretation is made at each point in the analysis. We want a set of possible worlds P such that J1 ⊢WTA P.</Paragraph> <Paragraph position="2"> WTA ensures that only the prime world P is chosen by J1. When J2 is analyzed, no world p ∈ P can satisfy J2, i.e.: ∀p ∈ P, J2 ⊬ p In this case, we need to backtrack and find another set P′ that satisfies both J1 and J2, i.e.: (J1, J2) ⊢WTA P′ In Figure 1, P = P1 and P′ = P2.</Paragraph> <Paragraph position="3"> The most appropriate way to achieve this is to include the priming in the lexicon itself. 
We present a lexical structure where the senses of compositional units carry a probability of occurrence approximated by their frequency counts. The probability of a composition can then be calculated from the individual probabilities. The interpretation with the highest probability is primed. Thus, at every point in the discourse, only one inference emerges as primary and suppresses all other inferences. As an example, the proposed structure for Gorbachev is presented below:</Paragraph> <Paragraph position="5"> ...</Paragraph> <Paragraph position="6"> Instead of using the concept of an LCP as in classical GL, we assign probabilities to each sense encountered. These probabilities can then facilitate priming.</Paragraph> <Paragraph position="7"> To add weight to the argument with empirical data, we use WordNet (Fellbaum, 1998), built on the British National Corpus, as an approximation for frequency counts. We find that</Paragraph> <Paragraph position="9"> Similarly, for the notion of Gorbachev, it is plausible to assume that Gorbachev as head of government is more meaningful for most of us, rather than just another old man. In order to make an inference after the first sentence, we need to search for the correct interpretation, i.e. we need to find argmaxi,j P(supporti/Gorbachevj), which intuitively should be P(supportabs/head of govt). Making an analysis similar to that in the previous section, the second sentence violates the first assumption, since walk(Gorbachev) cannot be true (since P(abstract entity) = 0).</Paragraph> <Paragraph position="10"> Thus, we need to revise our inference, moving back to the first sentence and choosing the max(P(supporti/Gorbachevj)) that is compatible with the second sentence. This turns out to be P(supportphy/animate). Thus, the distinct shift between inferences is captured in the course of analysis. Cognitive studies, such as those on garden-path sentences, strengthen this approach to analysis. 
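The priming-and-revision procedure just described can be sketched as a Winner-Take-All search over candidate readings ordered by probability. The probabilities below are invented placeholders, not WordNet counts:

```python
def wta_interpret(candidates, compatible):
    """Prime the most probable reading; if later discourse rules it
    out, backtrack to the best reading that is still compatible."""
    for reading in sorted(candidates, key=candidates.get, reverse=True):
        if compatible(reading):
            return reading
    return None  # no compatible reading survives

# Hypothetical joint readings of 'support' and 'Gorbachev'.
candidates = {
    ("support_abs", "head_of_govt"): 0.7,
    ("support_phy", "animate"): 0.3,
}

# After the first sentence everything is compatible, so the abstract
# (political) sense is primed.
primed = wta_interpret(candidates, lambda r: True)
print(primed)   # ('support_abs', 'head_of_govt')

# walk(Gorbachev) in the second sentence demands an animate reading,
# forcing a revision to physical support.
revised = wta_interpret(candidates, lambda r: r[1] == "animate")
print(revised)  # ('support_phy', 'animate')
```

Only one reading is ever held at a time; the others are recovered solely by re-running the search when the primed reading loses compatibility, which mirrors the limited-working-memory argument.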
(Lewis, 1996), for example, presents a model that predicts cognitive observations with very limited working memory.</Paragraph> <Paragraph position="11"> Storing the inter-lexical conditional probabilities is also an issue, as mentioned earlier. Where, for example, do we store P(supporti/Gorbachevj)? One possible approach would be to store them with either lexical item. A better approach would be to bestow the responsibility of calculating these probabilities upon the generative mechanisms of the semantic lexicon whenever possible.</Paragraph> <Paragraph position="12"> Let us now analyze joke (1) under the probabilistic framework. Again, approximations for probability of occurrence will be taken from WordNet. The entry for wife in WordNet lists just one sense, and so we assign a probability of 1 to it in its lexical entry:</Paragraph> <Paragraph position="14"> The humour is generated due to the lexical ambiguity of miss. We list the lexical entries of the two senses of miss that apply in this context - the first being an abstract emotional state and the other being a physical process.</Paragraph> <Paragraph position="16"> The Rhetorical Relations for joke (1) are presented in Figure 4. After parsing the first sentence, the logical representation obtained is: ∃e1∃e2∃e3∃x∃y(wife(e1,x,y) ∧ divorce(e2,x,y) ∧ miss(e3,x,y) ∧ e1 < e2 < e3) To arrive at a prime inference, note that the semantic types of the arguments of both senses of miss are exclusive.</Paragraph> <Paragraph position="18"> By Bayes' Theorem, to compare P(missabs/entity) and P(missphy/physical entity), it is sufficient to compare P(missabs) and P(missphy). From the WordNet frequency counts, P(missabs) > P(missphy).</Paragraph> <Paragraph position="20"> Thus, the primed inference has miss = missabs. The second sentence has the following logical representation: ∃x(d goodness(aim(x)) > 0) This simply means that a measure of the aim, called goodness, is undergoing a positive change. 
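The Bayes-style step above (exclusive argument types make the likelihood terms cancel, so the sense priors decide the comparison) can be sketched with invented frequency counts standing in for WordNet data:

```python
# Hypothetical sense frequencies for 'miss'; with mutually exclusive
# argument types, comparing P(sense | argument_type) reduces to
# comparing the priors P(sense), estimated from the counts.
counts = {"miss_abs": 40, "miss_phy": 10}

def prior(sense):
    return counts[sense] / sum(counts.values())

# The sense with the larger prior is primed after the first sentence.
primed = max(counts, key=prior)
print(primed, prior(primed))  # miss_abs 0.8
```

The same comparison re-runs after the second sentence, except there the Contrast relation supplies the conditioning context instead of the argument type.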
The word but is a cue for a Contrast relation between the two sentences, while the discourse suggests Parallelism. The two senses of aim compatible with the first sentence are aimabs, which is synonymous with goal, and aimphy, referring to the physical sense of missing. We now need to consider P(aimabs/missabs) and P(aimphy/missphy). The semantic constraints of the rhetorical relation Contrast ensure that the second is more coherent, i.e. the contrast of physical aim getting better is more coherent with the physical sense of miss, and we expect this to be reflected in usage frequency as well. Therefore P(aimabs/missabs) < P(aimphy/missphy), and we need to shift our inference and make miss = missphy.</Paragraph> <Paragraph position="21"> As a final assertion of the probabilistic approach, consider: (5) You can lead a child to college, but you cannot make him think.</Paragraph> <Paragraph position="22"> The incongruity in joke (5) does not result from a syntactic or semantic ambiguity at all, and yet it induces dissonance. The dissonance is not a result of compositionality, but of the access of a whole linguistic structure, i.e. we recall the familiar proverb 'You can lead a horse to water but you cannot make it drink', and the deviation from the recognizable structure causes the violation of our expectations. Thus, access is not restricted to the lexical level; we seem to store and access bigger units of discourse if they are encountered frequently enough. The only way to do justice to this joke would be to encode the entire sentential structure directly into the lexicon. Our model will now also consider these larger chunks, whose meaning is specified atomically. The dissonance will now come from the semantic difference between the accessed expression and the one under analysis.</Paragraph> </Section></Paper>