<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1018">
  <Title>Integrating Symbolic and Statistical Representations: The Lexicon Pragmatics Interface</Title>
  <Section position="4" start_page="138" end_page="138" type="metho">
    <SectionTitle>
3 Encoding Lexical Preferences
</SectionTitle>
    <Paragraph position="0"> In order to help pragmatics select between the multiple possible interpretations, we utilise probabilities.</Paragraph>
    <Paragraph position="1"> For an established form, derived or not, these depend straightforwardly on the frequency of a particular sense. For example, in the BNC, diet has probability of about 0.9 of occurring in the food sense and 0.005 in the legislature sense (the remainder are metaphorical extensions, e.g., diet of crime). Smoothing is necessary to avoid giving a zero probability to possible senses which are not found in a particular corpus. For derived forms, the applicable lexical rules or schemata determine possible senses (Briscoe and Copestake, 1996). Thus for known compounds, probabilities of established senses depend on corpus frequencies but a residual probability is distributed between unseen interpretations licensed by schemata, to allow for novel uses. This distribution is weighted to allow for productivity differences between schemata. For unseen compounds, all probabilities depend on schema productivity. Compound schemata range from the non-productive (e.g., the verb-noun pattern exemplified by pickpocket), to the almost fully productive (e.g., made-of) with many schemata being intermediate (e.g., has-part: 4-door car is acceptable but the apparently similar *sunroof car is not).</Paragraph>
    <Paragraph position="2"> We use the following estimate for productivity (adapted from Briscoe and Copestake (1996)):</Paragraph>
  </Section>
  <Section position="5" start_page="138" end_page="139" type="metho">
    <SectionTitle>
$\mathrm{Prod}(\textit{cmp-schema}) = \frac{M+1}{N}$
</SectionTitle>
    <Paragraph position="0"> (where N is the number of pairs of senses which match the schema input and M is the number of attested two-noun output forms -- we ignore compounds with more than two nouns for simplicity). Formulae for calculating the unseen probability mass and for allocating it differentially according to schema productivity are shown in Figure 4. Finer-grained, more accurate productivity estimates can be obtained by considering subsets of the possible inputs -- this allows for some real-world effects (e.g., the made-of schema is unlikely for liquid/physical-artifact compounds).</Paragraph>
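    <Paragraph> As a rough illustration of the productivity estimate above, the following Python sketch computes (M + 1)/N for a schema and then splits a residual probability mass between schemata in proportion to their productivity. The proportional split, the function names and all counts are assumptions made for this example only; they are not the formulae of Figure 4 and not BNC figures.

```python
from fractions import Fraction
from typing import Dict


def productivity(attested_outputs: int, matching_inputs: int) -> Fraction:
    """Prod(cmp-schema) = (M + 1) / N.

    M = attested two-noun output forms, N = sense pairs matching the schema input.
    """
    if matching_inputs == 0:
        raise ValueError("N must be positive: the schema needs a matching input pair")
    return Fraction(attested_outputs + 1, matching_inputs)


def share_unseen_mass(unseen_mass: Fraction, prods: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """Split a residual mass across schemata in proportion to productivity.

    The proportional rule is an assumption of this sketch.
    """
    total = sum(prods.values())
    return {name: unseen_mass * p / total for name, p in prods.items()}


# Invented counts, for illustration only.
prods = {
    "made-of": productivity(attested_outputs=89, matching_inputs=100),
    "purpose-patient": productivity(attested_outputs=9, matching_inputs=100),
}
shares = share_unseen_mass(Fraction(1, 10), prods)
```

Exact fractions are used so that the shares always sum back to the residual mass without rounding error.</Paragraph>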
    <Paragraph position="1"> Lexical probabilities should be combined to give an overall probability for a logical form (LF): see e.g., Resnik (1992). But we will ignore this here and assume pragmatics has to distinguish between alternatives which differ only in the sense assigned to one compound. (2) shows possible interpretations for cotton bag with associated probabilities. LFs are encoded in DRT. The probabilities given here are based on productivity figures for fabric/container compounds in the BNC, using WordNet as a source of semantic categories. Pragmatics screens the LFs for acceptability. If an LF contains an underspecified element (e.g., arising from general-nn), this must be instantiated by pragmatics from the discourse context. (2) a.</Paragraph>
    <Paragraph position="2"> b.</Paragraph>
    <Paragraph position="3"> Mary put a skirt in a cotton bag</Paragraph>
    <Paragraph position="5"/>
  </Section>
  <Section position="6" start_page="139" end_page="140" type="metho">
    <SectionTitle>
4 SDRT and the Resolution of
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="139" end_page="140" type="sub_section">
      <SectionTitle>
Underspecified Relations
</SectionTitle>
      <Paragraph position="0"> The frequency information discussed in §3 is insufficient on its own for disambiguating compounds.</Paragraph>
      <Paragraph position="1"> Compounds like apple juice seat require marked contexts to be interpretable. And some discourse contexts favour interpretations associated with less frequent senses. In particular, if the context makes the usual meaning of a compound incoherent, then pragmatics should resolve the compound to a less frequent but conventionally licensed meaning, so long as this improves coherence. This underlies the distinct interpretations of cotton bag in (3) vs. (4): (3) a. Mary sorted her clothes into various large bags.</Paragraph>
      <Paragraph position="2"> b. She put her skirt in the cotton bag.</Paragraph>
      <Paragraph position="3"> (4) a. Mary sorted her clothes into various bags made from plastic.</Paragraph>
      <Paragraph position="4"> b. She put her skirt into the cotton bag.</Paragraph>
      <Paragraph position="5">  If the bag in (4b) were interpreted as being made of cotton--in line with the (statistically) most frequent sense of the compound--then the discourse would become incoherent because the definite description cannot be accommodated into the discourse context. Instead, it must be interpreted as having the (less frequent) sense given by purpose-patient; this allows the definite description to be accommodated and the discourse is coherent.</Paragraph>
      <Paragraph position="6"> In this section, we'll give a brief overview of the theory of discourse and pragmatics that we'll use for modelling this interaction during disambiguation between discourse information and lexical frequencies. We'll use Segmented Discourse Representation Theory (SDRT) (e.g., Asher (1993)) and the accompanying pragmatic component Discourse in Commonsense Entailment (DICE) (Lascarides and Asher, 1993). This framework has already been successful in accounting for other phenomena on the interface between the lexicon and pragmatics, e.g., Asher and Lascarides (1995), Lascarides and Copestake (1995),</Paragraph>
      <Paragraph position="7"> Lascarides, Copestake and Briscoe (1996).</Paragraph>
      <Paragraph position="8"> SDRT is an extension of DRT (Kamp and Reyle, 1993), where discourse is represented as a recursive set of DRSs representing the clauses, linked together with rhetorical relations such as Elaboration and Contrast, cf. Hobbs (1985), Polanyi (1985). Building an SDRS involves computing a rhetorical relation between the representation of the current clause and the SDRS built so far. DICE specifies how various background knowledge resources interact to provide clues about which rhetorical relation holds.</Paragraph>
      <Paragraph position="9"> The rules in DICE include default conditions of the form P &gt; Q, which means If P, then normally Q. For example, Elaboration states: if β is to be attached to α with a rhetorical relation, where α is part of the discourse structure τ already (i.e., ⟨τ, α, β⟩ holds), and β is a subtype of α--which by Subtype means that α's event is a subtype of β's, and the individual filling some role θ_i in β is a subtype of the one filling the same role in α--then normally, α and β are attached together with Elaboration (Asher and Lascarides, 1995). The Coherence Constraint on Elaboration states that an elaborating event must be temporally included in the elaborated event.</Paragraph>
      <Paragraph position="11"> Subtype and Elaboration encapsulate clues about rhetorical structure given by knowledge of subtype relations among events and objects. Coherence Constraint on Elaboration constrains the semantic content of constituents connected by Elaboration in coherent discourse.</Paragraph>
      <Paragraph position="12"> A distinctive feature of SDRT is that if the DICE axioms yield a nonmonotonic conclusion that the discourse relation is R, and information that's necessary for the coherence of R isn't already in the constituents connected with R (e.g., Elaboration(α, β) is nonmonotonically inferred, but e_β ⊆ e_α is not in α or in β), then this content can be added to the constituents in a constrained manner through a process known as SDRS Update. Informally, Update(τ, α, β) is an SDRS, which includes (a) the discourse context τ, plus (b) the new information β, and (c) an attachment of β to α (which is part of τ) with a rhetorical relation R that's computed via DICE, where (d) the content of τ, α and β is modified so that the coherence constraints on R are met. Note that this is more complex than DRT's notion of update. Update models how interpreters are allowed and expected to fill in certain gaps in what the speaker says: in essence, affecting semantic content through context and pragmatics. We'll use this information flow between context and semantic content to reason about the semantic content of compounds in discourse: simply put, we will ensure that words are assigned the most frequent possible sense that produces a well-defined SDRS Update function.</Paragraph>
      <Paragraph position="13"> An SDRS S is well-defined (written ↓S) if there are no conditions of the form x =? (i.e., there are no unresolved anaphoric elements), and every constituent is attached with a rhetorical relation. A discourse is incoherent if ¬↓Update(τ, α, β) holds for every available attachment point α in τ. That is, anaphora can't be resolved, or no rhetorical connection can be computed via DICE.</Paragraph>
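      <Paragraph> The well-definedness test just described can be pictured with a minimal Python sketch. The flat representation of an SDRS as a list of constituents, the class and function names, and the exemption of the first (root) constituent from the attachment requirement are all assumptions of this sketch rather than part of SDRT.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Constituent:
    label: str
    conditions: List[str]            # e.g. ["bag(z)", "u =?"]
    relation: Optional[str] = None   # rhetorical relation attaching it, if any


def is_well_defined(constituents: List[Constituent], root_label: str) -> bool:
    """True when no condition has the form 'x =?' and every non-root
    constituent is attached with some rhetorical relation."""
    for c in constituents:
        if any(cond.strip().endswith("=?") for cond in c.conditions):
            return False  # an unresolved anaphoric element remains
        if c.label != root_label and c.relation is None:
            return False  # a constituent is left unattached
    return True


alpha = Constituent("alpha", ["mary(x)", "clothes(Y)", "bag(Z)"])
beta_open = Constituent("beta", ["bag(z)", "u =?", "B =?"], relation="Elaboration")
beta_resolved = Constituent("beta", ["bag(z)", "u = Z", "B = member-of"], relation="Elaboration")
```

With these toy constituents, the pair containing beta_open fails the check because u and B are still unresolved, while the pair containing beta_resolved passes.</Paragraph>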
      <Paragraph position="14"> For example, the representations of (5a,b) (in simplified form) are respectively α and β:  (5) a. Mary put her clothes into various large bags.</Paragraph>
      <Paragraph position="15"> α: [x, Y, Z, e_α, t_α, n | mary(x), clothes(Y), bag(Z), put(e_α, x, Y, Z), hold(e_α, t_α), t_α ≺ n]</Paragraph>
      <Paragraph position="16"> b. She put her skirt into the bag made out of cotton.</Paragraph>
      <Paragraph position="17"> β: [x, y, z, w, e_β, t_β, n, u, B | mary(x), skirt(y), bag(z), cotton(w), made-of(z, w), u =?, B(u, z), B =?, put(e_β, x, y, z), hold(e_β, t_β), t_β ≺ n] In words, the conditions in β require the object denoted by the definite description to be linked by some 'bridging' relation B (possibly identity, cf. van der Sandt (1992)) to an object u identified in the discourse context (Asher and Lascarides, 1996). In SDRT, the values of u and B are computed as a byproduct of SDRT's Update function (cf. Hobbs (1979)); one specifies u and B by inferring the relevant new semantic content arising from R's coherence constraints, where R is the rhetorical relation inferred via the DICE axioms. If one cannot resolve the conditions u =? or B =? through SDRS update, then by the above definition of well-definedness on SDRSs the discourse is incoherent (and we have presupposition failure).</Paragraph>
      <Paragraph position="18"> The detailed analyses of (3) and (5) involve reasoning about the values of u and B. But for reasons of space, we gloss over the details given in Asher and Lascarides (1996) for specifying u and B through the SDRT update procedure. However, the axiom Assume Coherence below is derivable from the axioms given there. First some notation: let β[C] mean that β contains condition C, and assume that β[C/C'] stands for the SDRS which is the same as β, save that the condition C in β is replaced by C'. Then in words, Assume Coherence stipulates that if the discourse can be coherent only if the anaphor u is resolved to x and B is resolved to the specific relation P, then one monotonically assumes that they are resolved this way:  Intuitively, it should be clear that in (5a,b) ¬↓Update(α, α, β) holds, unless the bag in (5b) is one of the bags mentioned in (5a)--i.e., u = Z and B = member-of. For otherwise the events in (5) are too "disconnected" to support any rhetorical relation. On the other hand, assigning u and B these values allows us to use Subtype and Elaboration to infer Elaboration (because skirt is a kind of clothing, and the bag in (5b) is one of the bags in (5a)). So Assume Coherence, Subtype and Elaboration yield that (5b) elaborates (5a) and the bag in (5b) is one of the bags in (5a).</Paragraph>
      <Paragraph position="19"> Applying SDRT to compounds encodes the effects of pragmatics on the compounding relation. For example, to reflect the fact that compounds such as apple juice seat, which are compatible only with general-nn, are acceptable only when context resolves the compound relation, we assume that the DRS conditions produced by this schema are: Rc(y, x), Rc =? and ¬(made-of(y, x) ∨ contain(y, x) ∨ ...). By the above definition of well-definedness on SDRSs, the compound is coherent only if we can resolve Rc to a particular relation via the SDRT Update function, which in turn is determined by DICE. Rules such as Assume Coherence serve to specify the necessary compound relation, so long as context provides enough information.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="140" end_page="141" type="metho">
    <SectionTitle>
5 Integrating Lexical Preferences
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="140" end_page="141" type="sub_section">
      <SectionTitle>
and Pragmatics
</SectionTitle>
      <Paragraph position="0"> We now extend SDRT and DICE to handle the probabilistic information given in §3. We want the pragmatic component to utilise this knowledge, while still maintaining sufficient flexibility that less frequent senses are favoured in certain discourse contexts. Suppose that the new information to be integrated with the discourse context is ambiguous between β_1, ..., β_n. Then we assume that exactly one of Update(τ, α, β_i), 1 ≤ i ≤ n,</Paragraph>
      <Paragraph position="1"> holds. We gloss this complex disjunctive formula as ⊻_{1≤i≤n}(Update(τ, α, β_i)). Let β_k ≻ β_j mean that the probability of DRS β_k is greater than that of β_j. Then the rule schema below ensures that the most frequent possible sense that produces discourse coherence is (monotonically) favoured:</Paragraph>
      <Paragraph position="3"> Prefer Frequent Senses is a declarative rule for disambiguating constituents in a discourse context.</Paragraph>
      <Paragraph position="4"> But from a procedural perspective it captures: try to attach the DRS based on the most probable senses first; if it works you're done; if not, try the next most probable sense, and so on.</Paragraph>
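      <Paragraph> The procedural reading just given can be sketched in a few lines of Python. The function names, the representation of a reading as a (label, probability) pair, the use of None to stand for an update that is not well-defined, and the probabilities themselves are assumptions of this sketch; the toy oracle merely mimics the judgement for example (4) and is not an implementation of DICE.

```python
from typing import Callable, Iterable, Optional, Tuple


def prefer_frequent_senses(
    readings: Iterable[Tuple[str, float]],
    try_update: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the first well-defined update, trying readings from the
    most probable to the least probable."""
    for reading, _prob in sorted(readings, key=lambda pair: pair[1], reverse=True):
        updated = try_update(reading)  # None stands for "Update is not well-defined"
        if updated is not None:
            return updated
    return None  # no reading yields a coherent discourse


def update_in_context_4(reading: str) -> Optional[str]:
    # Toy oracle for example (4): the bags in (4a) are plastic, so the
    # made-of reading of "cotton bag" cannot be accommodated.
    if reading == "made-of":
        return None
    return "Elaboration, cotton bag read as " + reading


# Invented probabilities, for illustration only.
readings = [("purpose-patient", 0.1), ("made-of", 0.8)]
result = prefer_frequent_senses(readings, update_in_context_4)
```

Here the made-of reading is tried first and rejected, so the less frequent purpose-patient reading is selected; with an oracle that accepts every reading, the made-of reading would be returned instead.</Paragraph>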
      <Paragraph position="5"> Let's examine the interpretation of compounds.</Paragraph>
      <Paragraph position="6"> Consider (3). Let's consider the representation β' of (3b) with the highest probability: i.e., the one where cotton bag means bag made of cotton. Then similarly to (5), Assume Coherence, Subtype and Elaboration are used to infer that the cotton bag is one of the bags mentioned in (3a) and Elaboration holds. Since this updated SDRS is well-defined, Prefer Frequent Senses ensures that it's true. And so cotton bag means bag made from cotton in this context.</Paragraph>
      <Paragraph position="7"> Contrast this with (4). Update(α, α, β') is not well-defined because the cotton bag cannot be one of the bags in (4a). On the other hand, Update(α, α, β'') is well-defined, where β'' is the DRS where cotton bag means bag containing cotton. This is because one can now assume this bag is one of the bags mentioned in (4a), and therefore Elaboration can be inferred as before. So Prefer Frequent Senses ensures that Update(α, α, β'') holds but Update(α, α, β') does not.</Paragraph>
      <Paragraph position="8"> Prefer Frequent Senses is designed for reasoning about word senses in general, and not just the semantic content of compounds: it predicts diet has its food sense in (6b) in isolation from the discourse context (assuming Update(∅, ∅, β) = β), but it has the law-maker sense in (6), because SDRT's coherence constraints on Contrast (Asher, 1993)--which is the relation required for Update because of the cue word but--can't be met when diet means food.</Paragraph>
      <Paragraph position="9"> (6) a. In theory, there should be cooperation between the different branches of government.</Paragraph>
      <Paragraph position="10"> b. But the president hates the diet.</Paragraph>
      <Paragraph position="11"> In general, pragmatic reasoning is computationally expensive, even in very restricted domains. But the account of disambiguation we've offered circumscribes pragmatic reasoning as much as possible. All nonmonotonic reasoning remains packed into the definition of Update(τ, α, β), where one needs pragmatic reasoning anyway for inferring rhetorical relations. Prefer Frequent Senses is a monotonic rule: it doesn't increase the load on nonmonotonic reasoning, and it doesn't introduce extra pragmatic machinery peculiar to the task of disambiguating word senses. Indeed, this rule offers a way of checking whether fully specified relations between compounds are acceptable, rather than relying on (expensive) pragmatics to compute them.</Paragraph>
      <Paragraph position="12"> We have mixed stochastic and symbolic reasoning.</Paragraph>
      <Paragraph position="13"> Hobbs et al. (1993) also mix numbers and rules by means of weighted abduction. However, the theories differ in several important respects. First, our pragmatic component has no access to word forms and syntax (and so it's not language specific), whereas Hobbs et al.'s rules for pragmatic interpretation can access these knowledge sources. Second, our probabilities encode the frequency of word senses associated with word forms. In contrast, the weights that guide abduction correspond to a wider variety of information, and do not necessarily correspond to word sense/form frequencies. Indeed, it is unclear what meaning is conveyed by the weights, and consequently the means by which they can be computed are not well understood.</Paragraph>
    </Section>
  </Section>
</Paper>