<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1019">
  <Title>Modelling the substitutability of discourse connectives</Title>
  <Section position="4" start_page="149" end_page="151" type="metho">
    <SectionTitle>
2 Relationships between connectives
</SectionTitle>
    <Paragraph position="0"> Two types of relationships between connectives are of interest: similarity and substitutability.</Paragraph>
    <Section position="1" start_page="149" end_page="149" type="sub_section">
      <SectionTitle>
2.1 Similarity
</SectionTitle>
      <Paragraph position="0"> The concept of lexical similarity plays an important role in psychology, artificial intelligence, and computational linguistics. For example, in psychology, Miller and Charles (1991) report that psychologists 'have largely abandoned &quot;synonymy&quot; in favour of &quot;semantic similarity&quot;.' In addition, work in automatic lexical acquisition is based on the proposition that distributional similarity correlates with semantic similarity (Grefenstette, 1994; Curran and Moens, 2002; Weeds and Weir, 2003).</Paragraph>
      <Paragraph position="1"> Several studies have found subjects' judgements of semantic similarity to be robust. For example, Miller and Charles (1991) elicited similarity judgements for 30 pairs of nouns such as cord-smile, and found a high correlation with judgements of the same data obtained over 25 years previously (Rubenstein and Goodenough, 1965). Resnik (1999) repeated the experiment, and calculated an inter-rater agreement of 0.90.</Paragraph>
      <Paragraph position="2"> Resnik and Diab (2000) also performed a similar experiment with pairs of verbs (e.g. bathe-kneel).</Paragraph>
      <Paragraph position="3"> The level of inter-rater agreement was again significant (r = 0.76).</Paragraph>
      <Paragraph position="4"> Figure 1: Knott's Test for Substitutability for connectives. 1. Take an instance of a discourse connective in a corpus. Imagine you are the writer that produced this text, but that you need to choose an alternative connective.</Paragraph>
      <Paragraph position="5"> 2. Remove the connective from the text, and insert another connective in its place.</Paragraph>
      <Paragraph position="6"> 3. If the new connective achieves the same discourse goals as the original one, it is considered substitutable in this context.</Paragraph>
      <Paragraph position="7"> It has been suggested that if two words have similar meanings, then they can be expected to have similar contextual distributions. The studies listed above have also found evidence that similarity ratings correlate positively with the distributional similarity of the lexical items.</Paragraph>
    </Section>
    <Section position="2" start_page="149" end_page="151" type="sub_section">
      <SectionTitle>
2.2 Substitutability
</SectionTitle>
      <Paragraph position="0"> The notion of substitutability has played an important role in theories of lexical relations. A definition of synonymy attributed to Leibniz states that two words are synonyms if one word can be used in place of the other without affecting truth conditions.</Paragraph>
      <Paragraph position="1"> Unlike similarity, the substitutability of discourse connectives has been previously studied.</Paragraph>
      <Paragraph position="2"> Halliday and Hasan (1976) note that in certain contexts otherwise can be paraphrased by if not, as in (1):</Paragraph>
      <Paragraph position="3"> (1) It's the way I like to go to work. One person and one line of enquiry at a time.</Paragraph>
      <Paragraph position="4"> Otherwise/if not, there's a muddle.</Paragraph>
      <Paragraph position="5"> They also suggest some other extended paraphrases of otherwise, such as under other circumstances.</Paragraph>
      <Paragraph position="6"> Knott (1996) systematises the study of the substitutability of discourse connectives. His first step is to propose a Test for Substitutability for connectives, which is summarised in Figure 1. An application of the Test is illustrated by (2). Here, seeing as was the connective originally used by the writer; however, because can be used instead.</Paragraph>
      <Paragraph position="7"> (2) Seeing as/because we've got nothing but circumstantial evidence, it's going to be difficult to get a conviction. (Knott, p. 177) However, the ability to substitute is sensitive to the context. In other contexts, for example (3), the substitution of because for seeing as is not valid.</Paragraph>
      <Paragraph position="8"> (3) It's a fairly good piece of work, seeing as/#because you have been under a lot of pressure recently. (Knott, p. 177) Similarly, there are contexts in which because can be used, but seeing as cannot be substituted for it: (4) That proposal is useful, because/#seeing as it gives us a fallback position if the negotiations collapse. (Knott, p. 177) Knott's next step is to generalise over all contexts in which a connective appears, and to define four substitutability relationships that can hold between a pair of connectives w1 and w2. These relationships are illustrated graphically through the use of Venn diagrams in Figure 2, and defined below.</Paragraph>
      <Paragraph position="9"> * w1 is a SYNONYM of w2 if w1 can always be substituted for w2, and vice versa.</Paragraph>
      <Paragraph position="10"> * w1 and w2 are EXCLUSIVE if neither can ever be substituted for the other.</Paragraph>
      <Paragraph position="11"> * w1 is a HYPONYM of w2 if w2 can always be substituted for w1, but not vice versa.</Paragraph>
      <Paragraph position="12"> * w1 and w2 are CONTINGENTLY SUBSTITUTABLE if each can sometimes, but not always, be substituted for the other.</Paragraph>
      <Paragraph position="13"> Given examples (2)-(4) we can conclude that because and seeing as are CONTINGENTLY SUBSTITUTABLE (henceforth &quot;CONT. SUBS.&quot;). However, this is the only relationship that can be established using a finite number of linguistic examples. The other relationships all involve generalisations over all contexts, and so rely to some degree on the judgement of the analyst. Examples of each relationship given by Knott (1996) include: given that and seeing as are SYNONYMS, on the grounds that is a HYPONYM of because, and because and now that are EXCLUSIVE.</Paragraph>
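The four relationships can be read as set relations between the contexts in which each connective is acceptable. The following is a minimal Python sketch of that reading, assuming (hypothetically) that each connective is represented by a finite set of context identifiers. The function name `relationship` and the toy context sets are illustrative and do not come from Knott (1996); a finite sample like this can only ever establish CONT. SUBS. with certainty.

```python
def relationship(contexts_w1, contexts_w2):
    """Classify how w1 relates to w2, given the sets of contexts
    in which each connective is judged acceptable (hypothetical input)."""
    c1, c2 = set(contexts_w1), set(contexts_w2)
    if c1 == c2:
        return "SYNONYM"        # each can always replace the other
    if c1.isdisjoint(c2):
        return "EXCLUSIVE"      # neither can ever replace the other
    if c1.issubset(c2):
        return "HYPONYM"        # w2 can always replace w1, not vice versa
    if c2.issubset(c1):
        return "HYPERNYM"       # the converse of HYPONYM
    return "CONT. SUBS."        # partial overlap only


# Contexts numbered after examples (2)-(4): 'because' fits (2) and (4),
# 'seeing as' fits (2) and (3).
because = {2, 4}
seeing_as = {2, 3}
print(relationship(because, seeing_as))  # CONT. SUBS.
```

HYPERNYM is included only so that the function is total; it is simply HYPONYM with the arguments swapped.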
      <Paragraph position="14"> Although substitutability is inherently a more complex notion than similarity, distributional similarity is expected to be of some use in predicting substitutability relationships. For example, if two discourse connectives are SYNONYMS then we would expect them to have similar distributions. On the other hand, if two connectives are EXCLUSIVE, then we would expect them to have dissimilar distributions. However, if the relationship between two connectives is HYPONYMY or CONT. SUBS. then we expect to have partial overlap between their distributions (consider Figure 2), and so distributional similarity might not distinguish these relationships. The Kullback-Leibler (KL) divergence function is a distributional similarity function that is of particular relevance here since it can be described informally in terms of substitutability. Given co-occurrence distributions p and q, its mathematical definition can be written as:</Paragraph>
      <Paragraph position="15"> $$D(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = \sum_x p(x) \left( \log \frac{1}{q(x)} - \log \frac{1}{p(x)} \right) \qquad (5)$$</Paragraph>
      <Paragraph position="16"> The value $\log \frac{1}{p(x)}$ has an informal interpretation as a measure of how surprised an observer would be to see event x, given prior likelihood expectations defined by p. Thus, if p and q are the distributions of words w1 and w2, then $$D(p\|q) = E_p(\text{surprise in seeing } w_2 - \text{surprise in seeing } w_1) \qquad (6)$$ where $E_p$ is the expectation function over the distribution of w1 (i.e. p). That is, KL divergence measures how much more surprised we would be, on average, to see word w2 rather than w1, where the averaging is weighted by the distribution of w1.</Paragraph>
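The reading of KL divergence as an expected difference in surprise can be sketched directly in code. This is a minimal illustration, assuming distributions are given as Python dicts from events to probabilities and that q(x) is non-zero wherever p(x) is non-zero. The names `surprise` and `kl_divergence` and the toy distributions are illustrative assumptions, not taken from the paper.

```python
import math


def surprise(prob):
    """Surprise log(1/prob) at seeing an event of probability prob."""
    return math.log(1.0 / prob)


def kl_divergence(p, q):
    """D(p||q): expected extra surprise, under p, of seeing w2 rather than w1.

    p and q map events to probabilities; q[x] must be positive
    wherever p[x] is positive.
    """
    return sum(px * (surprise(q[x]) - surprise(px))
               for x, px in p.items() if px > 0)


# Toy co-occurrence distributions (hypothetical values).
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(kl_divergence(p, q), kl_divergence(q, p))  # the two orders differ
```

Swapping the arguments changes the weighting distribution, which is why the two printed values are not equal.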
    </Section>
  </Section>
  <Section position="5" start_page="151" end_page="152" type="metho">
    <SectionTitle>
3 A variance-based function for distributional analysis
</SectionTitle>
    <Paragraph position="0"> A distributional similarity function provides only a one-dimensional comparison of two distributions, namely how similar they are. However, we can obtain an additional perspective by using a variance-based function. We now introduce a new function V by taking the variance of the surprise in seeing w2, over the contexts in which w1 appears:</Paragraph>
    <Paragraph position="1"> $$V(p,q) = \mathrm{Var}_p\left(\log \frac{1}{q(x)}\right) = E_p\left(\left(\log \frac{1}{q(x)} - E_p\left(\log \frac{1}{q(x)}\right)\right)^2\right)$$</Paragraph>
    <Paragraph position="2"> Note that, like KL divergence, V(p,q) is asymmetric.</Paragraph>
    <Paragraph position="3"> We now consider how the substitutability of connectives affects our expectations of the value of V.</Paragraph>
    <Paragraph position="4"> If two connectives are SYNONYMS then each can always be used in place of the other. Thus we would always expect a low level of surprise in seeing one connective in place of the other, and this low level of surprise is indicated via light shading in Figure 3a. It follows that the variance in surprise is low. On the other hand, if two connectives are EXCLUSIVE then there would always be a high degree of surprise in seeing one in place of the other. This is indicated using dark shading in Figure 3e. Only one set is shaded because we need only consider the contexts in which w1 is appropriate. In this case, the variance in surprise is again low. The situation is more interesting when we consider two connectives that are CONT. SUBS. In this case, substitutability (and hence surprise) is dependent on the context. This is illustrated using light and dark shading in Figure 3d. As a result, the variance in surprise is high. Finally, with HYPONYMY, the variance in surprise depends on whether the original connective was the HYPONYM or the HYPERNYM.</Paragraph>
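These expectations can be checked on toy data. The sketch below follows the verbal definition of V as the variance, weighted by the distribution of w1, of the surprise in seeing w2. The function name `variance_in_surprise` and all probability values are hypothetical and chosen only to mimic the SYNONYM, EXCLUSIVE and CONT. SUBS. cases.

```python
import math


def variance_in_surprise(p, q):
    """V(p,q): variance under p of the surprise log(1/q(x)).

    p and q map contexts to probabilities; q[x] must be positive
    wherever p[x] is positive.
    """
    contexts = [x for x, px in p.items() if px > 0]
    mean = sum(p[x] * math.log(1.0 / q[x]) for x in contexts)
    return sum(p[x] * (math.log(1.0 / q[x]) - mean) ** 2 for x in contexts)


# w1 appears equally often in four contexts (hypothetical).
p = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}
# SYNONYM-like: w2 has the same distribution, so surprise is uniformly low.
q_syn = dict(p)
# EXCLUSIVE-like: w2 is rare in every context of w1, so surprise is uniformly high.
q_excl = {"c1": 0.01, "c2": 0.01, "c3": 0.01, "c4": 0.01, "other": 0.96}
# CONT. SUBS.-like: w2 is common in some contexts of w1 and rare in others.
q_cont = {"c1": 0.48, "c2": 0.48, "c3": 0.01, "c4": 0.01, "other": 0.02}

for name, q in [("syn", q_syn), ("excl", q_excl), ("cont", q_cont)]:
    print(name, variance_in_surprise(p, q))
```

Only the third case mixes low and high surprise, so it is the only one with a variance well above zero.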
    <Paragraph position="5"> Table 1 summarises our expectations of the values of KL divergence and V, for the various substitutability relationships. (KL divergence, unlike most similarity functions, is sensitive to the order of arguments related by hyponymy (Lee, 1999).) The experiments described below test these expectations using empirical data.</Paragraph>
    <Paragraph position="6"> [Figure 4: example experimental item, showing the sentences &quot;Something happened and something else happened.&quot; and &quot;Something happened or something else happened.&quot;, followed by the response options 0 to 5.]</Paragraph>
  </Section>
  <Section position="6" start_page="152" end_page="154" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We now describe our empirical experiments which investigate the connections between a) subjects' ratings of the similarity of discourse connectives, b) the substitutability of discourse connectives, and c) KL divergence and the new function V applied to the distributions of connectives. Our motivation is to explore how distributional properties of words might be used to predict substitutability. The experiments are restricted to connectives which relate clauses within a sentence. These include coordinating conjunctions (e.g. but) and a range of subordinators including conjunctions (e.g. because) as well as phrases introducing adverbial clauses (e.g. now that, given that, for the reason that). Adverbial discourse connectives are therefore not considered.</Paragraph>
    <Section position="1" start_page="152" end_page="152" type="sub_section">
      <SectionTitle>
4.1 Experiment 1: Subject ratings of similarity
</SectionTitle>
      <Paragraph position="0"> This experiment tests the hypotheses that 1) subjects agree on the degree of similarity between pairs of discourse connectives, and 2) similarity ratings correlate with the degree of substitutability.</Paragraph>
      <Paragraph position="1"> We randomly selected 48 pairs of discourse connectives such that there were 12 pairs standing in each of the four substitutability relationships. To do this, we used substitutability judgements made by Knott (1996), supplemented with some judgements of our own. Each experimental item consisted of the two discourse connectives along with dummy clauses, as illustrated in Figure 4. The format of the experimental items was designed to indicate how a phrase could be used as a discourse connective (e.g. it may not be obvious to a subject that the phrase the moment is a discourse connective), but without providing complete semantics for the clauses, which might bias the subjects' ratings. Forty native speakers of English participated in the experiment, which was conducted remotely via the internet.</Paragraph>
      <Paragraph position="2"> Leave-one-out resampling was used to compare each subject's ratings with the means of their peers' ratings (Weiss and Kulikowski, 1991). The average inter-subject correlation was 0.75 (Min = 0.49, Max = 0.86, StdDev = 0.09), which is comparable to previous results on verb similarity ratings (Resnik and Diab, 2000). The effect of substitutability on similarity ratings can be seen in Table 2. Post-hoc Tukey tests revealed all differences between means in Table 2 to be significant.</Paragraph>
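The leave-one-out comparison can be sketched as follows. This is an illustration only: it assumes Pearson's correlation (the text does not say which coefficient was used), and the ratings matrix is invented rather than taken from the experiment. The names `pearson` and `leave_one_out_correlations` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def leave_one_out_correlations(ratings):
    """ratings[s][i] is subject s's rating of item i.

    For each subject, correlate their ratings with the per-item
    means of all the other subjects' ratings.
    """
    n_items = len(ratings[0])
    result = []
    for s, own in enumerate(ratings):
        peers = [r for t, r in enumerate(ratings) if t != s]
        peer_means = [sum(r[i] for r in peers) / len(peers)
                      for i in range(n_items)]
        result.append(pearson(own, peer_means))
    return result


# Three hypothetical subjects rating four items on a 0-5 scale.
ratings = [[0, 1, 4, 5], [1, 1, 3, 5], [0, 2, 4, 4]]
correlations = leave_one_out_correlations(ratings)
print(sum(correlations) / len(correlations))  # average inter-subject correlation
```

Holding each subject out of the mean they are compared against avoids inflating the correlation with the subject's own ratings.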
      <Paragraph position="3"> The results demonstrate that subjects' ratings of connective similarity show significant agreement and are robust enough for effects of substitutability to be found.</Paragraph>
    </Section>
    <Section position="2" start_page="152" end_page="153" type="sub_section">
      <SectionTitle>
4.2 Experiment 2: Modelling similarity
</SectionTitle>
      <Paragraph position="0"> This experiment compares subjects' ratings of similarity with lexical co-occurrence data. It hypothesises that similarity ratings correlate with distributional similarity, but that neither correlates with the new variance in surprise function.</Paragraph>
      <Paragraph position="1"> Sentences containing discourse connectives were gathered from the British National Corpus and the World Wide Web, with discourse connectives identified on the basis of their syntactic contexts (for details, see Hutchinson (2004b)). The mean number of sentences per connective was about 32,000, although about 12% of these are estimated to be errors. From these sentences, lexical co-occurrence data were collected. Only co-occurrences with discourse adverbials and other structural discourse connectives were stored, as these had previously been found to be useful for predicting semantic features of connectives (Hutchinson, 2004a).</Paragraph>
      <Paragraph position="2"> A skewed variant of the Kullback-Leibler divergence function was used to compare co-occurrence distributions (Lee, 1999, with α = 0.95). Spearman's correlation coefficient for ranked data showed a significant correlation between the subjects' similarity ratings and distributional divergence (r = -0.51, p &lt; 0.001). (The correlation is negative because KL divergence is lower when distributions are more similar.) The strength of this correlation is comparable with similar results achieved for verbs (Resnik and Diab, 2000), but not as great as has been observed for nouns (McDonald, 2000). Figure 5 plots the mean similarity judgements against the distributional divergence obtained using discourse markers, and also indicates the substitutability relationship for each item. (Two outliers can be observed in the upper left corner; these were excluded from the calculations.) The &quot;variance in surprise&quot; function introduced in the previous section was applied to the same co-occurrence data. (In practice, the skewed variant V(p, 0.95q + 0.05p) was used, in order to avoid problems arising when q(x) = 0.) These variances were compared to distributional divergence and the subjects' similarity ratings, but in both cases Spearman's correlation coefficient was not significant.</Paragraph>
      <Paragraph position="3"> In combination with the previous experiment, these results demonstrate a three-way correspondence between the human ratings of the similarity of a pair of connectives, their substitutability relationship, and their distributional similarity. Hutchinson (2005) presents further experiments on modelling connective similarity, and discusses their implications. This experiment also provides empirical evidence that the new variance in surprise function is not a measure of similarity.</Paragraph>
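The skewing step can be illustrated with a short sketch. It assumes the skewed divergence is computed as D(p || 0.95q + 0.05p), mirroring the skewed form V(p, 0.95q + 0.05p) given in the text. The function names and the toy distributions are hypothetical.

```python
import math


def skew(p, q, alpha=0.95):
    """Mixture alpha*q + (1 - alpha)*p, positive wherever p is positive."""
    events = set(p) | set(q)
    return {x: alpha * q.get(x, 0.0) + (1 - alpha) * p.get(x, 0.0)
            for x in events}


def kl_divergence(p, q):
    """D(p||q); requires q[x] to be positive wherever p[x] is positive."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def skew_divergence(p, q, alpha=0.95):
    """KL divergence from p to the skewed mixture of q and p."""
    return kl_divergence(p, skew(p, q, alpha))


# q assigns zero probability to "b", so plain D(p||q) would be undefined.
p = {"a": 0.5, "b": 0.5}
q = {"a": 1.0}
print(skew_divergence(p, q))  # finite despite q("b") = 0
```

Mixing a small amount of p into q guarantees that every event seen with w1 has non-zero probability in the second argument, at the cost of slightly understating the divergence.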
    </Section>
    <Section position="3" start_page="153" end_page="154" type="sub_section">
      <SectionTitle>
4.3 Experiment 3: Predicting substitutability
</SectionTitle>
      <Paragraph position="0"> The previous experiments provide hope that substitutability of connectives might be predicted on the basis of their empirical distributions. However, one complicating factor is that EXCLUSIVE is by far the most likely relationship, holding between about 70% of pairs. Preliminary experiments showed that the empirical evidence for other relationships was not strong enough to overcome this prior bias.</Paragraph>
      <Paragraph position="1"> We therefore attempted two pseudodisambiguation tasks which eliminated the effects of prior likelihoods. The first task involved distinguishing between the relationships whose connectives subjects rated as most similar, namely SYNONYMY and HYPONYMY. Triples of connectives &lt;p,q,q'&gt; were collected such that SYNONYM(p,q) and either HYPONYM(p,q') or HYPONYM(q',p) (we were not attempting to predict the order of HYPONYMY). The task was then to decide automatically which of q and q' is the SYNONYM of p.</Paragraph>
      <Paragraph position="2"> The second task was identical in nature to the first; here, however, the relationship between p and q was either SYNONYMY or HYPONYMY, while p and q' were either CONT. SUBS. or EXCLUSIVE. These two sets of relationships are those corresponding to high and low similarity, respectively. In combination, the two tasks are equivalent to predicting SYNONYMY or HYPONYMY from the set of all four relationships, by first distinguishing the high similarity relationships from the other two, and then making a finer-grained distinction between the two.</Paragraph>
      <Paragraph position="3"> Substitutability relationships between 49 structural discourse connectives were extracted from Knott's (1996) classification. In order to obtain more evaluation data, we used Knott's methodology to obtain relationships between an additional 32 connectives. This resulted in 46 triples &lt;p,q,q'&gt; for the first task, and 10,912 triples for the second task.</Paragraph>
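The way triples for the first task are assembled can be sketched as below. This is a hypothetical illustration: the data structure (a dict from unordered connective pairs to a relationship label), the function name `task1_triples` and the placeholder connectives w1 to w4 are all assumptions, and the direction of HYPONYMY is deliberately ignored, as in the task description.

```python
def task1_triples(relations):
    """Collect triples (p, q, q_prime) where q is a SYNONYM of p and
    q_prime stands in a HYPONYMY relation (either direction) with p.

    relations maps frozenset({a, b}) to a relationship label.
    """
    triples = []
    for pair_syn, rel_syn in relations.items():
        if rel_syn != "SYNONYM":
            continue
        for p in pair_syn:
            (q,) = pair_syn - {p}          # the other member of the pair
            for pair_hyp, rel_hyp in relations.items():
                if rel_hyp == "HYPONYM" and p in pair_hyp:
                    (q_prime,) = pair_hyp - {p}
                    triples.append((p, q, q_prime))
    return sorted(triples)


# Placeholder connectives and judgements (not Knott's data).
relations = {
    frozenset({"w1", "w2"}): "SYNONYM",
    frozenset({"w1", "w3"}): "HYPONYM",
    frozenset({"w2", "w4"}): "EXCLUSIVE",
}
print(task1_triples(relations))  # [('w1', 'w2', 'w3')]
```

The second task would be built the same way, with the two label sets replaced by the high-similarity and low-similarity relationships.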
      <Paragraph position="4"> The co-occurrence data from the previous section were re-used. These were used to calculate D(p||q) and V(p,q). Both of these are asymmetric, so for our purposes we took the maximum of applying their arguments in both orders. Recall from Table 1 that when two connectives are in a HYPONYMY relation we expect V to be sensitive to the order in which the connectives are given as arguments. To test this, we also calculated $(V(p,q) - V(q,p))^2$, i.e. the square of the difference of applying the arguments to V in both orders. The average values are summarised in Table 3, with D1 and D2 (and V1 and V2) denoting different orderings of the arguments to D (and V), and max denoting the function which selects the larger of two numbers.</Paragraph>
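The three order-insensitive statistics described here can be computed as in the following sketch. The helper definitions, the dict-based representation of distributions and the key names `max_D`, `max_V` and `sq_diff_V` are assumptions made for illustration; skewing is omitted, so both toy distributions must be non-zero on the same events.

```python
import math


def kl_divergence(p, q):
    """D(p||q) for dicts of probabilities with shared support."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def variance_in_surprise(p, q):
    """V(p,q): variance under p of the surprise log(1/q(x))."""
    mean = sum(px * math.log(1.0 / q[x]) for x, px in p.items() if px > 0)
    return sum(px * (math.log(1.0 / q[x]) - mean) ** 2
               for x, px in p.items() if px > 0)


def pair_features(p, q):
    """Statistics for a pair of connectives that ignore argument order."""
    d1, d2 = kl_divergence(p, q), kl_divergence(q, p)
    v1, v2 = variance_in_surprise(p, q), variance_in_surprise(q, p)
    return {
        "max_D": max(d1, d2),          # max(D1, D2)
        "max_V": max(v1, v2),          # max(V1, V2)
        "sq_diff_V": (v1 - v2) ** 2,   # (V1 - V2)^2, order sensitivity of V
    }


# Toy distributions (hypothetical values).
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}
print(pair_features(p, q))
```

Because each statistic is symmetric in its two orderings, `pair_features(p, q)` and `pair_features(q, p)` return the same values.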
      <Paragraph position="5"> These statistics show that our theoretically motivated expectations are supported. In particular, (1) SYNONYMOUS connectives have the least distributional divergence and EXCLUSIVE connectives the most, (2) CONT. SUBS. and HYPONYMOUS connectives have the greatest values for V, and (3) V shows the greatest sensitivity to the order of its arguments in the case of HYPONYMY.</Paragraph>
      <Paragraph position="6"> The co-occurrence data were used to construct a Gaussian classifier, by assuming the values for D and V are generated by Gaussians.[2] First, normal functions were used to calculate the likelihood ratio of p and q being in the two relationships:</Paragraph>
      <Paragraph position="8"> [2] [...] used to model D, whereas a normal model was used for V.</Paragraph>
      <Paragraph position="9"> where n(x; μ, σ) is the normal function with mean μ and standard deviation σ, and where μ_syn, for example, denotes the mean of the Gaussian model for SYNONYMY. Next, the likelihood ratio for p and q was divided by that for p and q'. If this value was greater than 1, the model predicted p and q were SYNONYMS, otherwise HYPONYMS. The same technique was used for the second task.</Paragraph>
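The decision rule can be sketched for a single input statistic. This is a simplified, hypothetical illustration: it uses one feature value per pair and invented Gaussian parameters, whereas the classifier described above was fitted to the observed values of D and V. The function names are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """n(x; mu, sigma): density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def likelihood_ratio(x, syn, hyp):
    """Likelihood of x under the SYNONYMY model over that under HYPONYMY.

    syn and hyp are (mean, standard deviation) pairs.
    """
    return normal_pdf(x, *syn) / normal_pdf(x, *hyp)


def predict_synonym(x_q, x_q_prime, syn, hyp):
    """Decide which of q and q_prime is the SYNONYM of p.

    x_q and x_q_prime are the feature values for (p, q) and (p, q_prime).
    """
    ratio = likelihood_ratio(x_q, syn, hyp) / likelihood_ratio(x_q_prime, syn, hyp)
    return "q" if ratio > 1 else "q_prime"


# Invented model parameters: synonyms tend to have lower divergence.
syn_model = (1.0, 0.5)   # (mu_syn, sigma_syn)
hyp_model = (2.0, 0.5)   # (mu_hyp, sigma_hyp)
print(predict_synonym(1.1, 1.9, syn_model, hyp_model))  # q
```

Dividing the two likelihood ratios compares the candidates against each other, so no prior over the relationships is needed.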
    </Section>
  </Section>
</Paper>