<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0906">
  <Title>Entailment, Intensionality and Text Understanding</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Entailment and Contradiction Metrics
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Theoretical Justification
</SectionTitle>
      <Paragraph position="0"> The ability to recognize entailment and contradiction relations is a consequence of language understanding, as examples (1)-(2) show. But before concluding that entailment and contradiction detection is a suitable evaluation metric for text understanding, two cautionary points should be addressed. First, it cannot be a sufficient metric, since there is more to understanding than entailment and contradiction, and we should ask what aspects of understanding it does not evaluate. Second, we need to be reasonably sure that it is a necessary metric, and does not measure some merely accidental manifestation of understanding. To give an analogy, clearing up spots is a consequence of curing infections like measles; but clearing spots is a poor metric, especially if success can be achieved by bleaching spots off the skin or covering them with make-up. A measles-cure metric should address the presence of the infection, and not just its symptoms.</Paragraph>
      <Paragraph position="1"> In terms of (in)sufficicency, we should note that understanding a text implies two abilities. (i) You can relate the text to the world, and know what the world would have to be like if the text were true or if you followed instructions contained in it.1 (ii) You can relate the text to other texts, and can tell where texts agree or disagree in what they say. Clearly, entailment and contradiction detection directly measures only the second ability.</Paragraph>
      <Paragraph position="2"> In terms of necessity, there are two points to be made.</Paragraph>
      <Paragraph position="3"> The first is simply an appeal to intuition. Given a pre-theoretical grasp of what language understanding is, the ability to draw inferences and detect entailments and contradictions just does seem to be part of understanding, and not merely an accidental symptom of it. The second point is more technical. Suppose we assume the standard machinery of modern logic, linking proof theory and model theory. Then a proof-theoretic ability to detect entailments and contradictions between expressions is intrinsically linked to a model-theoretic ability to relate those expressions to (abstract) models of the world. In other words, the abilities to relate texts to texts and texts to the world are connected, and there are at least some approaches that show how success in the former feeds into success in the latter.</Paragraph>
      <Paragraph position="4"> The reference to logic and in particular to model theory is deliberate. It provides an arsenal of tools for dealing with entailment and contradiction, and there is also a large body of work in formal semantics linking natural language to these tools. One should at least consider making use of these resources. However, it is important not to characterize entailment and contradiction so narrowly as to preclude other methods. There needs to be room for probabilistic / Bayesian notions of inference, e.g. (Pearl, 1991), as well as attempting to use corpus based methods to detect entailment / subsumption, e.g. the use of TF-IDF by (Monz and de Rijke, 2001). That is, one can agree on the importance of entailment and contradiction detection as an evaluation mertic, while disagreeing on the best methods for achieving success.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Practical Justification
</SectionTitle>
      <Paragraph position="0"> Even if we grant that entailment and contradiction detection (ECD) measures a core aspect of language understanding, it does not follow that it measures a useful aspect of understanding. However, we can point to at least two application areas that directly demonstrate the utility of the metric.</Paragraph>
      <Paragraph position="1"> The first is an application that we are actually work1Knowing what the world would be like if the text were true is not the same as being able to tell if the text is true. I know how things would have to be for it to be true that &amp;quot;There is no greatest pair of prime numbers, a0a2a1 anda0a4a3 , such thata0a4a3a6a5a7a0a2a1a9a8a11a10 .&amp;quot; But I have no idea how to tell whether this is true or not.</Paragraph>
      <Paragraph position="2"> ing on, concerning quality maintenance for document collections. The Eureka system includes a large textual database containing engineer-authored documents (tips) about the repair and maintenance of printers and photocopiers. Over time, duplicate and inconsistent material builds up, undermining the utility of the database to field engineers. Human validators who maintain the quality of the document collection would benefit from ECD text analysis tools that locate points of contradiction and entailment between different but related tips in the database.</Paragraph>
      <Paragraph position="3"> A second application building fairly directly on ECD would be yes-no question answering. Positive or negative answers to yes-no questions can be characterized as those that (respectively) entail or contradict a declarative form of the query. Yes-no question answering would be useful for autonomous systems that attempt to interpret and act on information acquired from textual sources, rather than merely pre-filtering it for human interpretation and action.</Paragraph>
      <Paragraph position="4"> Despite its relevance to applications like the above, one of the advantages of ECD is a degree of task neutrality.</Paragraph>
      <Paragraph position="5"> Entailment and contradiction relations can be characterized independently of the use, if any, to which they are put. Many other reasonable metrics for language understanding are not so task neutral. For example, in a dialogue system one measure of understanding would be success in taking a (task) appropriate action or making an appropriate response. However, it can be non-trivial to determine how much of this success is due to language understanding and how much due to prior understanding of the task: a good, highly constraining task model can overcome many deficiencies in language processing.</Paragraph>
      <Paragraph position="6"> Task neutrality is not the same as domain or genre neutrality. ECD can depend on domain knowledge. For example, if I do not know that belladonna and deadly nightshade name the same plant, I will not recognize that an instruction to uproot belladonna entails an instruction to uproot deadly nightshade. But this is arguably a failure of botanical knowledge, not a lapse in language understanding. We will return to the issue of domain dependence later. However, there are many instances where ECD does not depend on domain knowledge, e.g. (1)-(2) or (3)-(4) (taken, with simplifications, from the Eureka corpus).</Paragraph>
      <Paragraph position="7">  (3) Corrosion caused intermittent electrical contact.</Paragraph>
      <Paragraph position="8"> (4) Corrosion prevented continuous electrical contact.</Paragraph>
      <Paragraph position="9">  One does not need to be an electrician to recognize the potential equivalence of (3) and (4); merely that intermittent means non-continuous, so that causing something to be intermittent can be the same as preventing it from being continuous. And even in cases where domain knowledge is required, ECD is still also reliant on linguistic knowledge of this kind.</Paragraph>
      <Paragraph position="10"> The success of methods for ECD may also depend on genre. For newswire stories (Monz and de Rijke, 2001) reports that TF-IDF performs well in detecting subsumption (i.e. entailment) between texts. This may be a consequence of the way that newswires convey generally consistent information about particular individuals and events: reference to the same entities is highly correlated with subsumption in such a genre. The use of PLSA on the Eureka corpus (Brants and Stolle, 2002) was less successful: the corpus has less reference to concrete events and individuals, and contains conflicting diagnoses and recommendations for repair actions.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>