<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1208">
  <Title>A Probabilistic Setting and Lexical Cooccurrence Model for Textual Entailment</Title>
  <Section position="3" start_page="43" end_page="44" type="intro">
    <SectionTitle>
2 Probabilistic Textual Entailment
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="43" end_page="43" type="sub_section">
      <SectionTitle>
2.1 Motivation
</SectionTitle>
      <Paragraph position="0"> A common definition of entailment in formal semantics (Chierchia and McConnell-Ginet, 1990) specifies that a text t entails another text h (hypothesis, in our terminology) if h is true in every circumstance (possible world) in which t is true.</Paragraph>
      <Paragraph position="1"> For example, in examples 1 and 3 from Table 1 we would expect humans to agree that the hypothesis is necessarily true in any circumstance in which the text is true. In such intuitive cases, textual entailment may be perceived as being certain, or, taking a probabilistic perspective, as having a probability of 1.</Paragraph>
      <Paragraph position="2"> In many other cases, though, entailment inference is uncertain and has a probabilistic nature. In example 2, the text doesn't contain enough information to infer the hypothesis' truth. And in example 4, the meaning of the word hometown is ambiguous, and therefore one cannot infer for certain that the hypothesis is true. In both of these cases there are conceivable circumstances in which the text is true and the hypothesis false. Yet it is clear that in both examples the text substantially increases the likelihood that the hypothesis is correct, which naturally extends the classical notion of certain entailment. Given the text, we expect the probability that the hypothesis is indeed true to be relatively high, and significantly higher than its probability of being true without reading the text. Aiming to model application needs, we suggest that the probability of the hypothesis being true given the text reflects an appropriate confidence score for the correctness of a particular textual inference. In the next subsections we propose a concrete probabilistic setting that formalizes the notion of truth probabilities in such cases.</Paragraph>
    </Section>
    <Section position="2" start_page="43" end_page="43" type="sub_section">
      <SectionTitle>
2.2 A Probabilistic Setting
</SectionTitle>
      <Paragraph position="0"> Let T denote a space of possible texts, and t ∈ T a specific text. Let H denote the set of all possible hypotheses. A hypothesis h ∈ H is a propositional statement which can be assigned a truth value. For now it is assumed that h is represented as a textual statement, but in principle it could also be expressed as a formula in some propositional language. A semantic state of affairs is captured by a mapping from H to {0=false, 1=true}, denoted by w: H → {0, 1} (called here a possible world, following common terminology). A possible world w represents a concrete set of truth value assignments for all possible propositions. Accordingly, W denotes the set of all possible worlds.</Paragraph>
    </Section>
    <Section position="3" start_page="43" end_page="44" type="sub_section">
      <SectionTitle>
2.2.1 A Generative Model
</SectionTitle>
      <Paragraph position="0"> We assume a probabilistic generative model for texts and possible worlds. In particular, we assume that texts are generated along with a concrete state of affairs, represented by a possible world. Thus, whenever the source generates a text t, it generates also corresponding hidden truth assignments that constitute a possible world w.</Paragraph>
      <Paragraph position="1"> The probability distribution of the source, over all possible texts and truth assignments T x W, is assumed to reflect inferences that are based on the generated texts. That is, we assume that the distribution of truth assignments is not bound to reflect the state of affairs in a particular &amp;quot;real&amp;quot; world, but only the inferences about propositions' truth which are related to the text. In particular, the probability for generating a true hypothesis h that is not related at all to the corresponding text is determined by some prior probability P(h). For example, h=&amp;quot;Paris is the capital of France&amp;quot; might have a prior smaller than 1 and might well be false when the generated text is not related at all to Paris or France. In fact, we may as well assume that the notion of textual entailment is relevant only for hypotheses for which P(h) &lt; 1, as otherwise (i.e. for tautologies) there is no need to consider texts that would support h's truth. On the other hand, we assume that the probability of h being true (generated within w) would be higher than the prior when the corresponding t does contribute information that supports h's truth.</Paragraph>
      <Paragraph position="2"> We define two types of events over the probability space for T x W: I) For a hypothesis h, we denote as Trh the random variable whose value is the truth value assigned to h in a given world. Correspondingly, Trh=1 is the event of h being assigned a truth value of 1 (true).</Paragraph>
      <Paragraph position="3"> II) For a text t, we use t itself to denote also the event that the generated text is t (as usual, it is clear from the context whether t denotes the text or the corresponding event).</Paragraph>
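The generative story above can be made concrete with a minimal sketch. The hypotheses, texts, priors, and "boost" values below are invented purely for illustration (they are not from the paper); the sketch only shows the shape of the model: each draw from the source yields a text t together with a hidden possible world w, i.e. a truth assignment over H.

```python
import random

# Invented hypothesis space H; a possible world w maps each h to {0, 1}.
H = ["Paris is the capital of France", "John was born in Italy"]

def sample_text_and_world(rng):
    """One draw from an assumed joint source over T x W (toy numbers)."""
    t = rng.choice(["a report about Paris", "a football result"])
    w = {}
    for h in H:
        # Each hypothesis has a prior probability of being true; a text
        # that supports it raises that probability above the prior.
        prior = 0.3
        boost = 0.6 if ("Paris" in t and "Paris" in h) else 0.0
        w[h] = 1 if rng.random() < prior + boost else 0
    return t, w

rng = random.Random(0)
samples = [sample_text_and_world(rng) for _ in range(10000)]

# Empirical P(Trh=1) vs. P(Trh=1 | t mentions Paris) for the first hypothesis:
h0 = H[0]
prior_hat = sum(w[h0] for _, w in samples) / len(samples)
paris = [w[h0] for t, w in samples if "Paris" in t]
cond_hat = sum(paris) / len(paris)
```

As intended by the setting, the empirical conditional probability of h being true given a supporting text exceeds its empirical prior.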
    </Section>
    <Section position="4" start_page="44" end_page="44" type="sub_section">
      <SectionTitle>
2.3 Probabilistic textual entailment
</SectionTitle>
      <Paragraph position="0"> Definition: We say that a text t probabilistically entails a hypothesis h (denoted as t ⇒ h) if t increases the likelihood of h being true, that is, if P(Trh = 1 |t) &gt; P(Trh = 1), or equivalently if the pointwise mutual information, I(Trh=1,t), is greater than 0. Once it is known that t ⇒ h, P(Trh=1 |t) serves as a probabilistic confidence value for h being true given t. Application settings would typically require that P(Trh = 1 |t) obtains a high value; otherwise, the text would not be considered sufficiently relevant to support h's truth (e.g. a supporting text in QA or IE should entail the extracted information with high confidence). Finally, we ignore here the case in which t contributes negative information about h, leaving this relevant case for further investigation.</Paragraph>
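The definition can be checked numerically on a toy joint distribution over (text, Trh) pairs. The texts and probability masses below are invented for illustration only; the point is that the entailment test compares P(Trh=1|t) against the prior P(Trh=1), which is equivalent to checking that the pointwise mutual information is positive.

```python
import math

# Toy joint distribution over (text, Trh) pairs; masses sum to 1.
joint = {
    ("paris hosted the summit", 1): 0.20,
    ("paris hosted the summit", 0): 0.05,
    ("the match ended in a draw", 1): 0.15,
    ("the match ended in a draw", 0): 0.60,
}

def p_text(t):
    return sum(p for (txt, _), p in joint.items() if txt == t)

def p_trh(value):
    return sum(p for (_, tr), p in joint.items() if tr == value)

def p_trh_given_t(value, t):
    return joint.get((t, value), 0.0) / p_text(t)

def entails(t):
    """t probabilistically entails h iff P(Trh=1|t) > P(Trh=1),
    i.e. the pointwise mutual information I(Trh=1, t) > 0."""
    prior = p_trh(1)
    posterior = p_trh_given_t(1, t)
    pmi = math.log(posterior / prior) if posterior > 0 else float("-inf")
    return posterior > prior, pmi
```

Here the first text raises P(Trh=1) from its prior of 0.35 to 0.8, so it probabilistically entails h, while the second lowers it to 0.2 and does not.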
    </Section>
    <Section position="5" start_page="44" end_page="44" type="sub_section">
      <SectionTitle>
2.4 Model Properties
</SectionTitle>
      <Paragraph position="0"> It is interesting to notice the following properties and implications of our model: A) Textual entailment is defined as a relationship between texts and propositions whose representation is typically based on text as well, unlike logical entailment, which is a relationship between propositions only. Accordingly, textual entailment confidence is conditioned on the actual generation of a text, rather than its truth. For illustration, we would expect that the text &amp;quot;His father was born in Italy&amp;quot; would logically entail the hypothesis &amp;quot;He was born in Italy&amp;quot; with high probability, since most people whose father was born in Italy were also born there. However, we expect that the text would actually not probabilistically textually entail the hypothesis, since most people for whom it is specifically reported that their father was born in Italy were not born in Italy.1 B) We assign probabilities to propositions (hypotheses) in a similar manner to certain probabilistic reasoning approaches (e.g. Bacchus, 1990; Halpern, 1990). However, we also assume a generative model of text, similar to probabilistic language and machine translation models, which supplies the needed conditional probability distribution. Furthermore, since our conditioning is on texts rather than propositions, we do not assume any specific logic representation language for text meaning, and only assume that textual hypotheses can be assigned truth values.</Paragraph>
      <Paragraph position="1"> C) Our framework does not distinguish between textual entailment inferences that are based on knowledge of language semantics (such as murdering = killing) and inferences based on domain or world knowledge (such as live in Paris = live in France). Both are needed in applications and it is not clear at this stage where and how to put such a borderline.</Paragraph>
      <Paragraph position="2"> D) An important feature of the proposed framework is that for a given text many hypotheses are likely to be true. Consequently, for a given text t, Σh P(Trh=1|t) does not sum to 1.</Paragraph>
      <Paragraph position="3"> This differs from typical generative settings for IR and MT (Ponte and Croft, 1998; Brown et al., 1993), where all conditioned events are disjoint by construction. In the proposed model, it is rather the case that P(Trh=1|t) + P(Trh=0|t) = 1, as we are interested in the probability that a single particular hypothesis is true (or false).</Paragraph>
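Property D can be illustrated with two made-up conditional probabilities for a single text t (the hypotheses and numbers below are assumptions for illustration, not from the paper): many hypotheses can be likely true at once, so summing P(Trh=1|t) over hypotheses need not give 1, while for each fixed hypothesis the truth and falsehood events do partition the space.

```python
# Invented values of P(Trh=1|t) for two hypotheses given one text t.
# The second is entailed by the first, so both are likely true at once.
p_true_given_t = {
    "John lives in Paris": 0.90,
    "John lives in France": 0.95,
}

# Summing over hypotheses exceeds 1: this is not a distribution over h.
total_over_h = sum(p_true_given_t.values())  # 1.85

# For each single hypothesis h, P(Trh=1|t) + P(Trh=0|t) = 1 does hold.
for h, p in p_true_given_t.items():
    assert abs(p + (1 - p) - 1.0) < 1e-12
```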
      <Paragraph position="4"> E) An implemented model that corresponds to our probabilistic setting is expected to produce an estimate for P(Trh = 1 |t). This estimate is expected to reflect all probabilistic aspects involved in the modeling, including inherent uncertainty of the entailment inference itself (as in example 2 of Table 1), possible uncertainty regarding the correct disambiguation of the text (example 4), as well as probabilistic estimates that stem from the particular model structure.</Paragraph>
    </Section>
  </Section>
</Paper>