<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1208"> <Title>A Probabilistic Setting and Lexical Cooccurrence Model for Textual Entailment</Title> <Section position="4" start_page="44" end_page="46" type="metho"> <SectionTitle> 3 A Lexical Entailment Model </SectionTitle>
<Paragraph position="0"> We suggest that the setting proposed above provides the necessary grounding for probabilistic modeling of textual entailment. Since modeling the full extent of the textual entailment problem is clearly a long-term research goal, in this paper we focus on the above-mentioned sub-task of lexical entailment: identifying when the lexical elements of a textual hypothesis h are inferred from a given text t.</Paragraph>
<Paragraph position="1"> To model lexical entailment we first assume that the meanings of the individual content words in a hypothesis can be assigned truth values. One possible interpretation for such truth values is that lexical concepts are assigned existential meanings. For example, for a given text t, Tr_book = 1 if it can be inferred in t's state of affairs that a book exists. Our model does not depend on any such particular interpretation, though, as we only assume that truth values can be assigned to lexical items but do not explicitly annotate or evaluate this sub-task.</Paragraph>
<Paragraph position="2"> Given this setting, a hypothesis is assumed to be true if and only if all its lexical components are true as well. This captures our target perspective of lexical entailment, while not modeling other entailment aspects here. When estimating the entailment probability we assume that the truth probability of a term u in a hypothesis h is independent of the truth of the other terms in h, obtaining:</Paragraph>
<Paragraph position="3"> P(Tr_h = 1 | t) = \prod_{u \in h} P(Tr_u = 1 | t)   (1) </Paragraph>
<Paragraph position="4"> In order to estimate P(Tr_u = 1 | v_1, ..., v_n) for a given word u and text t = {v_1, ..., v_n}, we further assume that the majority of the probability mass comes from a specific entailing word in t: P(Tr_u = 1 | t) \approx \max_{v \in t} P(Tr_u = 1 | T_v)   (2) where T_v denotes the event that a generated text contains the word v. This corresponds to expecting that each word in h will be entailed by a specific word in t (rather than by the accumulative context of t as a whole2). Alternatively, one can view (2) as inducing an alignment between the terms of h and the terms of t, somewhat similar to alignment models in statistical MT (Brown et al., 1993). Thus we propose estimating the entailment probability from (1) and (2) as follows: P(Tr_h = 1 | t) \approx \prod_{u \in h} \max_{v \in t} P(Tr_u = 1 | T_v)   (3) 2 Such a model is proposed in (Glickman et al., 2005b).</Paragraph>
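As a concrete illustration of equations (1)-(3), here is a minimal Python sketch (our own, not from the paper). The function name entailment_prob and the stand-in lexical_prob, which returns an estimate of P(Tr_u = 1 | T_v), are illustrative assumptions; Section 3.1 below instantiates the lexical probability from corpus statistics.

```python
# Minimal sketch of equations (1)-(3). A hypothesis h is true iff all of its
# lexical components are true (eq. 1, with independence between terms), and
# each term's truth probability is attributed to the single most entailing
# word of the text t (eq. 2). `lexical_prob(u, v)` is a hypothetical stand-in
# for P(Tr_u = 1 | T_v), estimated from corpus co-occurrence in Section 3.1.
def entailment_prob(t_terms, h_terms, lexical_prob):
    prob = 1.0
    for u in h_terms:
        prob *= max(lexical_prob(u, v) for v in t_terms)  # eq. (2)
    return prob  # eq. (3): product over all hypothesis terms
```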
<Section position="1" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 3.1 Estimating Lexical Entailment Probabilities </SectionTitle>
<Paragraph position="0"> We perform unsupervised empirical estimation of the lexical entailment probabilities, P(Tr_u = 1 | T_v), based on word co-occurrence frequencies in a corpus. Following our proposed probabilistic model (cf. Section 2.2.1), we assume that the domain corpus is a sample generated by a language source. Each document represents a generated text and a (hidden) possible world. Given that the possible world of the text is not observed, we do not know the truth assignments of hypotheses for the observed texts. We therefore make the simplifying assumption that all hypotheses stated verbatim in a document are true and all others are false, and hence P(Tr_u = 1 | T_v) = P(T_u | T_v).</Paragraph>
<Paragraph position="1"> This simple co-occurrence probability, which we denote as the lexical entailment probability lep(u, v), is easily estimated from the corpus based on maximum likelihood counts:</Paragraph>
<Paragraph position="2"> lep(u, v) = n_{u,v} / n_v   (4) </Paragraph>
<Paragraph position="3"> where n_v is the number of documents containing the word v and n_{u,v} is the number of documents containing both u and v.</Paragraph>
<Paragraph position="4"> Given our definition of the textual entailment relationship (cf. Section 2.3), for a given word v we only consider for entailment words u for which P(Tr_u = 1 | T_v) > P(Tr_u = 1) or, based on our estimates, for which n_{u,v} / n_u > n_v / N (N is the total number of documents in the corpus).</Paragraph>
<Paragraph position="5"> We denote by tep the textual entailment probability estimate derived from (3) and (4) above: tep(t, h) = \prod_{u \in h} \max_{v \in t} lep(u, v)   (5)</Paragraph>
</Section>
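Continuing the sketch above, the estimates (4) and (5) reduce to a few document-frequency lookups. The following sketch assumes precomputed tables whose names are illustrative: df for per-word document counts, joint_df for pairwise counts, and n_docs for the corpus size.

```python
# Sketch of equations (4)-(5) under the stated assumptions. Precomputed from
# the corpus (illustrative names): df[w] = number of documents containing w
# (n_w); joint_df[(u, v)] = number of documents containing both u and v
# (n_{u,v}); n_docs = total number of documents (N).
def lep(u, v, df, joint_df, n_docs):
    n_u, n_v = df.get(u, 0), df.get(v, 0)
    n_uv = joint_df.get((u, v), 0)
    if n_u == 0 or n_v == 0:
        return 0.0
    # Entailment filter: keep only pairs whose co-occurrence beats chance,
    # i.e. P(Tr_u=1 | T_v) > P(Tr_u=1), estimated as n_uv/n_u > n_v/N.
    if n_uv / n_u <= n_v / n_docs:
        return 0.0
    return n_uv / n_v  # equation (4): lep(u, v) = n_{u,v} / n_v

def tep(t_terms, h_terms, df, joint_df, n_docs):
    # Equation (5): combine the lexical estimates via entailment_prob above.
    return entailment_prob(t_terms, h_terms,
                           lambda u, v: lep(u, v, df, joint_df, n_docs))
```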
<Section position="2" start_page="45" end_page="46" type="sub_section"> <SectionTitle> 3.2 Baseline model </SectionTitle>
<Paragraph position="0"> As a baseline model for comparison, we use a score developed within the context of text summarization. (Monz and de Rijke, 2001) propose modeling the directional entailment between two texts t_1, t_2 via the following score:</Paragraph>
<Paragraph position="1"> entscore(t_1, t_2) = \sum_{w \in t_1 \cap t_2} idf(w) / \sum_{w \in t_2} idf(w)   (6) </Paragraph>
<Paragraph position="2"> where idf(w) = log(N / n_w), N is the total number of documents in the corpus and n_w is the number of documents containing the word w. A practically equivalent measure was independently proposed in the context of QA by (Saggion et al., 2004)3. This baseline measure captures word overlap, considering only words that appear in both texts, and weighs them by their inverse document frequency. 3 (Saggion et al., 2004) actually proposed the above score without the normalizing denominator; however, for a given hypothesis it yields the same ranking of candidate entailing texts.</Paragraph>
</Section> </Section>
<Section position="5" start_page="46" end_page="46" type="metho"> <SectionTitle> 4 The RTE challenge dataset </SectionTitle>
<Paragraph position="0"> The RTE dataset (Dagan et al., 2005) consists of sentence pairs annotated for entailment. For this dataset we used word co-occurrence frequencies obtained from a web search engine. The details of this experiment are described in (Glickman et al., 2005a). The resulting accuracy on the test set was 59% and the resulting confidence weighted score was 0.57. Both are statistically significantly better than chance at the 0.01 level. The baseline model (6) from Section 3.2, which takes into account only terms appearing in both the text and the hypothesis, achieved an accuracy of only 56%. Although our proposed lexical system is relatively simple, as it does not rely on syntactic or other deeper analysis, it was nevertheless among the top-ranking systems in the RTE Challenge.</Paragraph>
</Section>
<Section position="6" start_page="46" end_page="46" type="metho"> <SectionTitle> 5 RCV1 dataset </SectionTitle>
<Paragraph position="0"> In addition to the RTE dataset, we were interested in evaluating the model on a more representative set of texts and hypotheses that better corresponds to applicative settings. We focused on the information seeking setting, common in applications such as QA and IR, in which a hypothesis is given and it is necessary to identify texts that entail it.</Paragraph>
<Paragraph position="1"> An annotator was asked to choose 60 hypotheses based on sentences from the first few documents in the Reuters Corpus Volume 1 (Rose et al., 2002). The annotator was instructed to choose sentential hypotheses such that their truth could easily be evaluated. We further required that the hypotheses convey a reasonable information need, in such a way that they might correspond to potential questions, semantic queries or IE relations. Table 2 shows a few of the hypotheses.</Paragraph>
<Paragraph position="2"> In order to create a set of candidate entailing texts for the given set of test hypotheses, we followed the common practice of WordNet-based expansion (Nie and Brisebois, 1996; Yang and Chua, 2002). Using WordNet, we expanded the hypotheses' terms with morphological alternations and semantically related words4.</Paragraph>
<Paragraph position="3"> For each hypothesis, stop words were removed and all content words were expanded as described above. The Boolean search consisted of a conjunction, over the hypothesis terms, of the disjunction of each term's expansions (a code sketch of this expansion and matching is given at the end of this extract), and was performed at the paragraph level over the full Reuters corpus, as is common in IR for QA. Since we wanted to focus our research on semantic variability, we excluded from the result set paragraphs that contain all the original words of the hypothesis or their morphological derivations. The resulting dataset consists of 50 hypotheses and over a million retrieved paragraphs (10 hypotheses had only exact matches). The number of paragraphs retrieved per hypothesis ranges from 1 to 400,000.</Paragraph>
<Section position="1" start_page="46" end_page="46" type="sub_section"> <SectionTitle> 5.1 Evaluation </SectionTitle>
<Paragraph position="0"> The model's entailment probability, tep, was compared to the following two baseline models. The first, denoted base, is the naive baseline in which all retrieved texts are presumed to entail the hypothesis with equal confidence. This baseline corresponds to systems that perform blind expansion with no weighting. The second baseline, entscore, is the entailment score (6) from Section 3.2.</Paragraph>
<Paragraph position="1"> The top 20 results of all methods were given to judges to be annotated for entailment.</Paragraph>
<Paragraph position="2"> Judges were asked to annotate an example as true if, given the text, they could infer with high confidence that the hypothesis is true (similar to the guidelines published for the RTE Challenge dataset). Accordingly, they were instructed to annotate an example as false either if they believed the hypothesis to be false given the text or if the text was unrelated to the hypothesis. In total there were 1683 text-hypothesis pairs, which were randomly divided between two judges. In order to measure agreement, we had 200 of the pairs annotated by both judges, yielding moderate agreement (a Kappa of 0.6).</Paragraph>
</Section> </Section>
<Section position="7" start_page="46" end_page="47" type="metho"> <SectionTitle> 4 </SectionTitle>
<Paragraph position="0"> The following WordNet relations were used: synonyms, see also, similar to, hypernyms/hyponyms, meronyms/holonyms, pertainyms, attribute, entailment, cause and domain.</Paragraph>
</Section>
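For illustration, here is a rough sketch (our own, not from the paper) of the expansion and Boolean matching described in Section 5, using NLTK's WordNet interface. The paper does not specify its tooling; the function names are hypothetical, the relation set shown is only a subset of footnote 4, and generation of morphological alternations is omitted.

```python
# Rough sketch of WordNet-based expansion and Boolean retrieval (Section 5),
# using NLTK's WordNet interface (requires nltk and its 'wordnet' data).
from nltk.corpus import wordnet as wn

def expand_term(word):
    expansions = {word}
    for synset in wn.synsets(word):
        # Synonyms: all lemma names of the synset.
        expansions.update(l.name().replace('_', ' ') for l in synset.lemmas())
        # A subset of the semantic relations listed in footnote 4.
        related = (synset.hypernyms() + synset.hyponyms() +
                   synset.also_sees() + synset.similar_tos() +
                   synset.entailments() + synset.causes())
        for rel_synset in related:
            expansions.update(l.name().replace('_', ' ')
                              for l in rel_synset.lemmas())
    return expansions

def matches(paragraph_tokens, hypothesis_terms):
    # Conjunction over hypothesis terms of the disjunction of each term's
    # expansions: every term must be matched by at least one expansion.
    tokens = set(paragraph_tokens)
    return all(expand_term(u) & tokens for u in hypothesis_terms)
```
</Paper>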