<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1210">
  <Title>Definition and Analysis of Intermediate Entailment Levels</Title>
  <Section position="3" start_page="55" end_page="57" type="metho">
    <SectionTitle>
2 Definition of Entailment Levels
</SectionTitle>
    <Paragraph position="0"> In this section we present definitions for two entailment models that correspond to the Lexical and Lexical-Syntactic levels. For each level we describe the available inference mechanisms. Table 1 presents several examples from the RTE test-set together with annotation of entailment at the different levels.</Paragraph>
    <Section position="1" start_page="55" end_page="56" type="sub_section">
      <SectionTitle>
2.1 The Lexical entailment level
</SectionTitle>
      <Paragraph position="0"> At the lexical level we assume that the text T and hypothesis H are represented by a bag of (possibly multi-word) terms, ignoring function words. At this level we define that entailment holds between T and H if every term h in H can be matched by a corresponding entailing term t in T. t is considered as entailing h if either h and t share the same lemma and part of speech, or t can be matched with h through a sequence of lexical transformations of the types described below.</Paragraph>
      <Paragraph position="1"> Morphological derivations This inference mechanism considers two terms as equivalent if one can be obtained from the other by some morphological derivation. Examples include nominalizations (e.g. 'acquisition = acquire'), pertainyms (e.g.</Paragraph>
      <Paragraph position="2"> 'Afghanistan = Afghan'), or nominal derivations like 'terrorist = terror'.</Paragraph>
      <Paragraph position="3"> Ontological relations This inference mechanism refers to ontological relations between terms. A term is inferred from another term if a chain of valid ontological relations between the two terms exists (Andreevskaia et al., 2005). In our experiment we regarded the following three ontological relations as providing entailment inferences: (1) 'synonyms' (e.g. 'free = release' in example 1361, Table 1); (2) 'hypernym' (e.g. 'produce = make') and (3) 'meronym-holonym' (e.g. 'executive = company').</Paragraph>
      <Paragraph position="4">  No. Text Hypothesis Task Ent. Lex.</Paragraph>
      <Paragraph position="5"> Ent.</Paragraph>
      <Paragraph position="6"> Syn.</Paragraph>
      <Paragraph position="7"> Ent.</Paragraph>
      <Paragraph position="8"> 322 Turnout for the historic vote for the first time since the EU took in 10 new members in May has hit a record low of 45.3%.</Paragraph>
      <Paragraph position="9"> New members joined the EU.</Paragraph>
      <Paragraph position="10"> IR true false true 1361 A Filipino hostage in Iraq was released. A Filipino hostage was freed in Iraq.</Paragraph>
      <Paragraph position="11"> CD true true true 1584 Although a Roscommon man by birth, born in Rooskey in 1932, Albert &amp;quot;The Slasher&amp;quot; Reynolds will forever be a Longford man by association.</Paragraph>
      <Paragraph position="12"> Albert Reynolds was born in Co. Roscommon.</Paragraph>
      <Paragraph position="13"> QA true true true 1911 The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.</Paragraph>
      <Paragraph position="14"> The SPD is defeated by the opposition parties.</Paragraph>
      <Paragraph position="15">  holds between the text and hypothesis (Ent.), whether Lexical entailment holds (Lex. Ent.) and whether Lexical-Syntactic entailment holds (Syn. Ent.). Lexical World knowledge This inference mechanism refers to world knowledge reflected at the lexical level, by which the meaning of one term can be inferred from the other. It includes both knowledge about named entities, such as 'Taliban = organization' and 'Roscommon = Co. Roscommon' (example 1584 in Table 1), and other lexical relations between words, such as WordNet's relations 'cause' (e.g. 'kill = die') and 'entail' (e.g. 'snore = sleep').</Paragraph>
    </Section>
    <Section position="2" start_page="56" end_page="57" type="sub_section">
      <SectionTitle>
2.2 The Lexical-syntactic entailment level
</SectionTitle>
      <Paragraph position="0"> At the lexical-syntactic level we assume that the text and the hypothesis are represented by the set of syntactic dependency relations of their dependency parse. At this level we ignore determiners and auxiliary verbs, but do include relations involving other function words. We define that entailment holds between T and H if the relations within H can be &amp;quot;covered&amp;quot; by the relations in T. In the trivial case, lexical-syntactic entailment holds if all the relations composing H appear verbatim in T (while additional relations within T are allowed). Otherwise, such coverage can be obtained by a sequence of transformations applied to the relations in T, which should yield all the relations in H.</Paragraph>
      <Paragraph position="1"> One type of such transformations are the lexical transformations, which replace corresponding lexical items, as described in sub-section 2.1. When applying morphological derivations it is assumed that the syntactic structure is appropriately adjusted. For example, &amp;quot;Mexico produces oil&amp;quot; can be mapped to &amp;quot;oil production by Mexico&amp;quot; (the NOMLEX resource (Macleod et al., 1998) provides a good example for systematic specification of such transformations).</Paragraph>
      <Paragraph position="2"> Additional types of transformations at this level are specified below.</Paragraph>
      <Paragraph position="3"> Syntactic transformations This inference mechanism refers to transformations between syntactic structures that involve the same lexical elements and preserve the meaning of the relationships between them (as analyzed in (Vanderwende et al., 2005)).</Paragraph>
      <Paragraph position="4"> Typical transformations include passive-active and apposition (e.g. 'An Wang, a native of Shanghai = An Wang is a native of Shanghai').</Paragraph>
      <Paragraph position="5">  Entailment paraphrases This inference mechanism refers to transformations that modify the syntactic structure of a text fragment as well as some of its lexical elements, while holding an entailment relationship between the original text and the transformed one. Such transformations are typically denoted as 'paraphrases' in the literature, where a wealth of methods for their automatic acquisition were proposed (Lin and Pantel, 2001; Shinyama et al., 2002; Barzilay and Lee, 2003; Szpektor et al., 2004). Following the same spirit, we focus here on transformations that are local in nature, which, according to the literature, may be amenable for large scale acquisition. Examples include: 'X is Y man by birth - X was born in Y' (example 1584 in Table 1), 'X take in Y = Y join X'1 and 'X is holy book of Y = Y follow X'2.</Paragraph>
      <Paragraph position="6"> Co-reference Co-references provide equivalence relations between different terms in the text and thus induce transformations that replace one term in a text with any of its co-referenced terms. For example, the sentence &amp;quot;Italy and Germany have each played twice, and they haven't beaten anybody yet.&amp;quot;3 entails &amp;quot;Neither Italy nor Germany have won yet&amp;quot;, involving the co-reference transformation 'they = Italy and Germany'.</Paragraph>
      <Paragraph position="7"> Example 1584 in Table 1 demonstrates the need to combine different inference mechanisms to achieve lexical-syntactic entailment, requiring world-knowledge, paraphrases and syntactic transformations. null</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="57" end_page="59" type="metho">
    <SectionTitle>
3 Empirical Analysis
</SectionTitle>
    <Paragraph position="0"> In this section we present the experiment that we conducted in order to analyze the two entailment levels, which are presented in section 2, in terms of relative performance and correlation with the notion of textual entailment.</Paragraph>
    <Section position="1" start_page="57" end_page="57" type="sub_section">
      <SectionTitle>
3.1 Data and annotation procedure
</SectionTitle>
      <Paragraph position="0"> The RTE test-set4 contains 800 Text-Hypothesis pairs (usually single sentences), which are typical  to various NLP applications. Each pair is annotated with a boolean value, indicating whether the hypothesis is entailed by the text or not, and the test-set is balanced in terms of positive and negative cases. We shall henceforth refer to this annotation as the gold standard. We constructed a sample of 240 pairs from four different tasks in the test-set, which correspond to the main applications that may benefit from entailment: information extraction (IE), information retrieval (IR), question answering (QA), and comparable documents (CD). We randomly picked 60 pairs from each task, and in total 118 of the cases were positive and 122 were negative.</Paragraph>
      <Paragraph position="1"> In our experiment, two of the authors annotated, for each of the two levels, whether or not entailment can be established in each of the 240 pairs. The annotators agreed on 89.6% of the cases at the lexical level, and 88.8% of the cases at the lexical-syntactic level, with Kappa statistics of 0.78 and 0.73, respectively, corresponding to 'substantial agreement' (Landis and Koch, 1977). This relatively high level of agreement suggests that the notion of lexical and lexical-syntactic entailment we propose are indeed well-defined.</Paragraph>
      <Paragraph position="2"> Finally, in order to establish statistics from the annotations, the annotators discussed all the examples they disagreed on and produced a final joint decision. null</Paragraph>
    </Section>
    <Section position="2" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
3.2 Evaluating the different levels of entailment
</SectionTitle>
      <Paragraph position="0"> annotated dataset for both lexical (L) and lexical-syntactic (LS) levels. Taking a &amp;quot;system&amp;quot;-oriented perspective, the annotations at each level can be viewed as the classifications made by an idealized system that includes a perfect implementation of the inference mechanisms in that level. The first two  rows show for each level how the cases, which were recognized as positive by this level (i.e. the entailment holds), are distributed between &amp;quot;true positive&amp;quot; (i.e. positive according to the gold standard) and &amp;quot;false positive&amp;quot; (negative according to the gold standard). The total number of positive and negative pairs in the dataset is reported in parentheses. The rest of the table details recall, precision, F1 and accuracy. null The distribution of the examples in the RTE test-set cannot be considered representative of a real-world distribution (especially because of the controlled balance between positive and negative examples). Thus, our statistics are not appropriate for accurate prediction of application performance. Instead, we analyze how well these simplified models of entailment succeed in approximating &amp;quot;real&amp;quot; entailment, and how they compare with each other.</Paragraph>
      <Paragraph position="1"> The proportion between true and false positive cases at the lexical level indicates that the correlation between lexical match and entailment is quite low, reflected in the low precision achieved by this level (only 59%). This result can be partly attributed to the idiosyncracies of the RTE test-set: as reported in (Dagan et al., 2005), samples with high lexical match were found to be biased towards the negative side. Interestingly, our measured accuracy correlates well with the performance of systems at the PASCAL RTE Workshop, where the highest reported accuracy of a lexical system is 0.586 (Dagan et al., 2005).</Paragraph>
      <Paragraph position="2"> As one can expect, adding syntax considerably reduces the number of false positives - from 36 to only 10. Surprisingly, at the same time the number of true positive cases grows from 52 to 59, and correspondingly, precision rise to 86%. Interestingly, neither the lexical nor the lexical-syntactic level are able to cover more than half of the positive cases (e.g. example 1911 in Table 1).</Paragraph>
      <Paragraph position="3"> In order to better understand the differences between the two levels, we next analyze the overlap between them, presented in Table 3. Looking at Table 3(a), which contains only the positive cases, we see that many examples were recognized only by one of the levels. This interesting phenomenon can be explained on the one hand by lexical matches that could not be validated in the syntactic level, and on the other hand by the use of paraphrases, which are  the RTE dataset sample, and (b) includes only the negative examples.</Paragraph>
      <Paragraph position="4"> introduced only in the lexical-syntactic level. (e.g.</Paragraph>
      <Paragraph position="5"> example 322 in Table 1).</Paragraph>
      <Paragraph position="6"> This relatively symmetric situation changes as we move to the negative cases, as shown in Table 3(b).</Paragraph>
      <Paragraph position="7"> By adding syntactic constraints, the lexical-syntactic level was able to fix 29 false positive errors, misclassified at the lexical level (as demonstrated in example 2127, Table 1), while introducing only 3 new false-positive errors. This exemplifies the importance of syntactic matching for precision.</Paragraph>
    </Section>
    <Section position="3" start_page="58" end_page="59" type="sub_section">
      <SectionTitle>
3.3 The contribution of various inference mechanisms
</SectionTitle>
      <Paragraph position="0"> (triangleR) and percentage (%), within the gold standard positive examples, of the various inference mechanisms at each level, ordered by their significance.</Paragraph>
      <Paragraph position="1">  In order to get a sense of the contribution of the various components at each level, statistics on the inference mechanisms that contributed to the coverage of the hypothesis by the text (either full or partial) were recorded by one annotator. Only the positive cases in the gold standard were considered.</Paragraph>
      <Paragraph position="2"> For each inference mechanism we measured its frequency, its contribution to the recall of the related level and the percentage of cases in which it is required for establishing entailment. The latter also takes into account cases where only partial coverage could be achieved, and thus indicates the significance of each inference mechanism for any entailment system, regardless of the models presented in this paper. The results are summarized in Table 4.</Paragraph>
      <Paragraph position="3"> From Table 4 it stands that paraphrases are the most notable contributors to recall. This result indicates the importance of paraphrases to the entailment task and the need for large-scale paraphrase collections. Syntactic transformations are also shown to contribute considerably, indicating the need for collections of syntactic transformations as well. In that perspective, we propose our annotation framework as means for evaluating collections of paraphrases or syntactic transformations in terms of recall.</Paragraph>
      <Paragraph position="4"> Finally, we note that the co-reference moderate contribution can be partly attributed to the idiosyncracies of the RTE test-set: the annotators were guided to replace anaphors with the appropriate reference, as reported in (Dagan et al., 2005).</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>