<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3206">
  <Title>Scaling Web-based Acquisition of Entailment Relations</Title>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The TE/ASE Acquisition Method
</SectionTitle>
    <Paragraph position="0"> Motivated by prior experience, we identify two major goals for scaling Web-based acquisition of entailment relations: (a) covering the broadest possible range of meanings while requiring minimal input, and (b) keeping template structures as general as possible. To address the first goal we require as input only a phrasal lexicon of the relevant domain (including single words and multi-word expressions). Broad-coverage lexicons are widely available or may be constructed using known term acquisition techniques, making this a feasible and scalable input requirement. We then aim to acquire entailment relations that include any of the lexicon's entries. The second goal is addressed by a novel algorithm for extracting the most general templates that are justified by the data.</Paragraph>
    <Paragraph position="1"> For each lexicon entry, denoted a pivot, our extraction method performs two phases: (a) extract promising anchor sets for that pivot (ASE, Section 3.1), and (b) from sentences containing the anchor sets, extract templates for which an entailment relation holds with the pivot (TE, Section 3.2). Examples of verb pivots are: 'acquire', 'fall to', 'prevent'. We will use the pivot 'prevent' in examples throughout this section.</Paragraph>
    <Paragraph position="2"> Before presenting the acquisition method we first define its output. A template is a dependency parse-tree fragment, with variable slots at some tree nodes (e.g. 'X subj- prevent obj-Y'). An entailment relation between two templates T1 and T2 holds if the meaning of T2 can be inferred from the meaning of T1 (or vice versa) in some contexts, but not necessarily all, under the same variable instantiation. For example, 'X subj- prevent obj-Y' entails 'X subj- reduce obj-Y risk' because the sentence &quot;aspirin reduces heart attack risk&quot; can be inferred from &quot;aspirin prevents a first heart attack&quot;. Our output consists of pairs of templates for which an entailment relation holds.</Paragraph>
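As a minimal sketch, such templates and the method's output pairs could be encoded as follows (the Template class and the triple encoding of dependency edges are illustrative, not the paper's internal representation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    """A dependency parse-tree fragment with variable slots.

    Edges are (head, relation, dependent) triples; the slot
    variables are the strings 'X' and 'Y'.
    """
    edges: tuple

# 'X subj- prevent obj-Y'
t1 = Template(edges=(("prevent", "subj", "X"), ("prevent", "obj", "Y")))
# 'X subj- reduce obj-Y risk'
t2 = Template(edges=(("reduce", "subj", "X"), ("reduce", "obj", "risk"),
                     ("risk", "mod", "Y")))

# The method's output: pairs of templates between which entailment holds
entailment_pairs = [(t1, t2)]
```

The same variable names ('X', 'Y') on both sides encode the requirement that entailment holds under the same variable instantiation.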
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Anchor Set Extraction (ASE)
</SectionTitle>
      <Paragraph position="0"> The goal of this phase is to find a substantial number of promising anchor sets for each pivot. A good anchor-set should satisfy a proper balance between specificity and generality. On one hand, an anchor set should correspond to a sufficiently specific setting, so that entailment would hold between its different occurrences. On the other hand, it should be sufficiently frequent to appear with different entailing templates.</Paragraph>
      <Paragraph position="1"> Finding good anchor sets based on just the input pivot is a hard task. Most methods identify good repeated anchors &quot;in retrospect&quot;, that is, after processing a full corpus, while previous Web-based methods require at least one good anchor set as input.</Paragraph>
      <Paragraph position="2"> Given our minimal input, we needed refined criteria that identify a priori the relatively few promising anchor sets within a sample of pivot occurrences.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ASE ALGORITHM STEPS:
</SectionTitle>
    <Paragraph position="0"> For each pivot (a lexicon entry):
1. Create a pivot template, Tp.
2. Construct a parsed sample corpus S for Tp:
   (a) Retrieve an initial sample from the Web.
   (b) Identify associated phrases for the pivot.
   (c) Extend S using the associated phrases.
3. Extract candidate anchor sets from S:
   (a) Extract slot anchors.
   (b) Extract context anchors.
4. Filter the candidate anchor sets:
   (a) by absolute frequency;
   (b) by conditional pivot probability.
The ASE algorithm (presented in Figure 1) performs 4 main steps.</Paragraph>
    <Paragraph position="1"> STEP (1) creates a complete template, called the pivot template and denoted Tp, for the input pivot, denoted P. Variable slots are added for the major types of syntactic relations that interact with P, based on its syntactic type. These slots enable us to later match Tp with other templates. For verbs, we add slots for a subject and for an object or a modifier (e.g. 'X subj- prevent obj-Y').</Paragraph>
    <Paragraph position="2"> STEP (2) constructs a sample corpus, denoted S, for the pivot template. STEP (2.A) utilizes a Web search engine to initialize S by retrieving sentences containing P. The sentences are parsed by the MINIPAR dependency parser (Lin, 1998), keeping only sentences that contain the complete syntactic template Tp (with all the variables instantiated). STEP (2.B) identifies phrases that are statistically associated with Tp in S. We test all noun phrases in S, discarding phrases that are too common on the Web (absolute frequency higher than a threshold MAXPHRASEF), such as &quot;desire&quot;. Then we select the N phrases with the highest tf*idf score, computed as tf*idf(X) = freqS(X) * log(N / freqW(X)), where freqS(X) is the number of occurrences in S containing X, N is the total number of Web documents, and freqW(X) is the number of Web documents containing X. These phrases have a strong collocation relationship with the pivot P and are likely to indicate topical (rather than anecdotal) occurrences of P. For example, the phrases &quot;patient&quot; and &quot;American Dental Association&quot;, which indicate contexts of preventing health problems, were selected for the pivot 'prevent'.</Paragraph>
    <Paragraph position="5"> Finally, STEP (2.C) expands S by querying the Web with both P and each of the associated phrases, adding the retrieved sentences to S as in STEP (2.A).</Paragraph>
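The phrase ranking of STEP (2.B) can be sketched as follows. The frequencies and thresholds below are toy numbers, and the exact tf*idf variant is an assumption reconstructed from the definitions of freqS, freqW and N:

```python
import math

def tfidf(freq_s, freq_w, n_docs):
    """Score a phrase: frequent in the sample corpus S, rare on the Web.

    freq_s -- occurrences in S containing the phrase
    freq_w -- number of Web documents containing the phrase
    n_docs -- total number of Web documents
    """
    return freq_s * math.log(n_docs / freq_w)

MAXPHRASEF = 10_000_000   # hypothetical cutoff for "too common on the Web"
N_DOCS = 1_000_000_000    # hypothetical total number of Web documents

# Toy statistics for candidate noun phrases co-occurring with 'prevent'
phrases = {
    "patient":                     {"freq_s": 40, "freq_w": 5_000_000},
    "desire":                      {"freq_s": 25, "freq_w": 80_000_000},
    "American Dental Association": {"freq_s": 8,  "freq_w": 90_000},
}

# Discard over-frequent phrases, then rank the rest by tf*idf
ranked = sorted(
    (p for p, st in phrases.items() if st["freq_w"] <= MAXPHRASEF),
    key=lambda p: tfidf(phrases[p]["freq_s"], phrases[p]["freq_w"], N_DOCS),
    reverse=True,
)
# "desire" is discarded by the MAXPHRASEF filter before ranking
```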
    <Paragraph position="6"> STEP (3) extracts candidate anchor sets for Tp.</Paragraph>
    <Paragraph position="7"> From each sentence in S we try to generate one candidate set, containing noun phrases whose Web frequency is lower than MAXPHRASEF. STEP (3.A) extracts slot anchors - phrases that instantiate the slot variables of Tp. Each anchor is marked with the corresponding slot. For example, the anchors {antibiotics subj-, miscarriage obj-} were extracted from the sentence &quot;antibiotics in pregnancy prevent miscarriage&quot;.</Paragraph>
    <Paragraph position="8"> STEP (3.B) tries to extend each candidate set with one additional context anchor, in order to improve its specificity. This anchor is chosen as the highest tf*idf scoring phrase in the sentence, if it exists. In the previous example, 'pregnancy' is selected.</Paragraph>
    <Paragraph position="9"> STEP (4) filters out bad candidate anchor sets by two different criteria. STEP (4.A) maintains only candidates with absolute Web frequency within a threshold range [MINSETF, MAXSETF], to guarantee an appropriate specificity-generality level. STEP (4.B) guarantees sufficient (directional) association between the candidate anchor set c and Tp, by estimating the conditional probability of the pivot given the candidate, Prob(P | c) = freqW(c and P) / freqW(c), where freqW is Web frequency and P is the pivot.</Paragraph>
    <Paragraph position="12"> We maintain only candidates for which this probability falls within a threshold range [SETMINP, SETMAXP]. A higher probability often corresponds to a strong linguistic collocation between the candidate and Tp without any semantic entailment, while a lower probability indicates coincidental co-occurrence without a consistent semantic relation.</Paragraph>
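The two filters of STEP (4) can be sketched as a single predicate; the threshold values and Web frequencies below are illustrative, not from the paper:

```python
MINSETF, MAXSETF = 100, 1_000_000   # absolute-frequency range (illustrative)
SETMINP, SETMAXP = 0.01, 0.66       # conditional-probability range (illustrative)

def keep_candidate(freq_c, freq_c_and_pivot):
    """STEP (4): keep an anchor set c only if its Web frequency is in range
    and the pivot's conditional probability given c is neither a pure
    collocation (too high) nor coincidental (too low)."""
    if not (MINSETF <= freq_c <= MAXSETF):   # 4.a absolute frequency
        return False
    prob = freq_c_and_pivot / freq_c          # 4.b Prob(pivot | c)
    return SETMINP <= prob <= SETMAXP

# {aspirin, heart attack}: mid-range pivot probability, kept
assert keep_candidate(freq_c=50_000, freq_c_and_pivot=2_000)
# strong collocation: pivot almost always present, dropped
assert not keep_candidate(freq_c=50_000, freq_c_and_pivot=45_000)
# too rare overall: dropped by the absolute-frequency filter
assert not keep_candidate(freq_c=50, freq_c_and_pivot=10)
```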
    <Paragraph position="13"> The remaining candidates in S become the input anchor-sets for the template extraction phase, for example, {Aspirin subj-, heart attack obj-} for 'prevent'.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Template Extraction (TE)
</SectionTitle>
      <Paragraph position="0"> The Template Extraction algorithm accepts as its input a list of anchor sets extracted by ASE for each pivot template. TE then generates a set of syntactic templates that are assumed to maintain an entailment relationship with the pivot template. TE performs three main steps, described in the following subsections: 1. Acquisition of a sample corpus from the Web.</Paragraph>
      <Paragraph position="1"> 2. Extraction of maximal most general templates from that corpus.</Paragraph>
      <Paragraph position="2"> 3. Post-processing and final ranking of extracted templates.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2.1 Acquisition of a sample corpus from the Web
</SectionTitle>
      <Paragraph position="0"> For each input anchor set, TE acquires from the Web a sample corpus of sentences containing it.</Paragraph>
      <Paragraph position="1"> For example, a sentence from the sample corpus for {aspirin, heart attack} is: &quot;Aspirin stops heart attack?&quot;. All of the sample sentences are then parsed with MINIPAR (Lin, 1998), which generates from each sentence a syntactic directed acyclic graph (DAG) representing the dependency structure of the sentence. Each vertex in this graph is labeled with a word and some morphological information; each graph edge is labeled with the syntactic relation between the words it connects.</Paragraph>
      <Paragraph position="2"> TE then substitutes each slot anchor (see Section 3.1) in the parse graphs with its corresponding slot variable. Thus, &quot;Aspirin stops heart attack?&quot; will be transformed into 'X stop Y'. This way all the anchors for a certain slot are unified under the same variable name in all sentences. The parsed sentences related to all of the anchor sets are subsequently merged into a single set of parse graphs S = {P1, P2, ..., Pn} (see P1 and P2 in Figure 2).</Paragraph>
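The anchor-to-variable substitution can be sketched at the token level. The real algorithm operates on MINIPAR dependency graphs; plain string replacement over a token list is a simplification:

```python
def substitute_anchors(tokens, anchors):
    """Replace each slot anchor with its slot variable, so that anchors of
    the same slot are unified under one variable name across sentences."""
    return [anchors.get(tok.lower(), tok) for tok in tokens]

# slot anchors extracted by ASE, mapped to their slot variables
anchors = {"aspirin": "X", "heart attack": "Y"}

# multi-word anchors are kept as single tokens here for simplicity
sentence = ["Aspirin", "stops", "heart attack", "?"]
template_tokens = substitute_anchors(sentence, anchors)
# template_tokens is now ['X', 'stops', 'Y', '?']
```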
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2.2 Extraction of maximal most general
templates
</SectionTitle>
      <Paragraph position="0"> The core of TE is a General Structure Learning algorithm (GSL) that is applied to the set of parse graphs S resulting from the previous step. GSL extracts single-rooted syntactic DAGs, named spanning templates since they must span over at least Na slot variables and also appear in at least Nr sentences from S (in our experiments we set Na=2 and Nr=2). GSL learns maximal most general templates: spanning templates which, at the same time, (a) cannot be generalized by further reduction and (b) cannot be further extended while keeping the same generality level.</Paragraph>
      <Paragraph position="1"> In order to properly define the notion of maximal most general templates, we introduce some formal definitions and notations.</Paragraph>
      <Paragraph position="2"> DEFINITION: For a spanning template t we define a sentence set, denoted with s(t), as the set of all parsed sentences in S containing t.</Paragraph>
      <Paragraph position="3"> For each pair of templates t1 and t2, we use the notation t1 ⪯ t2 to denote that t1 is included as a sub-graph of, or is equal to, t2. We use the notation t1 ≺ t2 when such inclusion holds strictly. We define T(S) as the set of all spanning templates in the sample S.</Paragraph>
      <Paragraph position="4"> DEFINITION: A spanning template t ∈ T(S) is maximal most general if and only if both of the following conditions hold:</Paragraph>
      <Paragraph position="5"> CONDITION A: for every t′ ∈ T(S) such that t′ ⪯ t, it holds that s(t) = s(t′).</Paragraph>
      <Paragraph position="6"> CONDITION B: for every t′ ∈ T(S) such that t ≺ t′, it holds that s(t) ⊃ s(t′).</Paragraph>
      <Paragraph position="7"> Condition A ensures that the extracted templates do not contain spanning sub-structures that are more &quot;general&quot; (i.e., having a larger sentence set); condition B ensures that the template cannot be further enlarged without reducing its sentence set.</Paragraph>
      <Paragraph position="8"> GSL performs template extraction in two main steps: (1) build a compact graph representation of all the parse graphs from S; (2) extract templates from the compact representation.</Paragraph>
      <Paragraph position="9"> A compact graph representation is an aggregate graph which joins all the sentence graphs from S, ensuring that all identical spanning sub-structures from different sentences are merged into a single one. Therefore, each vertex v (respectively, edge e) in the aggregate graph is either a copy of a corresponding vertex (edge) from a sentence graph Pi or it represents the merging of several identically labeled vertices (edges) from different sentences in S. The set of such sentences is defined as the sentence set of v (e), and is represented through the set of index numbers of related sentences (e.g. &quot;(1,2)&quot; in the third tree of Figure 2). We will denote with Gi the compact graph representation of the first i sentences in S. The parse trees P1 and P2 of two sentences and their related compact representation G2 are shown in Figure 2.</Paragraph>
      <Paragraph position="10"> Building the compact graph representation. The compact graph representation is built incrementally. The algorithm starts with an empty aggregate graph G0 and then merges the sentence graphs from S one at a time into the aggregate structure.</Paragraph>
      <Paragraph position="11"> Let us denote the current aggregate graph with Gi-1(Vg, Eg) and let Pi(Vp, Ep) be the parse graph to be merged next. Note that the sentence set of Pi is the single-element set {i}.</Paragraph>
      <Paragraph position="12"> During each iteration a new graph is created as the union of both input graphs: Gi = Gi-1 ∪ Pi.</Paragraph>
      <Paragraph position="13"> Then, the following merging procedure is performed on the elements of Gi:</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. ADDING GENERALIZED VERTICES TO Gi.
</SectionTitle>
    <Paragraph position="0"> For every two vertices vg ∈ Vg, vp ∈ Vp having equal labels, a new generalized vertex vnewg is created and added to Gi. The new vertex takes the same label and holds a sentence set formed from the sentence set of vg by adding i to it. Still with reference to Figure 2, the generalized vertices in G2 are 'X', 'Y' and 'stop'. The algorithm connects the generalized vertex vnewg with all the vertices that are connected with vg and vp.</Paragraph>
    <Paragraph position="1"> 2. MERGING EDGES. If two edges eg ∈ Eg and ep ∈ Ep have equal labels and their corresponding adjacent vertices have been merged, then eg and ep are also merged into a new edge. In Figure 2 the edges ('stop', 'X') and ('stop', 'Y') from P1 and P2 are eventually merged into G2.</Paragraph>
    <Paragraph position="2"> 3. DELETING MERGED VERTICES. Every vertex v from Vp or Vg for which at least one generalized vertex vnewg exists is deleted from Gi.</Paragraph>
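The incremental construction can be sketched in miniature. Here each sentence graph is a plain set of labeled edges, and merging identically labeled elements reduces to accumulating sentence indices per edge; this is a deliberate simplification of the full vertex/edge procedure above:

```python
def build_compact_graph(parse_graphs):
    """Merge sentence graphs, one at a time, into an aggregate graph.

    Each sentence graph is a set of (head, relation, dependent) edges;
    the aggregate maps each distinct edge to its sentence set, i.e. the
    indices of the sentences it occurs in (cf. "(1,2)" in Figure 2).
    """
    aggregate = {}
    for i, graph in enumerate(parse_graphs, start=1):
        for edge in graph:
            aggregate.setdefault(edge, set()).add(i)
    return aggregate

# P1: 'X stop Y'        P2: 'X stop Y (in time)'
p1 = {("stop", "subj", "X"), ("stop", "obj", "Y")}
p2 = {("stop", "subj", "X"), ("stop", "obj", "Y"), ("stop", "mod", "in time")}

g2 = build_compact_graph([p1, p2])
# the edges shared by both sentences carry the sentence set {1, 2}
```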
    <Paragraph position="3"> As an optimization step, we merge only vertices and edges that are included in equal spanning templates. Extracting the templates. GSL extracts all maximal most general templates from the final compact representation Gn using the following sub-algorithm:</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. BUILDING MINIMAL SPANNING TREES. For
</SectionTitle>
    <Paragraph position="0"> every Na different slot variables in Gn having a common ancestor, a minimal spanning tree st is built. Its sentence set is computed as the intersection of the sentence sets of its edges and vertices.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. EXPANDING THE SPANNING TREES. Every
</SectionTitle>
    <Paragraph position="0"> minimal spanning tree st is expanded to the maximal sub-graph maxst whose sentence set is equal to s(st). All maximal single-rooted DAGs in maxst are extracted as candidate templates. Maximality ensures that the extracted templates cannot be expanded further while keeping the same sentence set, satisfying condition B.</Paragraph>
    <Paragraph position="1"> 3. FILTERING. Candidates which contain another candidate with a larger sentence set are filtered out. This step guarantees condition A.</Paragraph>
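The FILTERING step can be sketched over toy candidates, using strict edge-set inclusion as a stand-in for the sub-graph relation (a simplification; real templates are DAGs):

```python
def filter_candidates(candidates):
    """Drop any candidate that contains another candidate (here: strict
    edge-set inclusion) whose sentence set is strictly larger."""
    kept = []
    for edges, sents in candidates:
        dominated = any(other_edges < edges and sents < other_sents
                        for other_edges, other_sents in candidates)
        if not dominated:
            kept.append((edges, sents))
    return kept

# toy candidates: (edge set, sentence set)
t_small = (frozenset({("stop", "subj", "X"), ("stop", "obj", "Y")}),
           frozenset({1, 2}))
t_big = (frozenset({("stop", "subj", "X"), ("stop", "obj", "Y"),
                    ("stop", "mod", "in time")}),
         frozenset({2}))

kept = filter_candidates([t_small, t_big])
# t_big contains t_small, whose sentence set {1, 2} is larger,
# so t_big is filtered out
```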
    <Paragraph position="2"> In Figure 2 the maximal most general template in G2 is 'X subj- stop obj-Y'.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2.3 Post-processing and ranking of extracted
templates
</SectionTitle>
      <Paragraph position="0"> As a last step, names and numbers are filtered out from the templates. Moreover, TE removes templates that are very long or that appear with just one anchor set and in fewer than four sentences.</Paragraph>
      <Paragraph position="1"> Finally, the templates are sorted first by the number of anchor sets with which each template appeared, and then by the number of sentences in which they appeared.</Paragraph>
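This two-level sort can be sketched with a tuple-valued key function; the template records below are illustrative dictionaries, not the paper's data structures:

```python
def rank_templates(templates):
    """Sort templates first by the number of supporting anchor sets,
    then by the number of supporting sentences (both descending)."""
    return sorted(templates,
                  key=lambda t: (t["anchor_sets"], t["sentences"]),
                  reverse=True)

templates = [
    {"text": "X stop Y",        "anchor_sets": 1, "sentences": 6},
    {"text": "X prevent Y",     "anchor_sets": 3, "sentences": 9},
    {"text": "X reduce Y risk", "anchor_sets": 3, "sentences": 12},
]
ranked = rank_templates(templates)
# the two templates with 3 anchor sets precede 'X stop Y';
# among them, sentence count breaks the tie
```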
    </Section>
  </Section>
</Paper>