<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0204">
  <Title>Improving Semi-Supervised Acquisition of Relation Extraction Patterns</Title>
  <Section position="5" start_page="30" end_page="30" type="metho">
    <SectionTitle>
3 Relation Extraction Patterns
</SectionTitle>
    <Paragraph position="0"> Both these approaches used extraction patterns which were based on dependency analysis (Tesnière, 1959) of text. Under this approach the structure of a sentence is represented by a set of directed binary links between a word (the head) and one of its modifiers. These links may be labelled to indicate the grammatical relation between the head and modifier (e.g. subject, object). Cyclical paths are generally disallowed and the analysis forms a tree structure. An example dependency analysis for the sentence "Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs." is shown in Figure 1.</Paragraph>
    <Paragraph position="1"> The extraction patterns used by both Yangarber et al. (2000) and Stevenson and Greenwood (2005) were based on SVO tuples extracted from dependency trees. The dependency tree shown in Figure 1 would generate two patterns: replace ←obj- Mr Bloggs and Acme Inc. -subj→ hire ←obj- Mr Smith.</Paragraph>
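The SVO tuple extraction described above can be sketched as follows. The link encoding and the relation labels ("as", "mod") are illustrative assumptions, not the format or label set used by the cited systems:

```python
# Minimal sketch: a dependency analysis as (head, relation, modifier)
# links, and SVO tuple extraction from it. Data layout is illustrative.
# Links for: "Acme Inc. hired Mr Smith as their new CEO, replacing Mr Bloggs."
LINKS = [
    ("hire", "subj", "Acme Inc."),
    ("hire", "obj", "Mr Smith"),
    ("hire", "as", "CEO"),          # assumed label
    ("hire", "mod", "replace"),     # assumed label
    ("replace", "obj", "Mr Bloggs"),
]

def svo_patterns(links):
    """Build subject-verb-object tuple patterns, one per head word
    that has a subject and/or an object among its modifiers."""
    verbs = {}
    for head, rel, mod in links:
        if rel in ("subj", "obj"):
            verbs.setdefault(head, {})[rel] = mod
    patterns = []
    for verb, args in verbs.items():
        patterns.append((args.get("subj"), verb, args.get("obj")))
    return patterns

print(svo_patterns(LINKS))
# The two patterns from the text: ('Acme Inc.', 'hire', 'Mr Smith')
# and (None, 'replace', 'Mr Bloggs')
```

As the next paragraph notes, these tuples miss relations such as the one between Mr Smith and CEO, which never surfaces as a subject or object of a verb.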
    <Paragraph position="2"> While these represent some of the core information in this sentence, they cannot be used to identify a number of relations, including the connection between Mr Smith and CEO or between Mr Smith and Mr Bloggs.</Paragraph>
    <Paragraph position="3"> A number of alternative approaches to constructing extraction patterns from dependency trees have been proposed (e.g. Sudo et al., 2003; Bunescu and Mooney, 2005). Previous analysis (Stevenson and Greenwood, 2006a) suggests that the most useful of these is one based on pairs of linked chains from the dependency tree. A chain can be defined as a path between a verb node and any other node in the dependency tree, passing through zero or more intermediate nodes (Sudo et al., 2001). The linked chains model (Greenwood et al., 2005) represents extraction patterns as a pair of chains which share the same verb but no direct descendants. It can be shown that linked chain patterns can represent the relation between any pair of named entities within a dependency analysis (Stevenson and Greenwood, 2006a). For example, the dependency tree shown in Figure 1 contains four named entities (Acme Inc., Mr Smith, CEO and Mr Bloggs) and linked chain patterns can be used to represent the relation between any pair.1 Some example patterns extracted from the analysis in Figure 1 can be seen in Figure 2. An additional advantage of linked chain patterns is that, unlike some other approaches to representing extraction patterns, such as that of Sudo et al. (2003) in which any subtree of the dependency tree can act as a potential pattern, they do not generate an unwieldy number of candidate patterns.</Paragraph>
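The chain and linked chain models can be illustrated with a short sketch. The tree encoding and the helper names are hypothetical, but the pairing rule (chains share the verb and no other nodes) follows the definition above:

```python
# Sketch of the chain / linked-chain pattern models (illustrative
# tree encoding, not the authors' code). A chain is a path from a
# verb node to any other node; a linked chain pairs two chains that
# share the verb but no direct descendants.
TREE = {                       # head mapped to a list of (relation, modifier)
    "hire": [("subj", "Acme Inc."), ("obj", "Mr Smith")],
    "Mr Smith": [("as", "CEO")],
}

def chains(tree, verb):
    """All paths (as node tuples) from `verb` down the tree."""
    out = []
    def walk(node, path):
        for _rel, child in tree.get(node, []):
            out.append(path + (child,))
            walk(child, path + (child,))
    walk(verb, (verb,))
    return out

def linked_chains(tree, verb):
    """Pairs of chains that descend through different children of
    the verb, so the verb is the only shared node."""
    cs = chains(tree, verb)
    pairs = []
    for i, a in enumerate(cs):
        for b in cs[i + 1:]:
            if a[1] != b[1]:   # different first child: no shared descendants
                pairs.append((a, b))
    return pairs

for p in linked_chains(TREE, "hire"):
    print(p)
```

Note that Mr Smith and CEO lie on the same chain here, which is the situation footnote 1 addresses.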
    <Paragraph position="4"> When used within IE systems, these patterns are generalised by replacing terms which refer to specific entities with a general semantic class. For example, the pattern</Paragraph>
  </Section>
  <Section position="6" start_page="30" end_page="32" type="metho">
    <SectionTitle>
4 Pattern Similarity
</SectionTitle>
    <Paragraph position="0"> Patterns such as linked chains have not been used by semi-supervised approaches to pattern learning. These algorithms require a method of determining the similarity of patterns. Simple patterns, such as SVO tuples, have a fixed structure containing few items and tend to occur relatively frequently in corpora. However, more complex patterns, such as linked chains, have a less fixed structure and occur less frequently. Consequently, the previously proposed approaches for determining pattern similarity (see Section 2) are unlikely to be as successful with these more complex patterns. (Footnote 1: Note that we allow a linked chain pattern to represent the relation between two items when they are on the same chain, such as Mr Smith and CEO in this example.)</Paragraph>
    <Paragraph position="1">  The approach proposed by Stevenson and Greenwood (2005) relies on representing patterns as vectors, which is appropriate for SVO tuples but not when patterns may include significant portions of the dependency tree.</Paragraph>
    <Paragraph position="2"> Yangarber et al. (2000) suggested a method where patterns are compared based on their distribution across documents in a corpus. However, since more complex patterns are more specific, they occur with fewer corpus instances, which is likely to hamper this type of approach.</Paragraph>
    <Paragraph position="3"> Another approach to relation extraction is to use supervised learning algorithms, although they require more training data than semi-supervised approaches. In particular various approaches (Zelenko et al., 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2005) have used kernel methods to determine the sentences in a corpus which contain instances of a particular relation.</Paragraph>
    <Paragraph position="4"> Kernel methods (Vapnik, 1998) allow the representation of large and complicated feature spaces and are therefore suitable when the instances are complex extraction rules, such as linked chains.</Paragraph>
    <Paragraph position="5"> Several previous kernels used for relation extraction have been based on trees, including methods based on shallow parse trees (Zelenko et al., 2003), dependency trees (Culotta and Sorensen, 2004) and the part of a dependency tree which represents the shortest path between the items being related (Bunescu and Mooney, 2005). Kernel methods rely on a similarity function between pairs of instances (the kernel), and these can be used within semi-supervised approaches to pattern learning such as those outlined in Section 2.</Paragraph>
    <Section position="1" start_page="31" end_page="32" type="sub_section">
      <SectionTitle>
4.1 Structural Similarity Measure
</SectionTitle>
      <Paragraph position="0"> The remainder of this section describes a similarity function for pairs of linked chains, based on the tree kernel proposed by Culotta and Sorensen (2004). The measure compares patterns by following their structure from the root nodes down through the patterns until they diverge too far to be considered similar.</Paragraph>
      <Paragraph position="1"> Each node in an extraction pattern has three features associated with it: the word, the relation to its parent, and the part-of-speech (POS) tag. The values of these features for a node n are denoted by nword, nreln and npos respectively. Pairs of nodes can be compared by examining the values of these features and also by determining the semantic similarity of the words. A set of four functions, F = {word, relation, pos, semantic}, is used to compare nodes. The first three of these correspond to the node features with the same name; the relevant function returns 1 if the value of the feature is equal for the two nodes and 0 otherwise. For example, the pos function compares the values of the part of speech feature for nodes n1 and n2: pos(n1, n2) = 1 if n1pos = n2pos, and 0 otherwise.</Paragraph>
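The four comparison functions can be sketched as follows. The dictionary-based node encoding is an assumption, and the WordNet-based semantic measure is stubbed with exact string match:

```python
# The node-comparison function set F (sketch; node encoding assumed).
def make_node(word, reln, pos):
    return {"word": word, "reln": reln, "pos": pos}

def word_f(n1, n2):
    return 1 if n1["word"] == n2["word"] else 0

def relation_f(n1, n2):
    return 1 if n1["reln"] == n2["reln"] else 0

def pos_f(n1, n2):
    return 1 if n1["pos"] == n2["pos"] else 0

def semantic_f(n1, n2):
    # Stand-in for the WordNet-based Lin (1998) measure, range [0, 1].
    return 1.0 if n1["word"] == n2["word"] else 0.0

F = (word_f, relation_f, pos_f, semantic_f)

n1 = make_node("hire", "root", "V")
n2 = make_node("appoint", "root", "V")
print(sum(f(n1, n2) for f in F))   # 2.0: relation and POS agree
```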
      <Paragraph position="3"> The remaining function, semantic, returns a value between 0 and 1 to signify the semantic similarity of the lexical items contained in the word feature of each node. This similarity is computed using the WordNet (Fellbaum, 1998) similarity function introduced by Lin (1998).</Paragraph>
      <Paragraph position="5"> The similarity of two nodes is zero if their part of speech tags are different and, otherwise, is simply the sum of the scores provided by the four functions which form the set F. This is represented by the function s: s(n1, n2) = word(n1, n2) + relation(n1, n2) + pos(n1, n2) + semantic(n1, n2) when pos(n1, n2) = 1, and s(n1, n2) = 0 otherwise. The similarity of a pair of linked chain patterns l1 and l2 is then sim(l1, l2) = s(r1, r2) + simc(Cr1, Cr2) when s(r1, r2) is non-zero, and 0 otherwise,</Paragraph>
      <Paragraph position="6"> where r1 and r2 are the root nodes of patterns l1 and l2 (respectively) and Cr is the set of children of node r.</Paragraph>
      <Paragraph position="7"> The final part of the similarity function, simc, calculates the similarity between the child nodes of n1 and n2.</Paragraph>
      <Paragraph position="9"> Using this similarity function, a pair of identical nodes has a similarity score of four. Consequently, the similarity score for a pair of linked chain patterns can be normalised by dividing it by 4 times the size (in nodes) of the larger pattern. This results in a similarity function that is biased towards neither small nor large patterns but selects the patterns most similar to those already accepted as representative of the domain. (Footnote 2: In linked chain patterns the only nodes with multiple children are the root nodes so, in all but the first application, this formula can be simplified to simc(Cn1, Cn2) = sim(c1, c2).)</Paragraph>
      <Paragraph position="10">  This similarity function resembles the one introduced by Culotta and Sorensen (2004) but differs in a number of ways. Both functions make use of WordNet to compare tree nodes: Culotta and Sorensen (2004) consider whether one node is a hypernym of the other, while the approach introduced here uses existing techniques for measuring semantic similarity. In addition, the similarity function introduced by Culotta and Sorensen (2004) compares subsequences of child nodes, which is not required for our measure since it is concerned only with linked chain extraction patterns.</Paragraph>
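Putting the pieces together, a minimal sketch of the normalised structural measure might look like this. The node encoding is an assumption, the semantic score is stubbed with exact match, and the child comparison uses a greedy best-match in place of the exact simc:

```python
# Sketch of the structural similarity for linked chain patterns.
# Identical nodes score 4 (word + relation + pos + semantic), and the
# total is normalised by 4 * size of the larger pattern.
def node(word, reln, pos, children=()):
    return {"word": word, "reln": reln, "pos": pos, "children": list(children)}

def s(n1, n2):
    if n1["pos"] != n2["pos"]:
        return 0.0                 # differing POS tags: similarity is zero
    word = 1.0 if n1["word"] == n2["word"] else 0.0
    reln = 1.0 if n1["reln"] == n2["reln"] else 0.0
    sem = word                     # stub for the WordNet (Lin, 1998) score
    return word + reln + 1.0 + sem # the POS feature contributes 1 here

def sim(n1, n2):
    score = s(n1, n2)
    if score == 0.0:
        return 0.0
    # Greedy best-match over children; below the root each node of a
    # linked chain has at most one child, so this matches footnote 2.
    for c1 in n1["children"]:
        if n2["children"]:
            score += max(sim(c1, c2) for c2 in n2["children"])
    return score

def size(n):
    return 1 + sum(size(c) for c in n["children"])

def normalised_sim(p1, p2):
    return sim(p1, p2) / (4.0 * max(size(p1), size(p2)))

a = node("hire", "root", "V", [node("COMPANY", "subj", "N"),
                               node("PERSON", "obj", "N")])
b = node("hire", "root", "V", [node("COMPANY", "subj", "N"),
                               node("PERSON", "obj", "N")])
print(normalised_sim(a, b))        # identical patterns score 1.0
```

The normalisation keeps the score in [0, 1] regardless of pattern size, matching the bias argument above.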
    </Section>
  </Section>
  <Section position="7" start_page="32" end_page="32" type="metho">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> This structural similarity metric was implemented within the general framework for semi-supervised pattern learning presented in Section 2. At each iteration the candidate patterns are compared against the set of currently accepted patterns and ranked according to the average similarity with the set of similar accepted patterns. The four highest scoring patterns are considered for acceptance but a pattern is only accepted if its score is within 0.95 of the similarity of the highest scoring pattern.</Paragraph>
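The acceptance step described above can be sketched as follows, reading "within 0.95 of the similarity of the highest scoring pattern" as a multiplicative threshold (an assumption); the string patterns and overlap similarity in the demo are toys:

```python
# Sketch of one acceptance step of the iterative learning algorithm.
def accept_step(candidates, accepted, similarity, top_n=4, ratio=0.95):
    """Rank candidates by average similarity to the accepted set,
    keep the top_n, and accept those scoring at least ratio * best."""
    scored = []
    for cand in candidates:
        avg = sum(similarity(cand, a) for a in accepted) / len(accepted)
        scored.append((avg, cand))
    scored.sort(reverse=True)
    best = scored[0][0]
    return [c for score, c in scored[:top_n] if score >= ratio * best]

def toy_sim(a, b):
    # Toy token-overlap similarity, standing in for the real measures.
    shared = set(a.split()).intersection(b.split())
    return len(shared) / max(len(a.split()), len(b.split()))

accepted = ["COMPANY appoint PERSON"]
candidates = ["COMPANY name PERSON", "COMPANY elect PERSON",
              "PERSON resign", "the cat sat"]
print(accept_step(candidates, accepted, toy_sim))
# accepts the two near-duplicates of the seed pattern
```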
    <Paragraph position="1"> We conducted experiments which compared the proposed pattern similarity metric with the vector space approach used by Stevenson and Greenwood (2005) (see Section 2). That approach was originally developed for simple extraction patterns consisting of subject-verb-object tuples but was extended for extraction patterns in the linked chain format by Greenwood et al. (2005). We use the measure developed by Lin (1998) to provide information about lexical similarity. This is the same measure which is used within the structural similarity metric (Section 4).</Paragraph>
    <Paragraph position="2"> Three different configurations of the iterative learning algorithm were compared. (1) Cosine (SVO): uses the SVO model for extraction patterns and the cosine similarity metric to compare them (see Section 2). This version of the algorithm acts as a baseline which represents previously reported approaches (Stevenson and Greenwood, 2005; Stevenson and Greenwood, 2006b). (2) Cosine (Linked chain): uses extraction patterns based on the linked chain model along with the cosine similarity to compare them, and is intended to determine the benefit gained from using the more expressive patterns.</Paragraph>
    <Paragraph position="3"> (3) Structural (Linked chain): also uses linked chain extraction patterns but compares them using the similarity measure introduced in Section 4.1.</Paragraph>
  </Section>
  <Section position="8" start_page="32" end_page="33" type="metho">
    <SectionTitle>
Table 1: Seed extraction patterns
COMPANY -subj→ appoint ←obj- PERSON
COMPANY -subj→ elect ←obj- PERSON
COMPANY -subj→ promote ←obj- PERSON
COMPANY -subj→ name ←obj- PERSON
PERSON -subj→ resign
PERSON -subj→ depart
PERSON -subj→ quit
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="32" end_page="33" type="sub_section">
      <SectionTitle>
5.1 IE Scenario
</SectionTitle>
      <Paragraph position="0"> Experiments were carried out on the management succession extraction task used for the Sixth Message Understanding Conference (MUC-6) (MUC, 1995). This IE scenario concerns the movement of executives between positions and companies. We used a version of the evaluation data which was produced by Soderland (1999) in which each event was converted into a set of binary asymmetric relations. The corpus contains four types of relation: Person-Person, Person-Post, Person-Organisation, and Post-Organisation. At each iteration of the algorithm the related items identified by the current set of learned patterns are extracted from the text and compared against the set of related items which are known to be correct. The systems are evaluated using the widely used precision (P) and recall (R) metrics which are combined using the F-measure (F).</Paragraph>
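The evaluation metrics are the standard definitions of precision, recall and their harmonic mean; a small sketch with invented relation tuples for illustration:

```python
# Precision (P), recall (R) and F-measure (F) over extracted relations.
def prf(extracted, correct):
    extracted, correct = set(extracted), set(correct)
    tp = len(extracted.intersection(correct))        # true positives
    p = tp / len(extracted) if extracted else 0.0
    r = tp / len(correct) if correct else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0      # harmonic mean
    return p, r, f

# Illustrative data: 3 of 4 extracted relations correct, 6 correct in total.
gold = [("Smith", "CEO"), ("Smith", "Acme"), ("Bloggs", "CEO"),
        ("Bloggs", "Acme"), ("Smith", "Bloggs"), ("CEO", "Acme")]
found = [("Smith", "CEO"), ("Smith", "Acme"), ("Bloggs", "CEO"),
         ("CEO", "Smith")]
p, r, f = prf(found, gold)
print(round(p, 2), round(r, 2), round(f, 2))   # 0.75 0.5 0.6
```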
      <Paragraph position="1"> The texts used for these experiments have been previously annotated with named entities. MINIPAR (Lin, 1999), after being adapted to handle the named entity tags, was used to produce the dependency analyses from which the patterns were generated. All experiments used the seed patterns in Table 1, which are indicative of this extraction task and have been used in previous experiments into semi-supervised IE pattern acquisition (Stevenson and Greenwood, 2005; Yangarber et al., 2000).</Paragraph>
      <Paragraph position="2"> The majority of previous semi-supervised approaches to IE have been evaluated over preliminary tasks such as the identification of event participants (Sudo et al., 2003) or sentence filtering (Stevenson and Greenwood, 2005). These may be useful preliminary tasks, but it is not clear to what extent the success of such systems will be repeated when they are used to perform relation extraction. Consequently, we chose a relation extraction task to evaluate the work presented here.</Paragraph>
    </Section>
  </Section>
</Paper>