<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1053"> <Title>Exploring Various Knowledge in Relation Extraction</Title> <Section position="4" start_page="427" end_page="428" type="metho"> <SectionTitle> 3 Support Vector Machines </SectionTitle> <Paragraph position="0"> Support Vector Machines (SVMs) are a supervised machine learning technique motivated by the statistical learning theory (Vapnik 1998). Based on the structural risk minimization of the statistical learning theory, SVMs seek an optimal separating hyper-plane to divide the training examples into two classes and make decisions based on support vectors which are selected as the only effective instances in the training set.</Paragraph> <Paragraph position="1"> Basically, SVMs are binary classifiers.</Paragraph> <Paragraph position="2"> Therefore, we must extend SVMs to multi-class (e.g. K) such as the ACE RDC task. For efficiency, we apply the one vs. others strategy, which builds K classifiers so as to separate one class from all others, instead of the pairwise strategy, which builds K*(K-1)/2 classifiers considering all pairs of classes. The final decision of an instance in the multiple binary classification is determined by the class which has the maximal SVM output.</Paragraph> <Paragraph position="3"> Moreover, we only apply the simple linear kernel, although other kernels can peform better.</Paragraph> <Paragraph position="4"> The reason why we choose SVMs for this purpose is that SVMs represent the state-of-the-art in the machine learning research community, and there are good implementations of the algorithm available. In this paper, we use the binary-class SVMLight deleveloped by Joachims (1998).</Paragraph> <Paragraph position="5"> Joachims has just released a new version of SVMLight for multi-class classification. However, this paper only uses the binary-class version. For details about SVMLight, please see http://svmlight.joachims.org/</Paragraph> </Section> <Section position="5" start_page="428" end_page="430" type="metho"> <SectionTitle> 4 Features </SectionTitle> <Paragraph position="0"> The semantic relation is determined between two mentions. In addition, we distinguish the argument order of the two mentions (M1 for the first mention and M2 for the second mention), e.g. M1-Parent-Of-M2 vs. M2-Parent-Of-M1. For each pair of mentions3, we compute various lexical, syntactic and semantic features.</Paragraph> <Section position="1" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 4.1 Words </SectionTitle> <Paragraph position="0"> According to their positions, four categories of words are considered: 1) the words of both the mentions, 2) the words between the two mentions, 3) the words before M1, and 4) the words after M2.</Paragraph> <Paragraph position="1"> For the words of both the mentions, we also differentiate the head word of a mention from other words since the head word is generally much more important. The words between the two mentions are classified into three bins: the first word in between, the last word in between and other words in between. Both the words before M1 and after M2 are classified into two bins: the first word next to the mention and the second word next to the mention. Since a pronominal mention (especially neutral pronoun such as 'it' and 'its') contains little information about the sense of the mention, the co-reference chain is used to decide its sense. 
</Section> <Section position="5" start_page="428" end_page="430" type="metho"> <SectionTitle> 4 Features </SectionTitle> <Paragraph position="0"> The semantic relation is determined between two mentions. In addition, we distinguish the argument order of the two mentions (M1 for the first mention and M2 for the second mention), e.g. M1-Parent-Of-M2 vs. M2-Parent-Of-M1. For each pair of mentions, we compute various lexical, syntactic and semantic features.</Paragraph> <Section position="1" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 4.1 Words </SectionTitle> <Paragraph position="0"> According to their positions, four categories of words are considered: 1) the words of both mentions, 2) the words between the two mentions, 3) the words before M1, and 4) the words after M2.</Paragraph> <Paragraph position="1"> For the words of both mentions, we also differentiate the head word of a mention from the other words, since the head word is generally much more important. The words between the two mentions are classified into three bins: the first word in between, the last word in between and the other words in between. Both the words before M1 and the words after M2 are classified into two bins: the first word next to the mention and the second word next to the mention. Since a pronominal mention (especially a neutral pronoun such as 'it' or 'its') contains little information about the sense of the mention, the co-reference chain is used to decide its sense. This is done by replacing the pronominal mention with the most recent non-pronominal antecedent when determining the word features.</Paragraph> <Paragraph position="2"> In ACE, each mention has a head annotation and an extent annotation. In all our experimentation, we only consider the word string between the beginning point of the extent annotation and the end point of the head annotation. This has the effect of choosing the base phrase contained in the extent annotation. In addition, this can also reduce noise without losing much information about the mention. For example, in the case where the noun phrase &quot;the former CEO of McDonald&quot; has the head annotation of &quot;CEO&quot; and the extent annotation of &quot;the former CEO of McDonald&quot;, we only consider &quot;the former CEO&quot; in this paper.</Paragraph> <Paragraph position="3"> In this paper, the head word of a mention is normally set as the last word of the mention. However, when a preposition exists in the mention, its head word is set as the last word before the preposition. For example, the head word of the name mention &quot;University of Michigan&quot; is &quot;University&quot;.</Paragraph> <Paragraph position="4"> The word features include:
* WM1: bag-of-words in M1
* HM1: head word of M1
* WM2: bag-of-words in M2
* HM2: head word of M2
* HM12: combination of HM1 and HM2
* WBNULL: when no word in between
* WBFL: the only word in between when only one word in between
* WBF: first word in between when at least two words in between
* WBL: last word in between when at least two words in between
* WBO: other words in between except the first and last words when at least three words in between
* BM1F: first word before M1
* BM1L: second word before M1
* AM2F: first word after M2
* AM2L: second word after M2</Paragraph>
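<Paragraph position="5"> The following minimal sketch (not part of the original paper) illustrates how these word features can be collected, assuming the sentence has already been tokenized and each mention is given as a (start, end) token span with the end index exclusive; the small preposition list used by the head word rule is an illustrative placeholder:

# Word features of Section 4.1, sketched under the assumptions stated above.
PREPOSITIONS = {'of', 'in', 'at', 'on', 'for', 'to', 'with', 'from', 'by'}

def head_word(tokens, start, end):
    # Normally the last word of the mention; if a preposition occurs inside
    # the mention, take the last word before that preposition
    # (e.g. 'University of Michigan' gives 'University').
    mention = tokens[start:end]
    for i, tok in enumerate(mention):
        if i != 0 and tok.lower() in PREPOSITIONS:
            return mention[i - 1]
    return mention[-1]

def word_features(tokens, m1, m2):
    # m1 and m2 are the (start, end) spans of the two mentions, with m1 preceding m2.
    feats = {}
    feats['WM1'] = tokens[m1[0]:m1[1]]     # bag-of-words in M1
    feats['WM2'] = tokens[m2[0]:m2[1]]     # bag-of-words in M2
    feats['HM1'] = head_word(tokens, m1[0], m1[1])
    feats['HM2'] = head_word(tokens, m2[0], m2[1])
    feats['HM12'] = feats['HM1'] + '_' + feats['HM2']
    between = tokens[m1[1]:m2[0]]          # words between the two mentions
    if len(between) == 0:
        feats['WBNULL'] = True
    elif len(between) == 1:
        feats['WBFL'] = between[0]
    else:
        feats['WBF'] = between[0]
        feats['WBL'] = between[-1]
        feats['WBO'] = between[1:-1]       # empty unless at least three words in between
    before = tokens[:m1[0]]                # words before M1
    after = tokens[m2[1]:]                 # words after M2
    if before:
        feats['BM1F'] = before[-1]
    if len(before) not in (0, 1):
        feats['BM1L'] = before[-2]
    if after:
        feats['AM2F'] = after[0]
    if len(after) not in (0, 1):
        feats['AM2L'] = after[1]
    return feats

In practice, each of these string or list values would then be turned into binary indicator features before being fed to the SVM.</Paragraph>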
</Section> <Section position="2" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 4.2 Entity Type </SectionTitle> <Paragraph position="0"> This feature concerns the entity type of both mentions, which can be PERSON, ORGANIZATION, FACILITY, LOCATION or Geo-Political Entity (GPE):
* ET12: combination of mention entity types</Paragraph> </Section> <Section position="3" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 4.3 Mention Level </SectionTitle> <Paragraph position="0"> This feature considers the mention level of both mentions, which can be NAME, NOMINAL or PRONOUN:
* ML12: combination of mention levels</Paragraph> </Section> <Section position="4" start_page="428" end_page="428" type="sub_section"> <SectionTitle> 4.4 Overlap </SectionTitle> <Paragraph position="0"> This category of features includes:
* #MB: number of other mentions in between
* #WB: number of words in between
* M1&gt;M2 or M1&lt;M2: flag indicating whether M2/M1 is included in M1/M2</Paragraph> <Paragraph position="1"> Normally, the above overlap features are too general to be effective alone. Therefore, they are also combined with other features: 1) ET12+M1&gt;M2; 2) ET12+M1&lt;M2; 3) HM12+M1&gt;M2; 4) HM12+M1&lt;M2.</Paragraph> </Section> <Section position="5" start_page="428" end_page="429" type="sub_section"> <SectionTitle> 4.5 Base Phrase Chunking </SectionTitle> <Paragraph position="0"> It is well known that chunking plays a critical role in the Template Relation task of the 7th Message Understanding Conference (MUC-7 1998). The related work mentioned in Section 2 extended this to explore the information embedded in full parse trees. In this paper, we separate the features of base phrase chunking from those of full parsing. In this way, we can separately evaluate the contributions of base phrase chunking and full parsing. Here, the base phrase chunks are derived from full parse trees using the Perl script written by Sabine Buchholz from Tilburg University (http://ilk.kub.nl/~sabine/chunklink/), and the Collins' parser (Collins 1999) is employed for full parsing. Most of the chunking features concern the head words of the phrases between the two mentions. Similar to the word features, three categories of phrase heads are considered: 1) the phrase heads in between, which are classified into three bins: the first phrase head in between, the last phrase head in between and the other phrase heads in between; 2) the phrase heads before M1, which are classified into two bins: the first phrase head before and the second phrase head before; 3) the phrase heads after M2, which are classified into two bins: the first phrase head after and the second phrase head after. Moreover, we also consider the phrase path in between.</Paragraph> <Paragraph position="1">
* CPP: path of phrase labels connecting the two mentions in the chunking
* CPPH: path of phrase labels connecting the two mentions in the chunking, augmented with head words, if at most two phrases in between</Paragraph>
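<Paragraph position="2"> The phrase path features above can be assembled as in the following minimal sketch (not part of the original paper); it assumes the base phrase chunks lying strictly between the two mentions have already been derived from the parse tree and are given as (label, head_word) pairs, and the string encoding of the path is an illustrative choice:

def chunk_path_features(chunks_between):
    # chunks_between: list of (phrase_label, head_word) pairs for the base
    # phrase chunks between the two mentions, in sentence order.
    feats = {}
    labels = [label for label, _ in chunks_between]
    feats['CPP'] = '-'.join(labels)   # e.g. 'NP-PP-NP'
    if len(labels) in (0, 1, 2):
        # CPPH only fires when at most two phrases lie in between.
        feats['CPPH'] = '-'.join(label + ':' + head for label, head in chunks_between)
    return feats
</Paragraph>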
</Section> <Section position="6" start_page="429" end_page="429" type="sub_section"> <SectionTitle> 4.6 Dependency Tree </SectionTitle> <Paragraph position="0"> This category of features includes information about the words, part-of-speech tags and phrase labels of the words on which the mentions depend in the dependency tree derived from the syntactic full parse tree. The dependency tree is built by using the phrase head information returned by the Collins' parser and linking all the other fragments in a phrase to its head. This category also includes flags indicating whether the two mentions are in the same NP/PP/VP.</Paragraph> <Paragraph position="1">
* ET1DW1: combination of the entity type and the dependent word for M1
* H1DW1: combination of the head word and the dependent word for M1
* ET2DW2: combination of the entity type and the dependent word for M2
* H2DW2: combination of the head word and the dependent word for M2
* ET12SameNP: combination of ET12 and whether M1 and M2 are included in the same NP
* ET12SamePP: combination of ET12 and whether M1 and M2 are included in the same PP
* ET12SameVP: combination of ET12 and whether M1 and M2 are included in the same VP</Paragraph> </Section> <Section position="7" start_page="429" end_page="429" type="sub_section"> <SectionTitle> 4.7 Parse Tree </SectionTitle> <Paragraph position="0"> This category of features concerns the information inherent only in the full parse tree.</Paragraph> <Paragraph position="1">
* PTP: path of phrase labels (removing duplicates) connecting M1 and M2 in the parse tree
* PTPH: path of phrase labels (removing duplicates) connecting M1 and M2 in the parse tree, augmented with the head word of the top phrase in the path</Paragraph> </Section> <Section position="8" start_page="429" end_page="430" type="sub_section"> <SectionTitle> 4.8 Semantic Resources </SectionTitle> <Paragraph position="0"> Semantic information from various resources, such as WordNet, is used to classify important words into different semantic lists according to the relationships they indicate.</Paragraph> <Paragraph position="1"> Country Name List: this list is used to differentiate the relation subtype &quot;ROLE.Citizen-Of&quot;, which defines the relationship between a person and the country of the person's citizenship, from other subtypes, especially &quot;ROLE.Residence&quot;, which defines the relationship between a person and the location in which the person lives. Two features are defined to include this information:
* ET1Country: the entity type of M1 when M2 is a country name
* CountryET2: the entity type of M2 when M1 is a country name</Paragraph> <Paragraph position="2"> Personal Relative Trigger Word List: this list is used to differentiate the six personal social relation subtypes in ACE: Parent, Grandparent, Spouse, Sibling, Other-Relative and Other-Personal. The trigger word list is first gathered from WordNet by checking whether a word has the semantic class &quot;person|...|relative&quot;. Then, all the trigger words are semi-automatically classified into different categories according to their related personal social relation subtypes. We also extend the list by collecting trigger words from the head words of the mentions in the training data according to the relationships they indicate. Two features are defined to include this information:
* ET1SC2: combination of the entity type of M1 and the semantic class of M2, when M2 triggers a personal social subtype
* SC1ET2: combination of the semantic class of M1 and the entity type of M2, when M1 triggers a personal social subtype</Paragraph> </Section> </Section> </Paper>