<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1109"> <Title>Discriminative Slot Detection Using Kernel Methods</Title> <Section position="5" start_page="3" end_page="3" type="metho"> <SectionTitle> 3 A Discriminative Framework </SectionTitle> <Paragraph position="0"> The discriminative framework proposed here is called ARES (Automated Recognition of Event Slots). It makes no assumptions about the text structure of events. Instead, kernels are used to represent syntactic information from various sources. The structure of ARES is shown in Fig 1. The preprocessing modules include a part-of-speech tagger, name tagger, sentence parser and GLARF parser, but are not limited to these.</Paragraph> <Paragraph position="1"> Other general tools, not shown in the diagram, can also be included. The triangles in the diagram are kernels that encode the corresponding syntactic processing results. In the training phase, the target slot fillers are labeled in the text so that SVM slot detectors can be trained through the kernels to find fillers for the key slots of events. In the testing phase, the SVM classifier predicts slot fillers from unlabeled text, and a merging procedure merges slots into events if necessary.</Paragraph> <Paragraph position="2"> The main kernel we propose to use is defined on GLARF (Meyers et al., 2001) dependency graphs.</Paragraph>
Fig 1. Structure of the discriminative model
<Paragraph position="3"> The idea is that an IE model should not commit itself to any single syntactic level. Low-level information, such as word collocations, may also give us important clues. Our experiments will show that for the MUC-6 management succession domain, even bag-of-words or n-grams can provide helpful information about event occurrence.</Paragraph> <Section position="1" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.1 Syntactic Kernels </SectionTitle> <Paragraph position="0"> To make use of syntactic information from different levels, we can develop kernel functions, or syntactic kernels, each representing a certain level of syntactic structure. The possible syntactic kernels include * Sequence kernels: representing sequence-level information, such as bag-of-words, n-grams, string kernels, etc.</Paragraph> <Paragraph position="1"> * Phrase kernels: representing information at an intermediate level, such as kernels based on multiword expressions, chunks or shallow parse trees.</Paragraph> <Paragraph position="2"> * Parsing kernels: representing the detailed syntactic structure of a sentence, such as kernels based on parse trees or dependency graphs.</Paragraph> <Paragraph position="3"> These kernels can be used alone or combined with each other using the closure properties of kernels. They can also be composed with higher-order kernels such as polynomial or RBF kernels, applied either to the individual kernels or to the resulting combined kernel.</Paragraph> <Paragraph position="4"> As the depth of analysis in preprocessing increases, the accuracy of the result decreases. Combining the results of deeper processing with those of shallower processing (such as n-grams) therefore also gives us a back-off ability to recover from errors in deep processing.</Paragraph> <Paragraph position="5"> In practice, each kernel can be tested for the task as the sole input to an SVM to determine whether that level of information is helpful. After identifying all the useful kernels, we can combine them into a comprehensive kernel as the final input to the classifier. The way to combine them and the parameters of the combination can be determined using validation data, as sketched below.</Paragraph>
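<Paragraph position="6"> As an illustration of the combination step, here is a minimal Python sketch (ours, not the authors' code). It relies only on the closure properties named above; the kernel names ngram_kernel and glarf_kernel in the usage comment are hypothetical placeholders.

    # Hypothetical sketch: combining syntactic kernels (illustrative only).
    # Closure properties used: a non-negatively weighted sum of kernels is a
    # kernel, and a polynomial with non-negative coefficients applied to a
    # kernel is a kernel.

    def combine_kernels(kernels, weights):
        """Weighted-sum kernel of the given kernel functions."""
        def k(x, y):
            return sum(w * kern(x, y) for w, kern in zip(weights, kernels))
        return k

    def polynomial(kernel, degree=2, c=1.0):
        """Compose a polynomial kernel on top of an existing kernel."""
        def k(x, y):
            return (kernel(x, y) + c) ** degree
        return k

    # e.g. a comprehensive kernel, with weights and degree tuned on
    # validation data:
    # k = polynomial(combine_kernels([ngram_kernel, glarf_kernel], [1.0, 1.0]))
</Paragraph>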
</Section> </Section> <Section position="6" start_page="3" end_page="3" type="metho"> <SectionTitle> 4 Introduction to GLARF </SectionTitle> <Paragraph position="0"> GLARF (Grammatical and Logical Argument Regularization Framework) (Meyers et al., 2001) is a hand-coded system that produces comprehensive word dependency graphs from Penn TreeBank-II (PTB-II) parse trees to facilitate applications like information extraction. GLARF is designed to enrich PTB-II parses with more detailed information than parsing alone provides, such as object, indirect object and appositive relations. GLARF captures more regularities in text by transforming non-canonical constructions (passive, filler-gap) into their canonical forms (simple declarative clauses).</Paragraph> <Paragraph position="1"> This is very helpful for information extraction, where training data is often sparse. GLARF also represents all syntactic phenomena in uniform typed PRED-ARG structures, which is convenient for computational purposes. For a sentence, GLARF outputs dependency triples derived automatically from the GLARF typed feature structures (Meyers et al., 2001). A directed dependency graph of the sentence can also be constructed from the dependency triples. The output of GLARF for the sentence &quot;Tom Donilon, who also could get a senior job ...&quot; (the example output is omitted here) shows its ability to surface relations, which is helpful for IE tasks. GLARF can also generate output containing the base forms of words, so that different tenses of verbs can be regularized. Because of all these features, our main kernels are based on the GLARF dependency triples or dependency graphs.</Paragraph> </Section> <Section position="7" start_page="3" end_page="21" type="metho"> <SectionTitle> 5 Event and Slot Kernels </SectionTitle> <Paragraph position="0"> Here we introduce the kernels used by ARES for event occurrence detection (EOD) and slot filler detection (SFD).</Paragraph> <Section position="1" start_page="3" end_page="21" type="sub_section"> <SectionTitle> 5.1 EOD Kernels </SectionTitle> <Paragraph position="0"> In information extraction, one interesting issue is event occurrence detection: determining whether or not a sentence contains an event occurrence. If this information is given, it becomes much easier to find the relevant entities for an event in the current sentence or surrounding sentences.</Paragraph> <Paragraph position="1"> Traditional approaches do matching (for slot filling) on all sentences, even though most of them do not contain any event at all. Event occurrence detection is similar to sentence-level information retrieval, so simple models like bag-of-words or n-grams could work well. We tried two kernels for this task: a sequence-level n-gram kernel, and a GLARF-based kernel that matches syntactic details between sentences. In the following formulae, we use an identity function $I(x, y)$ that gives 1 when $x = y$ and 0 otherwise, where $x$ and $y$ are strings or vectors of strings.</Paragraph> <Paragraph position="2"> 1. N-gram kernel $\phi_N(S_1, S_2)$, which counts common n-grams between two sentences. Given two sentences represented as sequences of n-grams $S_1 = \{s_i\}$, $1 \le i \le N_1$, and $S_2 = \{t_j\}$, $1 \le j \le N_2$, the kernel counts the matching n-gram pairs: $$\phi_N(S_1, S_2) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} I(s_i, t_j).$$ Kernels can be inclusive; in other words, the trigram kernel includes bigrams and unigrams. For the unigram kernel, a stop list is used that removes words other than nouns, verbs, adjectives and adverbs. A sketch of this kernel follows.</Paragraph>
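<Paragraph position="3"> As an illustration (ours, not the authors' code), a minimal Python sketch of the inclusive n-gram kernel under the definition above; pre-tokenized input and the omission of the unigram stop list are simplifying assumptions.

    # Hypothetical sketch of the inclusive n-gram EOD kernel (illustrative only).

    def ngrams(tokens, n):
        """All n-grams of a token list, as tuples."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def ngram_kernel(s1, s2, max_n=3):
        """Count matching n-gram pairs between two tokenized sentences,
        inclusively over orders 1..max_n (so the trigram kernel also
        counts bigram and unigram matches)."""
        total = 0
        for n in range(1, max_n + 1):
            grams1, grams2 = ngrams(s1, n), ngrams(s2, n)
            # sum of I(x, y) over all n-gram pairs
            total += sum(1 for x in grams1 for y in grams2 if x == y)
        return total
</Paragraph>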
<Paragraph position="5"> 2. GLARF kernel $\phi_g(G_1, G_2)$: this kernel is based on the GLARF dependency output. Given the triple outputs of two sentences produced by GLARF, $G_1 = \{\langle r_i, p_i, a_i \rangle\}$, $1 \le i \le N_1$, and $G_2 = \{\langle r_j, p_j, a_j \rangle\}$, $1 \le j \le N_2$, where $r$, $p$ and $a$ denote the role label, predicate word and argument word respectively in the GLARF output, the kernel matches the two triples as wholes as well as their predicates and arguments: $$\phi_g(G_1, G_2) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \big( I(\langle r_i, p_i, a_i \rangle, \langle r_j, p_j, a_j \rangle) + \alpha\, I(p_i, p_j) + \beta\, I(a_i, a_j) \big).$$ In our experiments, $\alpha$ and $\beta$ were set to 1. (A sketch of this kernel appears at the end of Section 5.)</Paragraph> </Section> <Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 5.2 SFD Kernels </SectionTitle> <Paragraph position="0"> Slot filler detection (SFD) is the task of determining which named entities fill a slot in some event template. Two kernels were proposed for SFD: the first matches the local contexts of two target NEs, while the second combines the first with an n-gram EOD kernel.</Paragraph> <Paragraph position="1"> 1. $\phi_{SFD1}(G_i, G_j)$: this kernel is also defined on a GLARF dependency graph (DG), a directed graph constructed from the typed PRED-ARG outputs, in which arcs labeled with roles go from predicate words to argument words. The kernel matches the local context surrounding a name in a GLARF dependency graph. In preprocessing, all names of the same type are translated into one symbol (a special word). The matching starts from two anchor nodes (NE nodes of the same type) in the two DGs and recursively proceeds from these nodes to their successors and predecessors, until the words associated with the nodes no longer match. In our experiments, the matching depth was set to 2.</Paragraph> <Paragraph position="2"> Each node $n$ contains a predicate word $w$ and relation pairs $\{\langle r_i, a_i \rangle\}$, $1 \le i \le p$, representing its $p$ arguments and the roles associated with them.</Paragraph> <Paragraph position="3"> A matching function $C(n_1, n_2)$ is defined over this recursion (its formula is omitted here), where $E_i$ and $E_j$ are the anchor nodes in the two DGs; $n_i \approx n_j$ is true if the predicate words associated with the two nodes match. The functions Succ(n) and Pred(n) give the successor and predecessor node sets of a node $n$. One reason for setting a depth limit is that this depth covers most of the local syntax of a node (before matching stops); another is that the cycles currently present in GLARF dependency graphs prohibit unbounded recursive matching. (A sketch of this matching follows at the end of Section 5.)</Paragraph> <Paragraph position="4"> 2. $\phi_{SFD2}(S_i, S_j)$: this kernel linearly combines the n-gram event kernel and the slot kernel above, in the hope that the general event occurrence information provided by the EOD kernel can help the slot kernel ignore NEs in sentences that do not contain any event occurrence: $$\phi_{SFD2}(S_i, S_j) = \alpha\, \phi_N(S_i, S_j) + \beta\, \phi_{SFD1}(G_i, G_j),$$ where $\alpha$ and $\beta$ were set to 1 in our experiments.</Paragraph> <Paragraph position="5"> The GLARF event kernel was not used here, simply because it draws on information from the same source as $\phi_{SFD1}(G_i, G_j)$. The n-gram kernel was chosen to be the trigram kernel, which gives us the best EOD performance among the n-gram kernels.</Paragraph> <Paragraph position="6"> We also tried the dependency graph kernel proposed by (Collins et al., 2001), but it did not give us better results.</Paragraph>
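<Paragraph position="7"> To make the triple matching of the EOD GLARF kernel concrete, here is a minimal Python sketch (ours, not the authors' code); the list-of-triples representation is an assumption about the GLARF output format.

    # Hypothetical sketch of the GLARF triple kernel phi_g (illustrative only).
    # Each sentence is a list of (role, predicate, argument) triples.

    def glarf_kernel(g1, g2, alpha=1.0, beta=1.0):
        """Match whole triples, plus their predicates and arguments."""
        total = 0.0
        for (r1, p1, a1) in g1:
            for (r2, p2, a2) in g2:
                total += float((r1, p1, a1) == (r2, p2, a2))  # whole triple
                total += alpha * float(p1 == p2)              # predicate match
                total += beta * float(a1 == a2)               # argument match
        return total
</Paragraph>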
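<Paragraph position="8"> Since the extracted text loses the exact definition of $C(n_1, n_2)$, the following Python sketch only illustrates the matching strategy described in prose: depth-limited recursion over matching successors and predecessors, starting from the anchor nodes, combined linearly with the n-gram kernel for $\phi_{SFD2}$. The Node structure, the scoring, and the reuse of ngram_kernel from the earlier sketch are our assumptions, not the paper's definitions.

    # Hypothetical sketch of depth-limited local-context matching on GLARF
    # dependency graphs (illustrative; not the paper's exact C(n1, n2)).

    class Node:
        def __init__(self, word, succ=None, pred=None):
            self.word = word        # predicate word at this node
            self.succ = succ or []  # successor nodes (arguments)
            self.pred = pred or []  # predecessor nodes

    def match(n1, n2, depth=2):
        """Recursively count matching context around two nodes, stopping
        when the words differ or the depth limit is reached (the limit
        also guards against cycles in the graph)."""
        if depth == 0 or n1.word != n2.word:
            return 0.0
        score = 1.0
        for m1 in n1.succ:
            for m2 in n2.succ:
                score += match(m1, m2, depth - 1)
        for m1 in n1.pred:
            for m2 in n2.pred:
                score += match(m1, m2, depth - 1)
        return score

    def sfd2_kernel(s1, s2, e1, e2, alpha=1.0, beta=1.0):
        """Linear combination of the n-gram EOD kernel (sketched earlier)
        and the slot kernel, where e1, e2 are the anchor NE nodes."""
        return alpha * ngram_kernel(s1, s2) + beta * match(e1, e2)
</Paragraph> </Section> </Section> </Paper>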