<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0208"> <Title>Learning Domain-Specific Information Extraction Patterns from the Web</Title> <Section position="9" start_page="69" end_page="71" type="evalu"> <SectionTitle> 6 Experiments and Results </SectionTitle> <Paragraph position="0"> Our goal has been to use IE patterns learned from a fixed, domain-specific training set to automatically learn additional IE patterns from a large, domain-independent text collection, such as the Web. Although many of the patterns learned from the CNN terrorism web pages look like good extractors, an open question was whether they would actually be useful for the original IE task. For example, some of the patterns learned from the CNN web pages have to do with beheadings (e.g., &quot;beheading of <np>&quot; and &quot;beheaded <np>&quot;), which are undeniably good victim extractors. But the MUC-4 corpus primarily concerns Latin American terrorism that does not involve beheading incidents. In general, the question is whether IE patterns learned from a large, diverse text collection can be valuable for a specific IE task above and beyond the patterns that were learned from the domain-specific training set, or whether the newly learned patterns will simply not be applicable. To answer this question, we evaluated the newly learned IE patterns on the MUC-4 test set.</Paragraph> <Paragraph position="1"> The MUC-4 data set is divided into 1300 development (DEV) texts, and four test sets of 100 texts each (TST1, TST2, TST3, and TST4).5 (The TST1 and TST2 texts were used as test sets for MUC-3 and then as development texts for MUC-4; the TST3 and TST4 texts were used as the test sets for MUC-4.) All of these texts have associated answer key templates.</Paragraph> <Paragraph position="2"> We used 1500 texts (DEV+TST1+TST2) as our training set, and 200 texts (TST3+TST4) as our test set.</Paragraph> <Paragraph position="3"> The IE process typically involves extracting information from individual sentences and then mapping that information into answer key templates, one template for each terrorist event described in the story. The process of template generation requires discourse processing to determine which extracted facts belong to which event. Discourse processing and template generation are not the focus of this paper. Our research aims to produce a larger set of extraction patterns so that more information will be extracted from the sentences, before discourse analysis would begin. Consequently, we evaluate the performance of our IE system at that stage: after extracting information from sentences, but before template generation takes place. This approach directly measures how well we are able to improve the coverage of our extraction patterns for the domain.</Paragraph> <Section position="1" start_page="70" end_page="70" type="sub_section"> <SectionTitle> 6.1 Baseline Results on the MUC-4 IE Task </SectionTitle> <Paragraph position="0"> The AutoSlog-TS system described in Section 3 used the MUC-4 training set to learn 291 target and victim IE patterns. These patterns produced 64% recall with 43% precision on the targets, and 50% recall with 52% precision on the victims.6 These numbers are not directly comparable to the official MUC-4 scores, which evaluate template generation, but our recall is in the same ballpark. Our precision is lower, but this is to be expected because we do not perform discourse analysis.7 These 291 IE patterns represent our baseline IE system that was created from the MUC-4 training data.</Paragraph> <Paragraph position="1"> 6 We used a head noun scoring scheme, where we scored an extraction as correct if its head noun matched the head noun in the answer key. This approach allows for different leading modifiers in an NP as long as the head noun is the same. For example, &quot;armed men&quot; will successfully match &quot;5 armed men&quot;. We also discarded pronouns (they were not scored at all) because our system does not perform coreference resolution.</Paragraph> <Paragraph position="2"> 7 Among other things, discourse processing merges seemingly disparate extractions based on coreference resolution (e.g., &quot;the guerrillas&quot; may refer to the same people as &quot;the armed men&quot;) and applies task-specific constraints (e.g., the MUC-4 task definition has detailed rules about exactly what types of people are considered to be terrorists).</Paragraph>
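<Paragraph position="3"> To make the scoring scheme of footnote 6 concrete, the following short Python sketch shows one way it could be implemented. It is an illustration added here, not code from the paper: the head_noun, is_correct, and score functions, the small pronoun list, the approximation of the head noun as the final token of the NP, and the use of per-extraction precision, per-answer recall, and the balanced F-measure are all assumptions made for the sake of the example.

# Illustrative sketch of the head noun scoring scheme (assumptions noted above).
PRONOUNS = {"he", "she", "it", "they", "them", "him", "her"}

def head_noun(np):
    # Approximate the head noun as the last token of the noun phrase.
    return np.strip().lower().split()[-1]

def is_correct(extracted_np, answer_key_np):
    # An extraction is scored correct if its head noun matches the answer key's,
    # so differing leading modifiers are ignored.
    return head_noun(extracted_np) == head_noun(answer_key_np)

def score(extractions, answer_keys):
    # Pronoun extractions are discarded (not scored at all).
    extractions = [e for e in extractions if head_noun(e) not in PRONOUNS]
    num_correct = sum(1 for e in extractions
                      if any(is_correct(e, k) for k in answer_keys))
    num_recalled = sum(1 for k in answer_keys
                       if any(is_correct(e, k) for e in extractions))
    precision = num_correct / len(extractions) if extractions else 0.0
    recall = num_recalled / len(answer_keys) if answer_keys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

# Footnote 6 example: "armed men" matches "5 armed men" on the head noun "men",
# and the pronoun "they" is dropped before scoring.
print(score(["armed men", "they"], ["5 armed men"]))  # (1.0, 1.0, 1.0)

Under these assumptions, an extraction such as &quot;armed men&quot; is counted as correct against the answer key entry &quot;5 armed men&quot; because both reduce to the head noun &quot;men&quot;.</Paragraph>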
</Section> <Section position="2" start_page="70" end_page="71" type="sub_section"> <SectionTitle> 6.2 Evaluating the Newly Learned Patterns </SectionTitle> <Paragraph position="0"> We used all 396 terrorism extraction patterns learned from the MUC-4 training set8 as seeds to identify relevant text regions in the CNN terrorism web pages. We then produced a ranked list of new terrorism IE patterns using a semantic affinity cutoff of 3.0. We selected the top N patterns from the ranked list, with N ranging from 50 to 300, and added these N patterns to the baseline system.</Paragraph> <Paragraph position="1"> Table 3 lists the recall, precision, and F-measure for the increasingly larger pattern sets. For the target slot, the recall increases from 64.2% to 69.1% with a small drop in precision. The F-measure drops by about 1% because recall and precision are less balanced, but we gain more in recall (+5%) than we lose in precision (-3%). For the victim patterns, the recall increases from 51.7% to 54.2% with a similar small drop in precision. The overall drop in the F-measure in this case is negligible. These results show that our approach for learning IE patterns from a large, diverse text collection (the Web) can indeed improve coverage on a domain-specific IE task, with a small decrease in precision.</Paragraph> </Section> </Section> </Paper>