<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0208">
  <Title>Learning Domain-Specific Information Extraction Patterns from the Web</Title>
  <Section position="11" start_page="71" end_page="72" type="concl">
    <SectionTitle>
8 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> We have shown that it is possible to learn new extraction patterns for a domain-specific IE task by automatically identifying domain-specific web pages using seed patterns. Our approach produced a 5% increase in recall for extracting targets and a 3% increase in recall for extracting victims of terrorist events. Both increases in recall were at the cost of a small loss in precision.</Paragraph>
    <Paragraph position="1"> In future work, we plan to develop improved ranking methods and more sophisticated semantic affinity measures to further improve coverage and minimize precision loss. Another possible avenue for future work is to embed this approach in a bootstrapping mechanism so that the most reliable new IE patterns can be used to collect additional web pages, which can then be used to learn more IE patterns in an iterative fashion. Also, while most of this process is automated, some human intervention is required to create the search queries for the document collection process, and to generate the seed patterns. We plan to look into techniques to automate these manual tasks as well.</Paragraph>
  </Section>
</Paper>