<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0208">
  <Title>Learning Domain-Specific Information Extraction Patterns from the Web</Title>
  <Section position="10" start_page="71" end_page="71" type="relat">
    <SectionTitle>
7 Related Work
</SectionTitle>
    <Paragraph position="0"> Unannotated texts have been used successfully for a variety of NLP tasks, including named entity recognition (Collins and Singer, 1999), subjectivity classification (Wiebe and Riloff, 2005), text classification (Nigam et al., 2000), and word sense disambiguation (Yarowsky, 1995). The Web has become a popular source of large quantities of unannotated data. Many approaches have exploited the Web in unsupervised or weakly supervised algorithms for natural language processing (e.g., Resnik (1999), Ravichandran and Hovy (2002), Keller and Lapata (2003)).</Paragraph>
    <Paragraph position="1"> The use of unannotated data to improve information extraction is not new. Unannotated texts have been used for weakly supervised training of IE systems (Riloff, 1996) and in bootstrapping methods that begin with seed words or patterns (Riloff and Jones, 1999; Yangarber et al., 2000). However, those previous systems rely on pre-existing domain-specific corpora. For example, EXDISCO (Yangarber et al., 2000) used Wall Street Journal articles for training. AutoSlog-TS (Riloff, 1996) and Meta-bootstrapping (Riloff and Jones, 1999) used the MUC-4 training texts. Meta-bootstrapping was also trained on web pages, but the &quot;domain&quot; was corporate relationships, so domain-specific web pages were easily identified simply by gathering corporate web pages.</Paragraph>
    <Paragraph position="2"> The KNOWITALL system (Popescu et al., 2004) also uses unannotated web pages for information extraction. However, that work is quite different from ours because KNOWITALL focuses on extracting domain-independent relationships with the aim of extending an ontology. In contrast, our work uses the Web to augment a domain-specific, event-oriented IE system with new, automatically generated domain-specific IE patterns.</Paragraph>
  </Section>
</Paper>