<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1045">
  <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 355-362, Vancouver, October 2005. ©2005 Association for Computational Linguistics Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns</Title>
  <Section position="3" start_page="355" end_page="355" type="intro">
    <SectionTitle>
2 The Big Picture
</SectionTitle>
    <Paragraph position="0"> The goal of information extraction (IE) systems is to extract information about events, including the participants in those events. This task goes beyond Named Entity recognition (e.g., Bikel et al. (1997)) because it requires recognizing the role that each participant plays in the event. For example, an IE system that extracts information about corporate acquisitions must distinguish between the company that is doing the acquiring and the company that is being acquired. Similarly, an IE system that extracts information about terrorism must distinguish between the person who is the perpetrator and the person who is the victim.</Paragraph>
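The Python sketch below makes the distinction concrete. It assumes a toy sentence that has already been NER-tagged and a single hypothetical role rule: the organization just before the verb "acquired" is the acquirer, and the first one after it is the company acquired. NER gives both companies the same label, ORG, so only the role step tells them apart. The names `Acquisition`, `org_spans`, and `extract_acquisition` are invented for illustration, as are the company names. The sketch is not how any particular IE system works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A sentence as (token, NER label) pairs. NER gives both companies the same
# label (ORG), so it cannot say which company is doing the acquiring.
Tagged = List[Tuple[str, str]]


@dataclass
class Acquisition:
    acquirer: str
    acquired: str


def org_spans(tagged: Tagged) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of maximal runs of ORG-labeled tokens."""
    spans, start = [], None
    for i, (_, label) in enumerate(tagged):
        if label == "ORG" and start is None:
            start = i
        elif label != "ORG" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tagged)))
    return spans


def extract_acquisition(tagged: Tagged) -> Optional[Acquisition]:
    """Hypothetical role rule: the ORG just before 'acquired' is the acquirer,
    the first ORG after it is the company being acquired."""
    tokens = [tok for tok, _ in tagged]
    if "acquired" not in tokens:
        return None
    verb = tokens.index("acquired")
    spans = org_spans(tagged)
    before = [s for s in spans if verb >= s[1]]
    after = [s for s in spans if s[0] > verb]
    if not before or not after:
        return None
    (a0, a1), (b0, b1) = before[-1], after[0]
    return Acquisition(" ".join(tokens[a0:a1]), " ".join(tokens[b0:b1]))


if __name__ == "__main__":
    sentence = [("Acme", "ORG"), ("Corp", "ORG"), ("acquired", "O"),
                ("Widget", "ORG"), ("Inc", "ORG")]
    print(extract_acquisition(sentence))
    # Acquisition(acquirer='Acme Corp', acquired='Widget Inc')
```

Swapping the two company names in the input swaps the roles in the output. That role information is exactly what NER alone does not supply.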
    <Paragraph position="1"> We hypothesized that IE techniques would be well-suited for source identification because an opinion statement can be viewed as a kind of speech event with the source as the agent.</Paragraph>
    <Paragraph position="2"> We investigate two very different learning-based methods from information extraction for the problem of opinion source identification: graphical models and extraction pattern learning. In particular, we consider Conditional Random Fields (CRFs) (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a).</Paragraph>
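The Python sketch below shows only the decoding half of a linear-chain CRF. Each tag sequence is scored as the sum of per-token feature weights plus transition weights between adjacent tags. The Viterbi algorithm then finds the highest-scoring sequence without enumerating every candidate. Several things here are hypothetical stand-ins for parameters a trained CRF would learn from data: the two-tag set (`SRC` for source tokens, `O` otherwise), the three toy features, and every weight. Training (maximizing conditional log-likelihood) is not shown, and none of this reproduces the features used in Section 3.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "SRC"]
SPEECH_VERBS = {"said", "complained", "argued"}

# Hypothetical weights standing in for learned CRF parameters.
# Emission weights: (feature, tag) -> weight; missing pairs score 0.
EMIT: Dict[Tuple[str, str], float] = {
    ("capitalized", "SRC"): 1.5, ("capitalized", "O"): -0.5,
    ("speech_verb", "SRC"): -2.0, ("speech_verb", "O"): 2.0,
    ("lowercase", "O"): 1.0,
}
# Transition weights: (previous tag, current tag) -> weight.
TRANS: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.5, ("O", "SRC"): -0.5,
    ("SRC", "O"): 0.0, ("SRC", "SRC"): 1.0,
}


def features(token: str) -> List[str]:
    """Toy feature extractor: exactly one feature per token."""
    if token.lower() in SPEECH_VERBS:
        return ["speech_verb"]
    return ["capitalized"] if token[:1].isupper() else ["lowercase"]


def emission(token: str, tag: str) -> float:
    return sum(EMIT.get((f, tag), 0.0) for f in features(token))


def viterbi(tokens: List[str]) -> List[str]:
    """Tag sequence with the highest total emission + transition score."""
    if not tokens:
        return []
    # best[t][tag] = (best score of a path ending in `tag` at t, previous tag)
    best = [{tag: (emission(tokens[0], tag), None) for tag in TAGS}]
    for tok in tokens[1:]:
        prev = best[-1]
        column = {}
        for tag in TAGS:
            score, back = max((prev[p][0] + TRANS[(p, tag)], p) for p in TAGS)
            column[tag] = (score + emission(tok, tag), back)
        best.append(column)
    # Start from the best final tag and follow the backpointers.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["Mary", "Smith", "complained", "about", "the", "plan"]))
    # ['SRC', 'SRC', 'O', 'O', 'O', 'O']
```

Decoding only needs the argmax, so the CRF's normalizing constant is never computed here. That constant matters for training and for computing marginal probabilities.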
    <Paragraph position="3"> CRFs have been used successfully for Named Entity recognition (e.g., McCallum and Li (2003), Sarawagi and Cohen (2004)), and AutoSlog has performed well on information extraction tasks in several domains (Riloff, 1996a). While CRFs treat source identification as a sequence tagging task, AutoSlog views the problem as a pattern-matching task, acquiring symbolic patterns that rely on both the syntax and lexical semantics of a sentence. We hypothesized that a combination of the two techniques would perform better than either one alone.</Paragraph>
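The two views can be contrasted in Python on the same sentence. The tagging view produces one label per token and recovers source phrases from runs of `SRC` tags. The pattern view fires on a syntactic context and returns the phrase in the pattern's slot directly. Everything here is a simplifying assumption:

- The clause dictionaries stand in for the output of a real syntactic analyzer.
- Lexical semantics is reduced to the choice of trigger verb.
- The three patterns are invented examples in the spirit of AutoSlog's subject/verb and verb/object patterns. They are not patterns produced by AutoSlog or AutoSlog-SE.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical clause format, standing in for a syntactic analyzer's output.
Clause = Dict[str, str]

# Invented patterns: (slot to extract, trigger verb). ("subj", "complained")
# means "extract the subject of a clause whose verb is 'complained'".
PATTERNS: List[Tuple[str, str]] = [
    ("subj", "complained"),
    ("subj", "criticized"),
    ("dobj", "angered"),  # "The plan angered the farmers": the object holds the opinion
]


def match_patterns(clause: Clause) -> Optional[str]:
    """Pattern-matching view: return the slot filler of the first matching pattern."""
    for slot, verb in PATTERNS:
        if clause.get("verb") == verb and slot in clause:
            return clause[slot]
    return None


def spans_from_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Sequence-tagging view: join each run of SRC-tagged tokens into a phrase.
    With only SRC/O tags, two adjacent sources would merge into one phrase."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "SRC":
            current.append(tok)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    tokens = ["Mary", "Smith", "complained", "about", "the", "plan"]
    tags = ["SRC", "SRC", "O", "O", "O", "O"]
    print(spans_from_tags(tokens, tags))  # ['Mary Smith']
    print(match_patterns({"subj": "Mary Smith", "verb": "complained"}))  # Mary Smith
```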
    <Paragraph position="4"> Section 3 describes the CRF approach to identifying opinion sources and the features that the system uses. Section 4 then presents a new variation of AutoSlog, AutoSlog-SE, which generates IE patterns to extract sources. Section 5 describes the hybrid system: we encode the IE patterns as additional features in the CRF model. Finally, Section 6 presents our experimental results and error analysis.</Paragraph>
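The Python sketch below shows how pattern output could be folded into a CRF's input. It assumes the simplest possible encoding: a binary feature that fires on every token inside a phrase that some pattern extracted. The feature names and this encoding are illustrative assumptions; Section 5 describes the encoding actually used. The output, one feature dictionary per token, is a common input format for CRF toolkits. The pattern feature therefore sits alongside ordinary lexical features, and the CRF learns how much weight to give it.

```python
from typing import Dict, List, Set


def covered_positions(tokens: List[str], extracted: List[str]) -> Set[int]:
    """Offsets of tokens inside any pattern-extracted phrase (exact word match)."""
    covered: Set[int] = set()
    for phrase in extracted:
        words = phrase.split()
        n = len(words)
        for start in range(len(tokens) - n + 1):
            if n and tokens[start:start + n] == words:
                covered.update(range(start, start + n))
    return covered


def token_features(tokens: List[str], extracted: List[str]) -> List[Dict[str, bool]]:
    """Per-token features for a CRF; 'in_pattern_extraction' is the
    hypothetical pattern-derived feature added by the hybrid model."""
    covered = covered_positions(tokens, extracted)
    return [
        {
            "word=" + tok.lower(): True,
            "is_capitalized": tok[:1].isupper(),
            "in_pattern_extraction": i in covered,
        }
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    tokens = ["Mary", "Smith", "complained", "about", "the", "plan"]
    feats = token_features(tokens, extracted=["Mary Smith"])
    print([f["in_pattern_extraction"] for f in feats])
    # [True, True, False, False, False, False]
```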
  </Section>
</Paper>