<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0404"> <Title>Learning Subjective Nouns using Extraction Pattern Bootstrapping</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Many natural language processing applications could benefit from being able to distinguish between factual and subjective information. Subjective remarks come in a variety of forms, including opinions, rants, allegations, accusations, suspicions, and speculation. Ideally, information extraction systems should be able to distinguish between factual information (which should be extracted) and non-factual information (which should be discarded or labeled as uncertain). Question answering systems should distinguish between factual and speculative answers. Multi-perspective question answering aims to present multiple answers to the user based upon speculation or opinions derived from different sources. Multi-document summarization systems need to summarize different opinions and perspectives. Spam filtering systems must recognize rants and emotional tirades, among other things. In general, nearly any system that seeks to identify information could benefit from being able to separate factual and subjective information.</Paragraph> <Paragraph position="1"> This work was supported in part by the National Science Foundation under grants IIS-0208798 and IRI-9704240.</Paragraph> <Paragraph position="2"> The data preparation was performed in support of the Northeast Regional Research Center (NRRC), which is sponsored by the Advanced Research and Development Activity (ARDA), a U.S. Government entity that sponsors and promotes research of import to the Intelligence Community, which includes but is not limited to the CIA, DIA, NSA, NIMA, and NRO.</Paragraph> <Paragraph position="3"> Subjective language has been previously studied in fields such as linguistics, literary theory, psychology, and content analysis.
Some manually-developed knowledge resources exist, but there is no comprehensive dictionary of subjective language.</Paragraph> <Paragraph position="4"> Meta-Bootstrapping (Riloff and Jones, 1999) and Basilisk (Thelen and Riloff, 2002) are bootstrapping algorithms that use automatically generated extraction patterns to identify words belonging to a semantic category. We hypothesized that extraction patterns could also identify subjective words. For example, the pattern &quot;expressed &lt;direct object&gt;&quot; often extracts subjective nouns, such as &quot;concern&quot;, &quot;hope&quot;, and &quot;support&quot;. Furthermore, these bootstrapping algorithms require only a handful of seed words and unannotated texts for training; no annotated data is needed at all.</Paragraph> <Paragraph position="5"> In this paper, we use the Meta-Bootstrapping and Basilisk algorithms to learn lists of subjective nouns from a large collection of unannotated texts. Then we train a subjectivity classifier on a small set of annotated data, using the subjective nouns as features along with some other previously identified subjectivity features. Our experimental results show that the subjectivity classifier performs well (77% recall with 81% precision) and that the learned nouns improve upon previous state-of-the-art subjectivity results (Wiebe et al., 1999).</Paragraph> </Section> </Paper>