<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1122">
  <Title>NLP AND TEXT ANALYSIS AT THE UNIVERSITY OF MASSACHUSETTS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
NLP AND TEXT ANALYSIS AT THE
UNIVERSITY OF MASSACHUSETTS
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> Our group is investigating a variety of techniques centered around the use of text corpora to support natural language processing applications. We are interested in information extraction from text, text classification, and knowledge acquisition from text corpora. Our goal is to develop technologies that can be readily ported across domains and scaled up with a minimal amount of manual engineering.</Paragraph>
    <Paragraph position="1"> In particular, we are experimenting with various kinds of statistical profiles and case-based reasoning systems in order to facilitate:  Although it is doubtful that all manual knowledge engineering can be eliminated from the development cycle of practical NLP systems, we believe that minimal amounts of manual engineering can be highly leveraged when used in conjunction with a suitable text corpus.</Paragraph>
    <Paragraph position="2"> Given a specific text processing application and a text corpus that is representative of the target texts, we are experimenting with different aspects of system development that can be fully or partially automated.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> Using the UMass/MUC-3 system implementation as a starting point, we have been looking at the problem of text classification as it pertains to domain relevancy. Based on a fully-automated semantic analysis of the MUC-3 texts, we have developed statistical profiles to distinguish texts that describe legitimate terrorist events from texts that are &quot;near misses&quot; with respect to the domain definition. Using these profiles, we can discriminate new texts with relatively high degrees of recall and precision (as high as 97% recall with 93% precision on one test run of 100 texts).</Paragraph>
    <Paragraph position="1"> We have also been looking at case-based reasoning (CBR) techniques and evaluating the utility of CBR in conjunction with the MUC-3 text corpus. Drawing once again from the UMass/MUC-3 system, we have run additional experiments on our CBR-based consolidation module in order to better understand its capabilities. In one such experiment, we determined that the CBR module is capable of producing recall and precision scores for incident types that exceed the recorded performance levels of all the MUC-3 systems (85% recall with 91% precision). Our own UMass/MUC-3 system posted 77% recall with 81% precision on incident types. Unfortunately, comparable performance improvements have not been obtained for any other MUC-3 template slots.</Paragraph>
    <Paragraph position="2"> In a separate CBR effort, we have designed a new CBR module that locates referents for the relative pronoun &quot;who&quot; in the MUC-3 texts. Operating with 75-90% hit rates, this system outperforms our original hand-coded heuristics.</Paragraph>
    <Paragraph position="3"> Interestingly, it tends to make most of its mistakes on convoluted sentences that are confusing to human readers.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="489" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> We expect to continue our ongoing investigations in each of the three areas mentioned above. We want to further refine our text classification profiles and investigate the integration of these capabilities back into a complete information extraction system (such as the UMass/MUC-4 system). We hope to experiment with variations on our UMass/MUC-3 consolidation component to see if aspects of that capability can assume a more prominent role in our overall system design. We will also continue our investigations with CBR-based discourse analysis and see if we can generalize this technique from relative pronoun resolution to other problems associated with scoping and structural ambiguities.</Paragraph>
    <Paragraph position="1"> More generally, we hope to gain a greater understanding of selective concept extraction as a sentence analysis technique, both in terms of its portability across domains, and its inherent limitations within specific text processing applications.</Paragraph>
  </Section>
</Paper>