<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1089">
  <Title>ROBUSTNESS, PORTABILITY AND SCALABILITY OF NATURAL LANGUAGE SYSTEMS</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ROBUSTNESS, PORTABILITY AND SCALABILITY
OF NATURAL LANGUAGE SYSTEMS
Ralph Weischedel
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1. OBJECTIVE
</SectionTitle>
    <Paragraph position="0"> In the DoD, every unit, from the smallest to the largest, commtinicates through messages. Message are fundamental in command and control, in intelligence analysis, and in planning and replanning. Our objective is to create algorithms that will  1) robustly process open source text, identifying relevant messages, and updating a data base based on the relevant messages; 2) reduce the effort required in porting message processing software to a new domain from months to weeks; and 3) be scalable to broad domains with vocabularies of tens of thousands of words.</Paragraph>
    <Paragraph position="1"> 2. APPROACH  Our approach is to apply probabilistic language models and training over large corpora in all phases of natural language processing. This new approach will enable systems to adapt to both new task domains and linguistic expressions not seen before by semi-automatically acquiring I) a domain model, 2) facts required for semantic processing, 3) grammar rules, 4) information about new words, 5) probability models on frequency of occurrence, and 6) rules for mapping from representation to application structure.</Paragraph>
    <Paragraph position="2"> For instance, a statistical model of categories of words enables systems to predict the most likely category of a word never encountered by the system before and to focus on its most likely interpretation in context, rather than skipping the word or considering all possible interpretations. Markov modelling techniques are used for this problem.</Paragraph>
    <Paragraph position="3"> In an analogous way, statistical models of language are being developed and applied at the level of syntax (form), at the level of semantics (content), and at the contextual level (meaning and impact).</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3. RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> * Consistently achieved high performance in Government-sponsored evaluations (MUC-3, MUC-4, MUC-5 and TIPSTER evaluations) of data extraction systems with significantly less human effort to port the PLUM system to each domain, compared with the effort reported in porting other high-performing systems.</Paragraph>
    <Paragraph position="1">  the error rate of a stochastic parser is a factor of two less than the same parser without a statistical language model.</Paragraph>
    <Paragraph position="2"> * Integrated a pattern matching component into our linguistically motivated framework to give semantics to fragmented parses and discontiguous constituents.</Paragraph>
    <Paragraph position="3"> * Created new demonstrations of the PLUM data extraction system in processing English texts about microelectronics and Japanese texts about microelectronics.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="446" type="metho">
    <SectionTitle>
4. PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> Participate in MUC-6 evaluation at both the application level (extracting data from text) and the understanding level (parsing/semantic/discourse level).</Paragraph>
    <Paragraph position="1"> Create/revise probabilistic models for * word sense disambiguation, * semantic interpretation, and * co-reference resolution.</Paragraph>
    <Paragraph position="2"> Contribute to the definition of an evaluation methodology for glass box semantic evaluation (Semeval).</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML