<?xml version="1.0" standalone="yes"?>
<Paper uid="J93-2006"> <Title>BBN Systems and Technologies</Title> <Section position="6" start_page="380" end_page="380" type="concl"> <SectionTitle> 5. Conclusions </SectionTitle>
<Paragraph position="0"> Our pilot experiments indicate that a hybrid approach to text processing, in which corpus-based probabilistic models supplement knowledge-based techniques, is both feasible and promising.</Paragraph>
<Paragraph position="1"> In part-of-speech labeling, we have evaluated POST in the laboratory, comparing its results against the work of people performing the same task. However, the real test of such a system is how well it functions as a component in a larger system. Can it make a parser faster and more accurate? Can it help to extract certain kinds of phrases from unrestricted text? We are currently running these experiments by making POST a part of existing systems. It was run as a preprocessor to Grishman's Proteus system for the MUC-3 competition (Grishman and Sterling 1989). Preliminary results showed that it sped up Proteus by a factor of two in one-best mode and by 33% with a threshold of T = 2. It is integrated into a new message processing system (PLUM) at BBN.</Paragraph>
<Paragraph position="2"> For reducing interpretation ambiguity, our context-free probability model, trained in supervised mode on only 81 sentences, reduced the error rate for selecting the correct parse on independent test sets by a factor of 2-4. For the problem of processing new words in text, the probabilistic model reduced the error rate for picking the correct part of speech for such words from 91.5% to 15%. And once the possible parts of speech for a word are known (or hypothesized using the tri-tag model), the probabilistic language model proved useful in indicating which parses should be examined to learn more complex lexical information about an unknown word.
However, the most innovative aspect of our approach is the automatic induction of semantic knowledge from annotated examples. The use of probabilistic models offers the induction procedure a decision criterion for making generalizations from the corpus of examples.</Paragraph> </Section> </Paper>