<?xml version="1.0" standalone="yes"?> <Paper uid="X96-1028"> <Title>Probabilistic Language Understanding System</Title> <Section position="12" start_page="136" end_page="137" type="concl"> <SectionTitle> 6. CONCLUSIONS </SectionTitle> <Paragraph position="0"> This paper has briefly described a diverse collection of research activities at BBN on information extraction.</Paragraph> <Paragraph position="1"> We have concluded the following: * Our information extraction engines for the MUC-6 Named Entity task and Template Element task employ no domain-specific information. We believe that developing further broadly applicable information extraction functionality such as NE and TE will be a win, maximizing the value of reusable knowledge bases for information extraction.</Paragraph> <Paragraph position="2"> * We developed a full parser of English using a statistically learned decision procedure; SPATTER has achieved the highest scores yet reported on parsing English text (Magerman, 1995). The fact that its recall and precision are both in the high 80s represents not just a quantitative improvement in parser performance, but also a qualitative one.</Paragraph> <Paragraph position="3"> * The NLU Shell provides a way for non-programmers to build and maintain information extraction systems based on PLUM. The use of a GUI and a database in place of files of source code and data represents a fundamental advance in making natural language technology widely available. While NLU Shell users need detailed familiarity with extracting formatted data from natural language, they do not need to be programmers.</Paragraph> <Paragraph position="4"> * Compared to PLUM's previous performance in MUC-3, -4, and -5, our progress in MUC-6 was much more rapid, and our official score was higher than in any previous template fill task.</Paragraph> <Paragraph position="5"> Furthermore, PLUM's performance was higher than in any of the previous full template MUC tasks.
* The template merging experiment provided a substantial range in recall versus precision, i.e., in undergeneration versus overgeneration.</Paragraph> <Paragraph position="6"> Nevertheless, we did not achieve a breakthrough in overall F-score, i.e., in error rate (ERR).</Paragraph> <Paragraph position="7"> * Our preliminary results in learning the Named Entity Extraction task in English and Spanish are quite encouraging: they are better than any previously reported scores for a learned system, and they approach the scores of state-of-the-art, manually built, rule-based systems. * A preliminary experiment in information extraction from speech has shown that there are very significant challenges for TIPSTER text extraction technology, including the current 20-30% word error rate of transcription systems, the lack of punctuation within sentences, the lack of capitalization, and the error rate on names.</Paragraph> </Section> </Paper>