<?xml version="1.0" standalone="yes"?>
<Paper uid="M98-1009">
<Title>ALGORITHMS THAT LEARN TO EXTRACT INFORMATION BBN: DESCRIPTION OF THE SIFT SYSTEM AS USED FOR MUC-7</Title>
<Section position="7" start_page="15" end_page="16" type="concl">
<SectionTitle>CONCLUSIONS</SectionTitle>
<Paragraph position="0"> BBN's results on the three tasks are summarized in the following table: These results, close to those of the best system, demonstrate the power of trained systems for information extraction.</Paragraph>
<Paragraph position="1"> The NE result demonstrates the robustness of IdentiFinder(TM), the learning algorithm used for NE, to an unknown but similar domain. Further tests also showed its robustness to all-uppercase input and to input with no punctuation. Our future plans for IdentiFinder(TM) include:
* evaluation in the broadcast news domain, which requires speech input in a much broader domain;
* applying IdentiFinder(TM) to unsegmented languages; and
* working on performance improvements and on improvements to the training process.
The SIFT model, used for TE and TR, employs the Penn Treebank for syntactic information, and thus requires as its training data only the semantic annotation of entities, descriptors, and relationships. Its sentence-level model determines parts of speech, parses, finds names, and identifies semantic relationships in a single, integrated process; a separate merging model then connects information across sentences.</Paragraph>
<Paragraph position="2"> Time was a limiting factor in SIFT's performance: the decision to field an integrated, fully trained model was made only in late January, so SIFT first existed in end-to-end form only as of March 11. That left little time for experiments, or for addressing all issues, such as the handling of nationalities and unnamed TE entities. Given the early stage of development of the SIFT system, we believe that significant performance improvements are still possible. We are also interested in measuring performance as a function of training set size, and in applying SIFT to the broadcast news domain.</Paragraph>
</Section>
</Paper>