<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1111"> <Title>Umass: Claire Cardie, Ellen Riloff, Joseph McCarthy,</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> The primary goal of our effort is the development of robust and portable language processing capabilities and information extraction applications. Our system is based on a sentence analysis technique called selective concept extraction. Having demonstrated the general viability of this technique in previous evaluations [Lehnert, et al.</Paragraph> <Paragraph position="1"> 1992], we are now concentrating on the practicality of our technology by creating trainable system components to replace hand-coded data and manually-engineered software.</Paragraph> <Paragraph position="2"> Our general strategy is to automate the construction of domain-specific dictionaries that can be completed with minimal amounts of human assistance. Our system relies on two major tools that support automated dictionary construction: (1) OTB, a trainable part-of-speech tagger, and (2) AutoSlog, a concept node generator that operates in conjunction with the CIRCUS sentence analyzer. Concept nodes are dictionary definitions for CIRCUS that encode lexically-indexed interactions between syntactic constituents and semantic case frames. OTB and AutoSlog both require minor technical adjustments and minimal assistance from a &quot;human in the loop&quot; in order to create a new domain-specific dictionary, but this can generally be accomplished by a single individual in the space of one week [Riloff, 1993].</Paragraph> <Paragraph position="3"> A third tool, TTS-MUC3, is responsible for the creation of a template generator that maps CIRCUS output into final template instantiations. TTS-MUC3 can be adjusted for a new domain in one day by a knowledgeable technician working with adequate domain documentation. 
This minimal manual engineering is required to specify objects and relationships. Once these adjustments are in place, TTS-MUC3 uses CIRCUS and a development corpus of source texts and key templates to train classifiers for template generation. No further human intervention is required to create template generators.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> Our emphasis has been on fast system prototyping and rapid system development cycles. In preparing for the TIPSTER 18-month evaluation, we customized a complete information extraction system for the domain of English microelectronics (EME) in the space of four weeks working from scratch without the benefit of any domain experts.</Paragraph> <Paragraph position="1"> This time period included the development of a new facility for keyword recognition that had not been deemed necessary for any of our previous information extraction systems. If this facility had not been added, we could have cut our EME system development time down to two weeks.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> Within the next six months we will incorporate semantic features into our system. We do not have semantic features in the current system because we would have had to acquire them through manual means, and we wanted to wait until we could acquire them through training. We now believe that we have identified a method for automated feature acquisition that should suffice for our purposes [Cardie, 1993].</Paragraph> <Paragraph position="1"> We are generally satisfied with the performance of OTB, AutoSlog, and CIRCUS and we believe the addition of semantic features will significantly boost our overall performance.</Paragraph> </Section> </Paper>