File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/x96-1009_abstr.xml
Size: 5,724 bytes
Last Modified: 2025-10-06 13:48:45
<?xml version="1.0" standalone="yes"?> <Paper uid="X96-1009"> <Title>TIPSTER II ACTIVITIES AT HNC</Title> <Section position="1" start_page="0" end_page="46" type="abstr"> <SectionTitle> TIPSTER II ACTIVITIES AT HNC </SectionTitle> <Paragraph position="0"> Marc R. llgen, David A. Rushall HNC Software, Inc., 5930 Cornerstone Court West, San Diego, CA 92121 USA email: mri@hnc.com, dar@hnc.com The arrival of the information age has brought with it new challenges for handling the vast quantities of electronically available information. The ARPA and Intelligence Community sponsored TIPSTER program has risen to this challenge. New technologies have been developed for attacking problems in information retrieval, information extraction, and multilingual information processing. As a premier developer of neural network based technologies, HNC has played an important role in successfully bringing innovative approaches to the problem of information retrieval, including multilingual information retrieval. The purpose of this brief paper is to summarize HNC's research accomplishments and to make recommendations for further study.</Paragraph> <Paragraph position="1"> The HNC approach to information retrieval is based on context vectors, which are unit vectors in a high dimensional vector space. The relative directions of these vectors encode the meaning and context of information that is to be retrieved. Neural network based training laws are used to adjust the components of these vectors in an iterative fashion. The basic context vector methodology has been implemented in a system called MatchPlus, which serves as a stand-alone information retrieval application as well as the technological core upon which additional context vector applications have been built.</Paragraph> <Paragraph position="2"> One problem with the MatchPlus system has been the large computation requirements of the learning law. In order to reduce the severity of this problem, HNC has developed a one step learning law that approximates the behavior of the original learning law at a fraction of the computational cost.</Paragraph> <Paragraph position="3"> Preliminary performance results with this learning law have been quite encouraging. More complete performance results can be generated when additional funding becomes available.</Paragraph> <Paragraph position="4"> HNC has also developed a preliminary prototype of an English-Spanish MatchPlus system.</Paragraph> <Paragraph position="5"> This system uses redundant hash table addressing to store context vectors for a multilingual vocabulary. This system can easily be extended to additional languages, such as Japanese, Russian, Chinese, etc.</Paragraph> <Paragraph position="6"> The multilingual MatchPlus approach is able to perform information retrieval in multiple languages without the necessity of specifying any grammatical information about the language, thereby greatly reducing system development time.</Paragraph> <Paragraph position="7"> HNC's context vector technology has also been extended to the problem of information routing and filtering, resulting in a COTS product known as CONVECTIS. CONVECTIS is currently being used by DataTimes (a large commercial information provider) as the core technology for routing information to customers based on customer specified &quot;interest profiles.&quot; The use of context vectors results in a natural method for specifying degree of relationship between incoming news feeds and customer interests. It is also a natural mechanism for detecting novel information themes, a fact that could be advantageously used by the Intelligence Community.</Paragraph> <Paragraph position="8"> HNC has also responded to the explosion of information available on the Internet by developing a system of autonomous retrieval agents based upon the MatchPlus technology. These retrieval agents can be scheduled by the user to access information at specified, possibly repeated times, thereby removing the requirement that the user actually be present during the retrieval operation. This system is currently being extended to handle multiple Internet and for-fee database protocols, resulting in a useful system for information retrieval from heterogeneous information sources.</Paragraph> <Paragraph position="9"> Since the context vector methodology requires only that a finite vocabulary of entities be defined for the domain of information, this technology has been extended to other media, images in particular. The ICARS system is being developed to solve the difficult problem of image retrieval and image content characterization. This system attaches context vectors to cluster centroids of feature vectors composed of Gabor features and color information.</Paragraph> <Paragraph position="10"> Preliminary tests have demonstrated that this system is able to retrieve relevant images without the need for complex image understanding algorithms.</Paragraph> <Paragraph position="11"> All these applications have demonstrated the tremendous versatility of the context vector technique. Context vectors allow information to be represented in a universal meaning space, opening up the possibility that text, images, and information in other media can be represented in a unified fashion. This technology has also been transitioned to the commercial sector, thereby satisfying a major objective of the funding agencies. In summary, HNC's context vector methodology has spawned a wide variety of solutions for government and commercial customers and holds considerable promise for serving as the technological foundation for future information retrieval and information management projects.</Paragraph> </Section> class="xml-element"></Paper>