<?xml version="1.0" standalone="yes"?>
<Paper uid="X96-1023">
  <Title>SPOT: TRW'S MULTI-LINGUAL TEXT SEARCH TOOL</Title>
  <Section position="2" start_page="0" end_page="95" type="intro">
    <SectionTitle>
1.0. INTRODUCTION
1.1. Design Objectives
</SectionTitle>
    <Paragraph position="0"> TRW has developed a text search tool, called Spot, that allows users to enter a query in a number of languages and retrieve documents that match the query. The following subsections describe the design objectives and goals of Spot.</Paragraph>
    <Paragraph position="1"> 1.1.1. Support multiple search engines Our government users currently use a variety of tools for different purposes. For example, an archival database is only available through a legacy text search system that performs its searches very quickly but offers limited search functionality. Other users rely on Paracel's Fast Data Finder (FDF) search engine for its powerful search capabilities, but can only access that power through the FDF search tool user interface. One of our design objectives was to handle multiple search engines within the same user interface tool. This gives users a single user interface tool to learn, while still giving them a choice of search engines. Users might choose to perform a natural language query using the Excalibur/ConQuest search engine's concept query and then switch to the Fast Data Finder to search Chinese text.</Paragraph>
    <Paragraph position="2"> We also aimed to provide users with the full functionality of each of the search engines. This goal necessitates a more generic approach to many functions, so that the same user interface can be tailored to differing search engine technologies.</Paragraph>
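One way to picture a single user interface over several search engines is a thin adapter layer. The sketch below is only an illustration under stated assumptions: the class names (SearchEngine, InMemoryEngine, SearchFrontEnd), the features attribute, and the toy substring matcher are hypothetical and are not taken from Spot, the Fast Data Finder, or Excalibur/ConQuest.

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Hypothetical common interface that a front end could program against."""

    def __init__(self, name, features):
        self.name = name
        # Engine-specific capabilities the interface may expose to the user.
        self.features = set(features)

    @abstractmethod
    def search(self, query):
        """Return a list of identifiers of matching documents."""


class InMemoryEngine(SearchEngine):
    """Toy stand-in for a real engine: case-insensitive substring match."""

    def __init__(self, name, features, documents):
        super().__init__(name, features)
        self.documents = documents  # maps document id to document text

    def search(self, query):
        needle = query.lower()
        return [doc_id for doc_id, text in self.documents.items()
                if needle in text.lower()]


class SearchFrontEnd:
    """Single entry point that dispatches a query to the chosen engine."""

    def __init__(self):
        self._engines = {}

    def register(self, engine):
        self._engines[engine.name] = engine

    def features_of(self, engine_name):
        # Lets the interface enable only the controls this engine supports.
        return self._engines[engine_name].features

    def search(self, engine_name, query):
        return self._engines[engine_name].search(query)
```

In this arrangement the front end stays the same when an engine is added; only a new adapter class is written, and the features set tells the interface which engine-specific controls to show.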
    <Paragraph position="3"> 1.1.2. Support multi-lingual data Internationalized support is fairly easy to obtain commercially for a number of commonly-supported languages. The commercial products for internationalization are designed to support the marketing of a tool in a specific set of foreign countries, where the menus, buttons, error messages, and text all need to be displayed in the appropriate foreign language. For example, if a specific product needs to be marketed in Japan, it might run under Sun's Japanese Language Environment (JLE), with JLE providing support for entering and displaying Japanese text. Multi-lingual support, however, is very difficult to obtain commercially. Our user community consists of native-English speakers, who want the menus and buttons to appear in English, but who require support for viewing foreign-language documents in their native scripts, as well as for entering foreign-language query terms in their native scripts. For this functionality, internationalized support is inadequate.</Paragraph>
    <Paragraph position="4"> 1.1.3. Support query generation tools Users who are not native speakers of the foreign language in which they are submitting a query would like tools to assist in building queries. For example, we located a large Japanese-to-English thesaurus that was available in electronic form.</Paragraph>
    <Paragraph position="5"> It would be very useful for native-English speakers to look up relevant words in the Japanese thesaurus for assistance in building their queries. In addition, words of foreign origin are often transliterated in a number of different ways. For example, the name &quot;Kadafi&quot; is often spelled &quot;Khadafi&quot; or &quot;Gadafi&quot;. Query generation tools that allow users to enter &quot;Kadafi&quot; and find the other possible spellings are designed into Spot.</Paragraph>
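The spelling-variant idea can be sketched as a lookup over groups of known transliterations. This is a minimal illustration, not Spot's implementation: the VARIANT_GROUPS table, the function names, and the "OR" query syntax are all assumptions made for the example, and a real tool would draw on a far larger resource and emit whatever syntax the target search engine expects.

```python
# Hypothetical table of transliteration groups; each set holds spellings
# that should be treated as the same name.
VARIANT_GROUPS = [
    {"Kadafi", "Khadafi"},
]


def expand_term(term):
    """Return the term together with any known alternate spellings."""
    key = term.lower()
    for group in VARIANT_GROUPS:
        if key in {variant.lower() for variant in group}:
            return sorted(group)
    # Unknown terms are passed through unchanged.
    return [term]


def build_or_query(term):
    """Join all spellings with an assumed, engine-neutral OR operator."""
    return " OR ".join(expand_term(term))
```

A user who types one spelling would then have the query broadened to every spelling in the same group before it is submitted.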
    <Paragraph position="6"> 1.2. Maximize performance Spot was designed to be the user interface for a large archival database of hundreds of gigabytes of data. It needs to provide hundreds of users with access to this database.</Paragraph>
    <Paragraph position="7"> An archival database using the Fast Data Finder was implemented using Paracel's Batch Search Server (BSS) product. Spot currently interfaces to this FDF archival database. Development is currently proceeding to interface Spot to an Excalibur/ConQuest archival database.</Paragraph>
    <Paragraph position="8"> In developing functionality, including multi-lingual query generation tools and query functionality, we have emphasized solutions that work very quickly, usually by exploiting the features of a specific search engine.</Paragraph>
    <Paragraph position="9"> Speed and throughput of searches through the FDF hardware search engine were measured using a commercial FDF-3 system. A single FDF-3 produced a search rate of around 3.5 MB/s, which could be obtained while searching 20 to 40 average queries simultaneously. A system of multiple FDFs can linearly expand the search rate.</Paragraph>
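The scaling claim can be turned into a quick back-of-the-envelope estimate. The sketch below assumes ideal linear scaling and uses the single-unit rate of about 3.5 MB/s quoted above; the function names are hypothetical, and any data size passed in is an illustrative input rather than a measurement.

```python
PER_UNIT_RATE_MB_S = 3.5  # approximate single FDF-3 search rate from the text


def aggregate_rate_mb_s(units):
    """Search rate of several units, assuming ideal linear scaling."""
    if 1 > units:
        raise ValueError("at least one unit is required")
    return PER_UNIT_RATE_MB_S * units


def scan_time_seconds(data_mb, units):
    """Time for one full pass over data_mb megabytes of text."""
    return data_mb / aggregate_rate_mb_s(units)
```

Because each pass can serve a batch of queries at once, the time per pass is shared by every query in that batch, so adding units shortens the pass for all of them together.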
  </Section>
</Paper>