<?xml version="1.0" standalone="yes"?> <Paper uid="H01-1040"> <Title>Intelligent Access to Text: Integrating Information Extraction Technology into Text Browsers</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Information extraction (IE) technology, as promoted and defined by the DARPA Message Understanding Conferences [4, 5] and the current ACE component of TIDES [1], has yielded impressive new capabilities for extracting structured information from texts. It complements more traditional information retrieval (IR) technology, which retrieves relevant documents or passages from text collections and leaves information seekers to browse the retrieved sub-collection (e.g. [2]). However, while IR technology has been readily incorporated into end-user applications (e.g. web search engines), IE technology has not yet been deployed in end-user systems as successfully as its proponents had hoped. There are several reasons for this, including: 1. Porting cost. Moving IE systems to new domains requires considerable time and expertise, either to create or modify domain-specific resources and rule bases, or to annotate texts for supervised machine learning approaches.</Paragraph> <Paragraph position="1"> 2. Sensitivity to inaccuracies in extracted data. IE holds out the promise of constructing structured databases from text sources automatically, but extraction results are by no means perfect.</Paragraph> <Paragraph position="2"> Thus, the technology is only appropriate for applications where some error is tolerable and readily detectable by end users.</Paragraph> <Paragraph position="3"> 3. Complexity of integration into end-user systems.
IE systems produce results (named-entity-tagged texts, filled templates) that must be incorporated into larger, more sophisticated application systems if end users are to benefit from them.</Paragraph> <Paragraph position="4"> In this paper we present the approach taken in the TRESTLE project (Text Retrieval Extraction and Summarisation Technologies for Large Enterprises), which addresses the second and third of these problems, together with preliminary results from the user-testing evaluation of the TRESTLE interface. The goal of the TRESTLE project is to develop an advanced text access facility to support information workers at GlaxoSmithKline (GSK), a large pharmaceutical corporation. Specifically, the project aims to provide enhanced access to Scrip, the pharmaceutical industry newsletter with the largest circulation, in order to increase the effectiveness of employees in their &quot;industry watch&quot; function. This function involves both broad current awareness and the tracking of people, companies and products, particularly the progress of new drugs through the clinical trial and regulatory approval process.</Paragraph> </Section> </Paper>