<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1036"> <Title>An Open Distributed Architecture for Reuse and Integration of Heterogeneous NLP Components</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The shift from Computational Linguistics to Language Engineering (to use the names of two well-known NLP journals) is indicative of new trends in NLP. We believe that this shift is not simply a passing fashion but reflects the growing maturity of the field, as also suggested by an emphasis on building large-scale systems rather than toy research systems. There is also an increasing awareness that real-size systems are not mere scaled-up toy systems: they present a qualitatively different set of problems that require new tools and new ideas, as clearly exemplified by recent projects and programs such as Pangloss (Frederking et al. 94), Tipster (ARPA 94), and Verbmobil (Görz et al. 96).</Paragraph> <Paragraph position="1"> Natural language engineering addresses some traditional issues in software engineering: robustness, testing and evaluation, reuse, and the development of large-scale applications (see, e.g., (Sommerville 96) for an overview). These issues have been, and still are, the topic of a number of NLP projects and programs: TSNLP, DECIDE, Tipster, MUC, TREC, Multext, Multilex, Genelex, Eagles, etc.</Paragraph> <Paragraph position="2"> This paper reviews two domains of problems in natural language engineering: reuse and integration in the context of software architectures for Natural Language Processing. The emphasis is on the reuse of NLP software components and their integration in order to build large-scale applications.
Also relevant to this presentation are topics such as the integration of heterogeneous components for building hybrid systems, or for integrating speech with other &quot;higher-level&quot; NLP components (section 2).</Paragraph> <Paragraph position="3"> Section 3 presents the Corelli Document Processing Architecture, a new software architecture for NLP designed to support the development of a variety of large-scale NLP applications: Information Retrieval, Corpus Processing, Multilingual MT, and the integration of Speech with other NLP components.</Paragraph> </Section> </Paper>